WO2018177134A1 - Method for processing user-generated content, storage medium and terminal - Google Patents

Method for processing user-generated content, storage medium and terminal

Info

Publication number
WO2018177134A1
WO2018177134A1 (PCT/CN2018/079228)
Authority
WO
WIPO (PCT)
Prior art keywords
image
image frame
terminal
feature
generated content
Prior art date
Application number
PCT/CN2018/079228
Other languages
French (fr)
Chinese (zh)
Inventor
杨田从雨
陈宇
张浩
华有为
薛丰
肖鸿志
冯绪
吴昊
张振伟
欧义挺
董晓龙
戚广全
谢俊驰
谢斯豪
梁雪
段韧
张新磊
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201710199078.4A external-priority patent/CN107168619B/en
Priority claimed from CN201710282661.1A external-priority patent/CN108334806B/en
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018177134A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism

Definitions

  • the terminal can convert the selected image frame into a grayscale image, detect edges in the grayscale image, determine the gray-level change rate at the edges, and determine the sharpness according to that gray-level change rate.
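The sharpness check in this definition (grayscale conversion, edge detection, gray-level change rate) can be sketched as follows; the luma weights, gradient-based edge detector, and threshold are illustrative assumptions of this sketch, not details from the patent:

```python
import numpy as np

def sharpness_score(rgb, edge_thresh=30.0):
    """Estimate sharpness: convert to grayscale, find edges via the
    gray-level gradient, and use the mean gradient magnitude at edge
    pixels (the gray-level change rate) as the score."""
    gray = rgb @ np.array([0.299, 0.587, 0.114])  # luma grayscale (assumed weights)
    gy, gx = np.gradient(gray)                    # gray-level change rate per axis
    mag = np.hypot(gx, gy)
    edges = mag > edge_thresh                     # simple threshold edge detector
    if not edges.any():
        return 0.0
    return float(mag[edges].mean())
```

A frame with a hard step edge scores higher than one containing only a gentle ramp, so a terminal could keep the highest-scoring candidate frame.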
  • access rights can be set when user-generated content is created. For example, if the content creator sets the access right to visible-to-friends-only when creating the user-generated content, then a user account that uploads a matching image frame has access to the user-generated content only when it has a friend relationship with the creator's user account. If the content creator sets the access right to visible-to-everyone when creating the user-generated content, any legitimate user account has access to the user-generated content.
  • the terminal uploads the selected image frame to the server, and the server queries the template image that matches the selected image frame.
  • the server queries the matching template image
  • the server feeds back the first notification to the terminal;
  • the server does not query the matching template image
  • the server registers the uploaded image frame as a template image, and feeds back the second notification to the terminal.
  • the terminal displays the content creation entry.
  • when people are happy, the corners of the mouth rise. If the expression data that the terminal extracts from the facial feature data in the image frame indicates raised mouth corners, it can indicate that the emotional feature reflected by the face in the image frame is "happy". When people are surprised, the mouth opens wide. If the expression data extracted from the facial feature data in the image frame indicates a wide mouth opening, it can indicate that the emotional feature reflected by the face in the image frame is "surprised".
  • the JPEG format refers to an image format compressed according to the international image compression standard.
  • the direction conforming to the emotional feature recognition condition may specifically be a direction in which the angle between the central axis of the face image in the image frame and the vertical direction is not more than 45 degrees.
  • the voice emotional feature recognition result is “happy”.
  • the text the terminal obtains by recognizing the voice data is "I am very happy today", which includes the emotional feature keyword "happy"; the emotional feature mapped to "happy" is "happy", so the speech emotional feature recognition result is "happy".
  • the text the terminal obtains by recognizing the voice data is "I am very happy", which likewise includes the emotional feature keyword "happy"; the emotional feature mapped to "happy" is "happy", so the speech emotional feature recognition result is also "happy".
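The keyword-to-emotion mapping illustrated by these examples can be sketched as follows; the keyword table is a hypothetical stand-in for whatever lexicon an implementation would actually use:

```python
# Hypothetical keyword-to-emotion table; a real system would use a
# larger lexicon or a trained classifier.
EMOTION_KEYWORDS = {
    "happy": "happy",
    "glad": "happy",
    "sad": "sad",
    "angry": "angry",
}

def speech_emotion_from_text(text):
    """Map recognized speech text to an emotion label by scanning it
    for emotional feature keywords, as in the examples above."""
    lowered = text.lower()
    for keyword, emotion in EMOTION_KEYWORDS.items():
        if keyword in lowered:
            return emotion
    return None  # no emotional feature keyword found
```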
  • the acoustic features include timbre and prosodic features.
  • timbre refers to the characteristic quality of the sound produced by a sounding body; different sounding bodies produce different sounds because of their different materials and structures.
  • timbre is physically characterized by spectral parameters.
  • a prosodic feature refers to the basic pitch and rhythm of the sound emitted by the sounding body; prosodic features are characterized by the fundamental frequency parameter, the duration distribution, and the signal intensity.
  • the emotional feature type refers to the type of emotional feature reflected by the face, such as "happy", "sad", or "angry".
  • the confidence of the recognition result indicates how credible it is that the facial emotion feature recognition result is the real emotional feature of the face; the higher the confidence, the higher the possibility that the recognition result is the face's real emotional feature.
  • the emotional feature image library established in advance by the terminal may include multiple emotional feature image sets, each reflecting one emotional feature type.
  • the terminal may map emotional feature images one-to-one to emotional intensities.
  • the terminal searches the emotional feature image library for the emotional feature image set whose reflected emotional feature type matches the type included in the speech emotion feature recognition result, and selects, from the found set, the emotional feature image corresponding to the emotional intensity included in the speech emotion feature recognition result.
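Selecting an emotional feature image by emotion type and intensity, as described above, can be sketched as follows; the library contents, file names, and integer intensity levels are hypothetical:

```python
# Hypothetical emotional feature image library: one image set per
# emotion type, each set mapping an intensity level to an image id.
EMOTION_IMAGE_LIBRARY = {
    "happy": {1: "happy_mild.png", 2: "happy_medium.png", 3: "happy_strong.png"},
    "sad":   {1: "sad_mild.png",   2: "sad_medium.png",   3: "sad_strong.png"},
}

def pick_emotion_image(emotion_type, intensity):
    """Find the image set whose emotion type matches the recognition
    result, then select the image mapped to the given intensity."""
    image_set = EMOTION_IMAGE_LIBRARY.get(emotion_type)
    if image_set is None:
        return None
    # Fall back to the closest available intensity level.
    level = min(image_set, key=lambda k: abs(k - intensity))
    return image_set[level]
```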
  • in step S1308, it is determined whether the facial emotion feature recognition result matches the voice emotion feature recognition result; if yes, the process goes to step S1309; if not, the process goes to step S1310.
  • S1316 Render user generated content in the played image frame according to the placement position.
  • the recognition result obtaining module 1703 is further configured to adjust the size of the image frame to a preset size, rotate the adjusted image frame to a direction that conforms to the emotional feature recognition condition, send the rotated image frame to the server, and receive the facial emotion feature recognition result returned by the server for the sent image frame.
  • RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method for processing user-generated content, comprising: collecting image frames from the real world; playing the collected image frames frame by frame according to the time sequence of collection; selecting an image frame from the collected image frames; acquiring user-generated content associated with a template image matching the selected image frame; acquiring a presentation position of the user-generated content in the matched template image; and according to the presentation position, rendering the user-generated content in the played image frame.

Description

User-generated content processing method, storage medium, and terminal
This application claims priority to Chinese Patent Application No. 201710199078.4, entitled "User-generated content processing method and apparatus", filed with the China Patent Office on March 29, 2017, and to Chinese Patent Application No. 201710282661.1, entitled "Image processing method, apparatus, and electronic device", filed with the China Patent Office on April 26, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a user-generated content processing method, a storage medium, and a terminal.
Background
Social applications are among the most widely used applications today. Through a social application, users can establish social relationships based on a social network and then interact on the basis of those relationships, for example by sending instant messages, making voice calls and video calls, and holding online meetings, which greatly facilitates people's lives and work. Currently, social applications can display user-generated content (UGC).
At present, users can find each other's personal home page, or appear on each other's friend-sharing page, only after establishing a social relationship; only then is their user-generated content displayed on the personal home page or the friend-sharing page. The display of user-generated content therefore depends on social relationships, which limits its spread.
Summary
According to various embodiments provided in this application, a user-generated content processing method, a storage medium, and a terminal are provided.
A user-generated content processing method includes:
a terminal collecting image frames from the real world;
the terminal playing the collected image frames frame by frame in the order of collection;
the terminal selecting an image frame from the collected image frames;
the terminal acquiring user-generated content associated with a template image that matches the selected image frame;
the terminal acquiring a display position of the user-generated content in the matched template image; and
the terminal rendering the user-generated content in the played image frames according to the display position.
One or more non-volatile computer-readable storage media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
collecting image frames from the real world;
playing the collected image frames frame by frame in the order of collection;
selecting an image frame from the collected image frames;
acquiring user-generated content associated with a template image that matches the selected image frame;
acquiring a display position of the user-generated content in the matched template image; and
rendering the user-generated content in the played image frames according to the display position.
A terminal including a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
collecting image frames from the real world;
playing the collected image frames frame by frame in the order of collection;
selecting an image frame from the collected image frames;
acquiring user-generated content associated with a template image that matches the selected image frame;
acquiring a display position of the user-generated content in the matched template image; and
rendering the user-generated content in the played image frames according to the display position.
Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of this application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings in the following description show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of the application environment of a user-generated content processing method in an embodiment;
FIG. 2 is a schematic diagram of the internal structure of a terminal in an embodiment;
FIG. 3 is a schematic flowchart of a user-generated content processing method in an embodiment;
FIG. 4 is a schematic flowchart of a user-generated content processing method in a specific application scenario;
FIG. 5 is a schematic diagram of the main page of a social application in an embodiment;
FIG. 6 is a schematic diagram of a tool menu displayed on the main page in an embodiment;
FIG. 7 is a comparison of a virtual-world page entered through a function entry and a real-world object in an embodiment;
FIG. 8 is a comparison of a virtual-world page showing a list of content-creator avatars and a real-world object in an embodiment;
FIG. 9 is a comparison of a virtual-world page with a comment page and a real-world object in an embodiment;
FIG. 10 is a comparison of a virtual-world page with a content creation entry and a real-world object in an embodiment;
FIG. 11 is a comparison of a virtual-world page with a picture editing page and a real-world object in an embodiment;
FIG. 12 is a schematic flowchart of a user-generated content processing method in another embodiment;
FIG. 13 is a schematic flowchart of a user-generated content processing method in another embodiment;
FIG. 14 is a schematic comparison of the interface before and after an emotional feature image is drawn in an embodiment;
FIG. 15 is a schematic comparison of the interface before and after text recognized from voice data is displayed in an embodiment;
FIG. 16 is a structural block diagram of a terminal in an embodiment;
FIG. 17 is a structural block diagram of a terminal in another embodiment; and
FIG. 18 is a structural block diagram of a terminal in another embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain this application and are not intended to limit it.
FIG. 1 is a diagram of the application environment of a user-generated content processing method in an embodiment. Referring to FIG. 1, the application environment includes a terminal 110 and a server 120; the terminal 110 can communicate with the server 120 over a network. The terminal 110 can be used to collect image frames from the real world; play the collected image frames frame by frame in the order of collection; select an image frame from the collected image frames; pull from the server 120 the user-generated content associated with a template image that matches the selected image frame, together with the display position of the user-generated content in the matched template image; and render the user-generated content in the played image frames according to the display position. The server 120 can be used to store template images, user-generated content, and the correspondence between user-generated content and its display position in the matched template image.
FIG. 2 is a schematic diagram of the internal structure of a terminal in an embodiment. The terminal may specifically be the terminal 110 shown in FIG. 1. Referring to FIG. 2, the terminal includes a processor, a memory, a network interface, a display screen, a camera, and an input device connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the terminal stores an operating system and computer-readable instructions which, when executed, can cause the processor to perform a user-generated content processing method. The processor of the terminal provides computing and control capabilities to support the operation of the entire terminal. The internal memory of the terminal may store computer-readable instructions which, when executed by the processor, can cause the processor to perform a user-generated content processing method. The network interface of the terminal is used for network communication with the server 120, for example to upload image frames, upload created user-generated content, or pull user-generated content. The camera of the terminal is used to collect image frames. The display screen of the terminal may be a liquid crystal display or an electronic ink display; the input device of the terminal may be a touch layer covering the display screen, a key, trackball, or touchpad provided on the terminal housing, or an external keyboard, touchpad, or mouse. The terminal includes fixed terminals and mobile terminals; mobile terminals include one or a combination of a mobile phone, a tablet computer, a personal digital assistant, and a wearable device. A person skilled in the art will understand that the structure shown in FIG. 2 is only a block diagram of part of the structure related to the solution of this application and does not limit the terminal to which the solution is applied; a specific terminal may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
As shown in FIG. 3, in one embodiment, a user-generated content processing method is provided. This embodiment is described mainly by applying the method to the terminal 110 in FIG. 1. Referring to FIG. 3, the user-generated content processing method specifically includes the following steps:
S302: Collect image frames from the real world.
The real world is the naturally existing world in which humans live. An image frame is a unit in a sequence of image frames capable of forming a dynamic picture, and records objects in the real world at a certain moment.
In one embodiment, the terminal may collect image frames from the real world at a fixed or dynamic frame rate. A fixed or dynamic frame rate is one at which the image frames form a continuous dynamic picture when played back at that rate.
In one embodiment, the terminal may collect real-world image frames through a camera, within the camera's current field of view. The field of view of the camera may change with the posture and position of the terminal.
In one embodiment, the terminal may provide an AR (Augmented Reality) shooting mode through a social application and, after the AR shooting mode is selected, collect image frames from the real world. A social application is an application that supports network-based social interaction, and includes instant messaging applications, SNS (Social Network Service) applications, live streaming applications, and photo applications.
S304: Play the collected image frames frame by frame in the order of collection.
The order of collection is the chronological order in which the image frames were collected, which can be represented by the ordering of the timestamps recorded when the frames were collected. Frame-by-frame playback means playing one image frame at a time.
Specifically, the terminal may play the collected image frames one by one in ascending order of timestamp, at the frame rate at which they were collected. The terminal may play the collected image frames directly, or store them in a buffer in the order of collection and take them out of the buffer for playback in the same order.
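The buffered playback just described (store frames in capture order, then take them out in the same order) can be sketched minimally as follows; the class and method names are illustrative, not from the patent:

```python
from collections import deque

class FrameBuffer:
    """Buffer frames in capture order (by timestamp) and hand them
    back in the same order for playback."""
    def __init__(self):
        self.queue = deque()

    def capture(self, timestamp, frame):
        # Frames arrive in ascending timestamp order.
        self.queue.append((timestamp, frame))

    def next_for_playback(self):
        # Oldest frame first, preserving the order of collection.
        return self.queue.popleft() if self.queue else None
```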
S306: Select an image frame from the collected image frames.
The selected image frame may be a key frame among the collected image frames.
In one embodiment, the terminal may receive a user selection instruction and select an image frame from the collected image frames according to that instruction.
In one embodiment, when the played image frames satisfy a picture stabilization condition, the terminal may select the currently collected or currently playing image frame from the collected image frames. The picture stabilization condition may be that the differences between the played image frames within a preset duration fall within a set range.
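The picture stabilization condition above (frame differences within a set range over a window) can be sketched as follows; the mean-absolute-difference metric and threshold are assumptions of this sketch:

```python
import numpy as np

def is_stable(frames, max_mean_diff=5.0):
    """Picture stabilization check: the mean absolute pixel difference
    between consecutive frames in the window must stay within a set
    range (the threshold here is illustrative)."""
    for prev, cur in zip(frames, frames[1:]):
        diff = np.abs(cur.astype(float) - prev.astype(float)).mean()
        if diff > max_mean_diff:
            return False
    return True
```

A terminal could run this over the frames played in the last preset duration and select the current frame only when the check passes.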
S308: Acquire user-generated content associated with a template image that matches the selected image frame.
User-generated content is content produced by users, and may include at least one of text, pictures, audio, and video. It may be content published by a user, a user's comment on published content, or a user's reply to a comment.
A template image is associated with user-generated content and is used to mark it; the associated user-generated content can be located through the template image. One template image may be associated with one or more pieces of user-generated content, published by one or more users. A user who publishes user-generated content may be called a content creator.
In one embodiment, to determine whether the selected image frame matches a template image, the terminal may first compute the similarity between the selected image frame and the template image, and then determine whether the similarity is greater than or equal to a preset similarity: if so, they match; if not, they do not.
When computing the similarity between the selected image frame and the template image, the respective features of the two images may first be extracted and the difference between them computed: the greater the difference between the features, the lower the similarity; the smaller the difference, the higher the similarity. The features may be extracted through a trained neural network model, and may specifically be one or a combination of color features, texture features, and shape features. The similarity may be the cosine similarity, or the Hamming distance between the perceptual hash values of the two images.
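One of the similarity measures mentioned above, the Hamming distance between perceptual hash values, can be sketched as follows; this uses a simple average hash, which is just one illustrative choice of perceptual hash, and assumes image dimensions divisible by the hash size:

```python
import numpy as np

def average_hash(gray, hash_size=8):
    """Tiny average-hash sketch: downsample the grayscale image by
    block averaging, then threshold each block against the mean."""
    h, w = gray.shape
    blocks = gray.reshape(hash_size, h // hash_size, hash_size, w // hash_size)
    small = blocks.mean(axis=(1, 3))          # one mean per block
    return (small > small.mean()).flatten()   # boolean hash bits

def hamming_distance(hash_a, hash_b):
    """Number of differing hash bits; smaller means more similar."""
    return int(np.count_nonzero(hash_a != hash_b))

def frames_match(frame, template, max_distance=10):
    """Match when the Hamming distance is within a preset threshold,
    mirroring the 'similarity >= preset similarity' test above."""
    return hamming_distance(average_hash(frame), average_hash(template)) <= max_distance
```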
In one embodiment, the terminal may first query the local cache for a template image matching the selected image frame and, when a matching template image is found, pull the user-generated content associated with that template image from the local cache or from the server. When no matching template image is found in the local cache, the terminal may further query the server for a template image matching the selected image frame and, when one is found on the server, pull the associated user-generated content from the server. After the terminal obtains a matching template image from the server, it may store that template image in the local cache.
In one embodiment, the terminal may acquire user-generated content whose associated template image matches the selected image frame and whose corresponding geographic location satisfies a proximity condition with respect to the current geographic location. A proximity condition is a quantified condition indicating that two geographic locations are close, for example that the distance between them is less than or equal to a preset value. In this embodiment, combining the geographic location allows more accurate matching.
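The cache-first template lookup above can be sketched as follows; the `server.find_matching` interface and the matching predicate are assumptions of this sketch, and the geographic proximity filter is omitted:

```python
class TemplateLookup:
    """Cache-first template lookup: query the local cache, fall back
    to the server, and cache server hits for later queries."""
    def __init__(self, server):
        self.server = server   # any object with find_matching(frame)
        self.cache = []

    def find(self, frame, matches):
        # `matches` is the frame/template matching predicate.
        for template in self.cache:
            if matches(frame, template):
                return template
        template = self.server.find_matching(frame)
        if template is not None:
            self.cache.append(template)   # store server hits locally
        return template
```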
S310: Acquire the display position of the user-generated content in the matched template image.
The display position of the user-generated content in the matched template image represents the region the user-generated content occupies in the template image. The display position may be represented by the coordinates, in the coordinate system of the template image, of the region the content occupies.
In one embodiment, the terminal may acquire the display position of the user-generated content at the same time as it acquires the content itself, specifically from the local cache or from the server.
S312: Render the user-generated content in the played image frames according to the display position.
Specifically, the terminal may render the user-generated content at the acquired display position in the currently playing image frame. The terminal may acquire style data corresponding to the user-generated content, and render the content in the played image frame according to that style data and the acquired display position.
In one embodiment, the display position may be the position of the user-generated content relative to an object region in the template image. The terminal may track the template image's object region in the played image frames, determine from the display position and the tracked object region the position of the user-generated content relative to the tracked object region in the currently playing frame, and render the user-generated content at the determined position.
An object region is a region in an image that represents a real-world object. The object may be living or non-living: living objects include human bodies, animals, and plants; non-living objects include buildings, industrial products, and natural landscapes.
上述用户生成内容处理方法,从现实世界采集图像帧并按照采集的时序播放,通过从采集的图像帧中选取的图像帧,就能够确定该图像帧所匹配的模板图像所关联的用户生成内容,并进行展示。能够通过现实世界中拍摄的图像帧定位到用户生成内容并展示,可以不必依赖社交关系,扩展了用户生成内容的传播方式。而且,按照用户生成内容在匹配的模板图像中的展示位置,在播放的图像帧中追踪渲染用户生成内容,将虚拟世界中的用户生成内容与播放的视频帧所反映的现实世界融合,提供了用户生成内容的新互动方式。The user-generated content processing method collects image frames from the real world and plays them according to the collected timing. By selecting the image frames from the captured image frames, the user-generated content associated with the template image matched by the image frames can be determined. And show it. The ability to locate and display user-generated content through image frames captured in the real world can extend the way user-generated content is propagated without having to rely on social relationships. Moreover, according to the display position of the user-generated content in the matched template image, the user-generated content is tracked and rendered in the played image frame, and the user-generated content in the virtual world is merged with the real world reflected by the played video frame, and A new way of interacting with user-generated content.
In an embodiment, the user-generated content processing method further includes: determining whether a feature of the selected image frame conforms to a preset template image feature. This step may be performed after S306. When the feature of the selected image frame conforms to the template image feature, S308 is performed; when it does not, the process returns to S306.
A preset template image feature is a feature that an image should possess in order to serve as a template image. A template image should be well distinguishable, to avoid confusion between user-generated content associated with different template images.
In an embodiment, determining whether the feature of the selected image frame conforms to the preset template image feature includes: extracting feature points of the selected image frame, and determining whether the number of extracted feature points reaches a preset template-image feature-point threshold. In this embodiment, the preset template image feature is that the number of feature points reaches the preset feature-point threshold.
A feature point is a point in the selected image frame that has distinctive characteristics and effectively reflects essential features of the image; feature points have the ability to identify objects in the image frame. The feature-point threshold may be set as needed: the higher the threshold, the more distinguishable an image frame must be to serve as a template image.
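The feature-point check can be sketched as follows. This is an illustration, not the patent's algorithm: a deployed system would use a real corner detector such as FAST or ORB, whereas the toy detector below merely counts pixels whose intensity changes sharply in both directions. All names and the threshold value are assumptions.

```python
def count_feature_points(gray, grad_threshold=64):
    # Toy stand-in for a real corner detector (e.g. FAST/ORB): a pixel
    # counts as a feature point when intensity changes sharply in both
    # the horizontal and the vertical direction.
    h, w = len(gray), len(gray[0])
    count = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = abs(gray[y][x + 1] - gray[y][x - 1])
            dy = abs(gray[y + 1][x] - gray[y - 1][x])
            if dx > grad_threshold and dy > grad_threshold:
                count += 1
    return count

def has_enough_feature_points(gray, point_threshold):
    # The check from the text: the frame is a candidate template image
    # only when the feature-point count reaches the threshold.
    return count_feature_points(gray) >= point_threshold

flat = [[128] * 8 for _ in range(8)]                       # no structure
checker = [[255 if (x // 2 + y // 2) % 2 else 0
            for x in range(8)] for y in range(8)]           # strong corners
```

A featureless frame (`flat`) yields zero feature points and is rejected, while a textured frame (`checker`) passes, matching the intent that a template image must be distinguishable.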
In an embodiment, determining whether the feature of the selected image frame conforms to the preset template image feature includes: acquiring the resolution of the selected image frame, and determining whether the resolution reaches a preset template-image resolution threshold. In this embodiment, the preset template image feature is that the resolution reaches the preset resolution threshold.
The resolution of the selected image frame refers to its width and height, and the preset resolution threshold includes a preset template image width and template image height. Specifically, the terminal may acquire the width and height of the selected image frame and determine whether they reach the preset template image width and template image height, respectively.
In an embodiment, determining whether the feature of the selected image frame conforms to the preset template image feature includes: acquiring the sharpness of the selected image frame, and determining whether the sharpness reaches a preset template-image sharpness threshold. In this embodiment, the preset template image feature is that the sharpness reaches the preset sharpness threshold.
Sharpness, unlike resolution, refers to how clearly fine detail and its boundaries are rendered in the image frame. The terminal may convert the selected image frame into a grayscale image, detect edges in the grayscale image, evaluate the rate of gray-level change at the edges, and determine the sharpness from that rate. The faster the gray level changes at an edge, the higher the sharpness; the slower it changes, the lower the sharpness.
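A minimal sketch of the gray-level change-rate measure, assuming the frame is already grayscale: the mean squared horizontal difference is used as a sharpness proxy, so a hard edge scores higher than the same intensity range spread over a gradual ramp. The function names and the threshold are illustrative; production systems often use the variance of a Laplacian instead.

```python
def sharpness_score(gray):
    # Mean squared horizontal gray-level difference: fast gray-level
    # change at edges -> large differences -> high score.
    h, w = len(gray), len(gray[0])
    total = sum((gray[y][x] - gray[y][x - 1]) ** 2
                for y in range(h) for x in range(1, w))
    return total / (h * (w - 1))

def is_sharp_enough(gray, sharpness_threshold):
    return sharpness_score(gray) >= sharpness_threshold

# The same black-to-white transition, as a hard edge vs. a gradual ramp:
step = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
ramp = [[0, 51, 102, 153, 204, 255] for _ in range(4)]
```

Squaring the differences is what distinguishes the two cases: both images have the same total gray-level change, but the step concentrates it at one edge and therefore scores higher.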
In an embodiment, determining whether the feature of the selected image frame conforms to the preset template image feature includes: acquiring the proportion of the selected image frame occupied by an object region, and determining whether the proportion reaches a preset template-image object proportion. In this embodiment, the preset template image feature is that the proportion of the selected image frame occupied by the object region reaches the preset object proportion.
Specifically, the terminal may detect edges of the selected image frame, treat a closed region enclosed by the detected edges whose area reaches a preset area as the object region, and determine whether the area of the object region, as a proportion of the total area of the selected image frame, reaches the preset template-image object proportion. The area of an image or region may be represented by the number of pixels it contains.
The conditions used in the foregoing embodiments to determine whether the feature of the selected image frame conforms to the preset template image feature may be freely combined. When all of the combined conditions are satisfied, the frame is determined to conform to the preset template image feature; when at least one of them is not satisfied, the frame is determined not to conform.
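The free-combination rule can be sketched as a single AND over whichever conditions are enabled. The metric names and threshold values below are illustrative placeholders for the checks described above.

```python
def conforms_to_template_features(frame_metrics, thresholds):
    # Free combination: only the conditions named in `thresholds` are
    # applied, and every one of them must be satisfied (logical AND);
    # a single failing condition rejects the frame.
    return all(frame_metrics[name] >= threshold
               for name, threshold in thresholds.items())

metrics = {"feature_points": 120, "width": 640, "height": 480,
           "sharpness": 9.5}
```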
In the foregoing embodiments, the user-generated content associated with the matching template image is acquired only when the feature of the selected image frame conforms to the template image feature. Image frames that would be difficult to match to any template image are thus filtered out directly, improving processing efficiency.
In an embodiment, step S308 includes: uploading the selected image frame to a server; receiving a first notification fed back by the server indicating that a template image matching the uploaded image frame has been found; and acquiring, according to the first notification, the user-generated content associated with the template image.
Both the first notification and the second notification described below are notifications; "first" and "second" merely distinguish different notifications. A notification may be an independent message, or a message mixing several types of information.
Specifically, the terminal uploads the selected image frame to the server, and the server queries for a template image matching the uploaded image frame. When such a template image is found, the server returns to the terminal a first notification indicating that a template image matching the uploaded image frame has been found.
In an embodiment, the terminal may upload the user account used for local login together with the selected image frame to the server, and receive a first notification fed back by the server, the first notification indicating that a template image matching the uploaded image frame has been found and that the user-generated content associated with the template image grants access to the uploaded user account. The terminal may further acquire, according to the first notification, the user-generated content that is associated with the template image and accessible to the uploaded user account.
Access permissions may be set when the user-generated content is created. For example, if the content creator sets the content to be visible only to friends, the uploaded user account has access to the content only when it has a friend relationship with the creator's user account. If the content creator sets the content to be visible to everyone, any legitimate user account has access to it.
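The permission rule set at creation time can be sketched as a small predicate. The visibility values and the shape of the `friends` mapping (account to set of friend accounts) are illustrative names, not details from the patent.

```python
def has_access(viewer_account, creator_account, visibility, friends):
    # Permission rule fixed when the user-generated content is created.
    if visibility == "everyone":
        return True                       # any legitimate account
    if visibility == "friends_only":
        return viewer_account in friends.get(creator_account, set())
    return False                          # unknown setting: deny

friends = {"creator": {"alice"}}
```

With a "friends_only" setting, only accounts in the creator's friend set pass; with "everyone", any account passes.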
According to the first notification, the terminal may acquire the matched template image and cache it in a local buffer. The terminal may also acquire user information related to the user-generated content, such as a user account, a user avatar, or a user nickname.
In an embodiment, the terminal may acquire the user-generated content associated with the template image directly from the first notification, and may also acquire from the first notification the template image and/or the user information related to the user-generated content.
In an embodiment, the terminal may acquire the image number of the matched template image from the first notification, send the server a query request carrying that image number, and receive the user-generated content that the server finds and feeds back in association with the image number. The terminal may also query the server for the template image and/or user information corresponding to the image number.
In the foregoing embodiments, the server performs the matching between uploaded image frames and template images. With the server as the hub, every user can interact through user-generated content around the same or similar real-world scenes, realizing social interaction based on the real world, the virtual world, and the social network.
In an embodiment, step S308 includes: uploading the selected image frame to the server; receiving a second notification fed back by the server indicating that no template image matching the uploaded image frame has been found; displaying a content creation entry according to the second notification; creating user-generated content according to an operation on the content creation entry; and uploading the created user-generated content to the server, so that the server stores the uploaded user-generated content in association with a template image registered from the uploaded image frame.
That no matching template image has been found may mean that no template image matching the uploaded image frame exists on the server, or that such a template image exists but none of the user-generated content corresponding to it grants access to the user account that triggered the image frame upload.
The content creation entry is used to trigger the creation of user-generated content. It may be a visible control capable of triggering an event, such as an icon or a button. The content creation entry may specifically be an entry that triggers the creation of brand-new user-generated content, that is, content independent of any existing user-generated content. It may also be an entry that triggers the creation of user-generated content associated with existing user-generated content, such as a comment or a reply to a comment.
Specifically, the terminal uploads the selected image frame to the server, and the server queries for a template image matching the selected image frame. When a matching template image is found, the server feeds back the first notification to the terminal; when none is found, the server registers the uploaded image frame as a template image and feeds back the second notification. After receiving the second notification, the terminal displays the content creation entry.
Further, the terminal detects an operation on the content creation entry, acquires the content input by the user according to the detected operation, creates the user-generated content, and uploads it to the server; the server stores the uploaded user-generated content in association with the template image registered from the uploaded image frame. If the server receives no created user-generated content within a preset duration after registering the uploaded image frame as a template image, or receives a deregistration request fed back by the terminal, it cancels the registration of the uploaded image frame.
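The server-side register-then-confirm flow can be sketched as follows. This is a simplified in-memory model under stated assumptions: a registration is cancelled if no user-generated content arrives within `ttl` seconds, the clock is injected so the timeout is testable, and all class and method names are illustrative.

```python
class TemplateRegistry:
    # Sketch: a frame is registered as a pending template image; the
    # registration becomes permanent when content is attached, and is
    # cancelled when the preset duration (ttl) elapses without content.

    def __init__(self, ttl, clock):
        self.ttl = ttl
        self.clock = clock
        self.pending = {}    # image_id -> registration time
        self.templates = {}  # image_id -> list of user-generated content

    def register(self, image_id):
        self.pending[image_id] = self.clock()

    def attach_content(self, image_id, content):
        if image_id in self.pending:
            del self.pending[image_id]
            self.templates.setdefault(image_id, []).append(content)
            return True
        if image_id in self.templates:
            self.templates[image_id].append(content)
            return True
        return False  # registration expired or never made

    def expire(self):
        now = self.clock()
        for image_id in [i for i, t in self.pending.items()
                         if now - t > self.ttl]:
            del self.pending[image_id]  # cancel the registration
```

A registration confirmed within the window survives; one left pending past `ttl` is cancelled, so late content is rejected.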
In the foregoing embodiment, for a real-world scene that does not yet have associated user-generated content, user-generated content associated with the scene can be created, and the image frame uploaded this time can serve as the template image for matching next time. User-generated content is thus continually enriched, giving users a more convenient way to interact based on the real world and the virtual world.
In an embodiment, the user-generated content processing method further includes: acquiring a stereo rotation parameter configured when the user-generated content was created. This step may be performed before S312. Step S312 then includes: rendering, according to the display position, the user-generated content rotated by the stereo rotation parameter in the played image frame.
A stereo rotation parameter is a parameter for rotating user-generated content in the three-dimensional coordinate system of the virtual world, such as a horizontal rotation angle and/or a vertical rotation angle. The horizontal rotation angle is the angle through which the user-generated content is rotated in the horizontal plane of the virtual world's three-dimensional coordinate system; the vertical rotation angle is the angle through which it is rotated in a vertical plane. The stereo rotation parameter may be configured when the user-generated content is created, and stored in correspondence with the user-generated content.
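Applying a stereo rotation parameter to one vertex of the content quad can be sketched with two elementary rotation matrices. The axis conventions are assumptions (horizontal angle about the vertical y axis, vertical angle about the horizontal x axis, applied in that order); the patent fixes neither axes nor order.

```python
import math

def rotate(point, horizontal_deg=0.0, vertical_deg=0.0):
    # Assumed convention: horizontal rotation about the y axis first,
    # then vertical rotation about the x axis.
    x, y, z = point
    h = math.radians(horizontal_deg)
    v = math.radians(vertical_deg)
    # rotation about the y axis (horizontal rotation angle)
    x, z = x * math.cos(h) + z * math.sin(h), -x * math.sin(h) + z * math.cos(h)
    # rotation about the x axis (vertical rotation angle)
    y, z = y * math.cos(v) - z * math.sin(v), y * math.sin(v) + z * math.cos(v)
    return (x, y, z)
```

With both angles zero the point is unchanged; a 90° horizontal rotation carries the x axis into the negative z axis under this convention.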
In the foregoing embodiment, the user may configure the stereo rotation parameter of the user-generated content at creation time, so that when image frames reflecting the real world are played, the user-generated content is displayed rotated by that parameter, providing a new way of interaction.
In an embodiment, step S312 includes: tracking the object region of the template image in the played image frames; determining a tracked rendering position according to the display position and the tracked object region; and rendering the user-generated content at the tracked rendering position in the played image frame.
Tracking refers to locating changes of the object region across continuously played image frames, such as changes of position and/or changes of shape. The tracked rendering position is the real-time rendering position of the user-generated content in the played image frame. Since the selected image frame matches the template image, the terminal may take the image region of the template image as the object region in the selected image frame, and then track that object region in the played image frames.
The display position may indicate the position of the user-generated content, when displayed, relative to the object region in the template image. From the display position and the change in position of the tracked object region, the tracked rendering position of the user-generated content can be determined.
Further, from the display position and the change in shape of the tracked object region, a tracked rendering shape of the user-generated content can be determined, so that the user-generated content can be rendered in the played image frame according to the tracked rendering position and the tracked rendering shape. The tracked rendering shape may be represented by real-time stereo rotation parameters.
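One simple way to derive the tracked rendering position, sketched under stated assumptions: the display position is stored as an offset from the object region's top-left corner in template-image pixels, and is re-applied, scaled by the tracked region's current size, to the region's bounding box in the current frame. The `(x, y, w, h)` box convention and all names are illustrative.

```python
def tracked_render_position(display_offset, template_region, tracked_region):
    # display_offset: content offset from the object region's top-left
    # corner, in template-image pixels; both regions are (x, y, w, h).
    tx, ty, tw, th = template_region
    cx, cy, cw, ch = tracked_region
    ox, oy = display_offset
    # Re-apply the offset, scaled to the tracked region's current size.
    return (cx + ox * cw / tw, cy + oy * ch / th)

# Content placed 20 px right of / 10 px below a 100x50 region in the
# template; the tracked region has since moved and doubled in size.
template_region = (0, 0, 100, 50)
```

When the tracked region moves to `(30, 40)` and doubles to `200x100`, the offset scales with it, so the content follows the object as the text describes.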
In the foregoing embodiment, the object region of the template image is tracked in the played image frames, and the user-generated content is rendered with tracking according to the tracked object region. This realizes a strong association between user-generated content in the virtual world and objects in the real world, and a new way of interacting between the virtual world and the real world based on user-generated content.
In an embodiment, the terminal may track the object region of the template image in the played image frames; detect a change in shape of the tracked object region relative to the object region in the template image; determine, according to the shape change, a parameter representing the observation direction; and render, according to the display position, the user-generated content deformed by the parameter representing the observation direction in the played image frame.
In this embodiment, when the direction from which a real-world object is observed changes, the parameter representing the observation direction can be determined by detecting the change in shape of the tracked object region relative to the object region in the template image. Deforming the user-generated content by this parameter lets the deformed content reflect the change of observation direction, again realizing a strong association between user-generated content in the virtual world and objects in the real world, and a new interaction between the two based on user-generated content.
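As an illustration of deriving a view parameter from the shape change (not the patent's method; real AR systems typically estimate a full homography from many feature correspondences), a 2-D similarity transform between the template region and the tracked region can be recovered from just two point correspondences, using the complex-number identity that a similarity maps p to a·p + b.

```python
import math
import cmath

def similarity_from_two_points(p1, p2, q1, q2):
    # Treat 2-D points as complex numbers: a similarity transform is
    # q = a*p + b, so a = (q2 - q1) / (p2 - p1) encodes the scale (|a|)
    # and the in-plane rotation (arg a) between template and frame.
    a = (complex(*q2) - complex(*q1)) / (complex(*p2) - complex(*p1))
    return math.degrees(cmath.phase(a)), abs(a)
```

If the tracked region appears rotated 90° and twice as large as in the template, the recovered parameters are (90°, 2), which can then drive the deformation of the user-generated content.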
In an embodiment, step S308 includes: acquiring multiple pieces of content creator information associated with the template image matching the selected image frame, together with the corresponding user-generated content. Step S312 includes: displaying the multiple pieces of content creator information; selecting one of them; and rendering the corresponding user-generated content in the played image frame according to the display position corresponding to the selected piece of content creator information.
Content creator information is identity information of the content creator of user-generated content, such as the creator's user avatar, user nickname, or user account. The same template image may be associated with more than one piece of user-generated content, each corresponding to one piece of content creator information, so one template image may be associated with multiple pieces of content creator information.
The number of pieces of content creator information depends on the number of content creators of the user-generated content associated with the same template image. Each piece of content creator information corresponds to one piece of user-generated content, and each piece of user-generated content corresponds to one display position; the terminal may render the corresponding user-generated content in the played image frame according to the display position corresponding to the selected creator's user-generated content.
In the foregoing embodiment, one template image may be associated with user-generated content created by multiple content creators, expanding the amount of user-generated content that a real-world object can be associated with; the user can switch among the content created by different creators, extending the dimensions of interaction based on the virtual world and the real world.
Referring to FIG. 4, the principle of the foregoing user-generated content processing method is described below with a specific application scenario. The user may enter a social application, which displays the main page shown in FIG. 5. The user may tap the tool menu trigger button 502 on the main page, so that the social application displays the tool menu 601 on the main page as shown in FIG. 6; the tool menu 601 includes a function entry 602. The user taps the function entry 602, so that the social application starts capturing image frames from the real world and plays the captured frames frame by frame in the order of capture; referring to the left of FIG. 7, the terminal forms a real-time dynamic picture reflecting the real world.
While playing image frames, if the picture remains essentially unchanged for a preset duration, the terminal selects the currently played image frame and determines whether it conforms to the preset template image feature. If it does not, the user is prompted that no object has been recognized, and capture and playback continue. If it does, the terminal further determines whether a template image matching the selected image frame is cached locally.
When a template image matching the selected image frame is cached locally, the terminal pulls the user-generated content created by multiple content creators associated with that template image, together with the corresponding creator avatars and display positions, and, as shown on the left of FIG. 8, displays the content creator avatar list 801 on the currently played video frame. The user selects a content creator avatar 801a in the list, and the social application displays the corresponding user-generated content 802 and 803 at the display positions corresponding to the selected avatar.
If a stereo rotation angle is configured for the template image, the terminal displays the user-generated content 802 and 803 deformed by that angle. When object regions (such as the wine glass and the cup) change in the played image frames, the user-generated content 802 and 803 changes with them. When the observation angle of an object region changes, the user-generated content 802 and 803 rotates accordingly.
The user may swipe up on the page shown on the left of FIG. 8 to enter a comment page for the currently displayed user-generated content; as shown in FIG. 9, the user may add comments or replies to comments on that page.
When no matching template image is cached locally, the terminal uploads the selected image frame to the server, and the server matches a template image for the uploaded frame. If the server finds a matching template image, the terminal may pull the user-generated content created by multiple content creators associated with it, together with the corresponding creator avatars and display positions, and display the content creator avatar list 801 on the currently played video frame as shown on the left of FIG. 8. The user selects a content creator avatar 801a in the list, and the social application displays the corresponding user-generated content 802 and 803 at the display positions corresponding to the selected avatar.
If the server finds no matching template image, the terminal may display the content creation entry 1001 shown in FIG. 10. After tapping the content creation entry 1001, the user may select a picture and/or input text, and may edit the picture on the picture-editing page shown in FIG. 11, for example applying a stereo rotation, and may set whether the content is visible only to friends. After confirmation, the user-generated content is created and uploaded to the server, which stores it in association with the template image registered from the uploaded image frame. If uploading the user-generated content fails, the social application prompts an error and enters an outbox for re-uploading the user-generated content.
另一方面,随着计算机技术的发展,图像处理技术也不断进步。用户可 以通过专业的图像处理软件对图像进行处理,使得经过处理的图像表现更好。用户还可以通过图像处理软件,在图像中附加由图像处理软件提供的素材,让经过处理的图像能够传递更多的信息。然而,目前的图像处理方式,需要用户展开图像处理软件的素材库,浏览素材库,从素材库中选择合适的素材,调整素材在图像中的位置,从而确认修改,完成图像处理。于是目前的图像处理方式需要大量的人工操作,耗时长,导致图像处理过程效率低。On the other hand, with the development of computer technology, image processing technology has also been continuously improved. Users can process images with professional image processing software to make processed images perform better. The user can also attach the material provided by the image processing software to the image through the image processing software, so that the processed image can transmit more information. However, the current image processing method requires the user to expand the material library of the image processing software, browse the material library, select the appropriate material from the material library, adjust the position of the material in the image, thereby confirming the modification and completing the image processing. Therefore, the current image processing method requires a large number of manual operations, which takes a long time, resulting in low efficiency of the image processing process.
基于此,前述实施例中的用户生成内容处理方法还可包括人脸图像处理的步骤,以通过执行该人脸图像处理的步骤来提高图像处理的效率。Based on this, the user-generated content processing method in the foregoing embodiment may further include a step of face image processing to improve the efficiency of image processing by performing the step of the face image processing.
In an embodiment, after selecting an image frame from the captured image frames, the terminal may detect whether the selected image frame includes a face image. When the selected image frame includes a face image region, the terminal may continue with the steps after S306 in the foregoing embodiment and may also perform the face image processing step.

As shown in FIG. 12, in an embodiment, when the selected image frame includes a face image region, the face image processing step of the user-generated content processing method may specifically include the following steps, which may be performed after S306.
S1202: Obtain a facial emotion feature recognition result produced by recognizing the face image included in the image frame.

Here, an emotion feature is a feature that reflects the emotion of a person or an animal and that a computer can recognize and process, for example happiness, melancholy, or anger. A facial emotion feature is an emotion feature conveyed by a facial expression.
In an embodiment, when capturing image frames from a real scene, the terminal may detect whether a captured image frame includes a face image. If the terminal determines that a captured image frame includes a face image, the terminal performs expression recognition on the face image in that frame and obtains the resulting facial emotion feature recognition result.

In an embodiment, after capturing an image frame of the real scene through the camera in the camera's current field of view, the terminal may extract the image data included in the captured frame and detect whether the image data contains facial feature data. If so, the terminal determines that the frame includes a face image. The terminal may further extract expression feature data from the facial feature data and, based on the extracted expression feature data, locally perform expression recognition on the face image in the captured frame to obtain a facial emotion feature recognition result. The expression feature data may be one or more kinds of feature information reflecting, for example, the contour of the face, the eyes, the nose, the mouth, and the distances between the facial organs.
For example, when people feel happy, the corners of the mouth rise; if the expression feature data that the terminal extracts from the facial feature data in an image frame indicates raised mouth corners, this may indicate that the emotion feature reflected by the face in that frame is happiness. When people feel surprised, the mouth opens wide; if the extracted expression feature data indicates a widely opened mouth, this may indicate that the emotion feature reflected by the face in that frame is surprise.
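The mapping just described, from extracted expression features to an emotion label, can be sketched as a simple rule-based classifier. This is only an illustration of the idea: the feature names and thresholds below are hypothetical assumptions, not part of the described method.

```python
# Illustrative sketch: rule-based mapping from hypothetical expression
# feature data to an emotion label. Feature names and thresholds are
# invented for this example.
def classify_emotion(features: dict) -> str:
    """Map simple expression features to an emotion label.

    `features` may contain (both hypothetical, normalized to 0..1):
      - "mouth_corner_lift": upward displacement of the mouth corners
      - "mouth_open_ratio": mouth opening relative to face height
    """
    if features.get("mouth_open_ratio", 0.0) > 0.5:
        return "surprised"  # a widely opened mouth suggests surprise
    if features.get("mouth_corner_lift", 0.0) > 0.3:
        return "happy"      # raised mouth corners suggest happiness
    return "neutral"        # no strong cue found

print(classify_emotion({"mouth_corner_lift": 0.6}))  # happy
print(classify_emotion({"mouth_open_ratio": 0.8}))   # surprised
```

In practice the features would come from a face landmark detector, and a trained model would replace the hand-written thresholds.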
In an embodiment, the terminal may instead send the image frame detected to include a face image to the server. After receiving the frame sent by the terminal, the server performs expression recognition on the face image in it to obtain a facial emotion feature recognition result and feeds the result back to the terminal, which thus obtains the facial emotion feature recognition result returned by the server.

In an embodiment, after receiving an image frame captured from a real scene and sent by another terminal, the terminal may detect whether the received frame includes a face image. If it does, the terminal may either locally perform expression recognition on the face image in the frame to obtain the corresponding facial emotion feature recognition result, or send the frame to the server so that the server recognizes the face image and returns the facial emotion feature recognition result.
S1204: Search for a corresponding emotion feature image according to the facial emotion feature recognition result.

Here, an emotion feature image is an image that can reflect an emotion feature. An image reflecting sadness may, for example, include tears or a rainy scene; an image reflecting anger may, for example, include flames. An emotion feature image may be an image crawled by the terminal from the Internet, or an image shot by a camera device included in the terminal, and it may be either a dynamic picture or a static picture.
In an embodiment, the terminal may select in advance the emotion features for which image processing is supported and configure a corresponding emotion feature image for each selected emotion feature. After obtaining the facial emotion feature recognition result, the terminal obtains the emotion feature image corresponding to the emotion feature represented by the result.

In an embodiment, the terminal may build an emotion feature image library in advance and map the images in the library that reflect the same emotion feature to that emotion feature. After obtaining the facial emotion feature recognition result, the terminal may search the library for an emotion feature image whose reflected emotion feature matches the result.

In an embodiment, the pre-built emotion feature image library may include multiple emotion feature image sets, each set reflecting one emotion feature. After obtaining the facial emotion feature recognition result, the terminal looks up the image set whose reflected emotion feature is consistent with the result and selects an emotion feature image from the found set.
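The library-and-set lookup described above can be sketched as a dictionary keyed by emotion feature, with one candidate list per emotion. The library contents and file names here are invented for illustration; only the lookup structure reflects the description.

```python
# Hypothetical emotion feature image library: each emotion feature maps to
# a set (here, a list) of candidate images, and the terminal selects one
# image from the set matching the recognition result.
import random

EMOTION_IMAGE_LIBRARY = {
    "sad":   ["tears.png", "rain_scene.png"],  # images reflecting sadness
    "angry": ["flames.png"],                   # images reflecting anger
    "happy": ["sunshine.png", "confetti.gif"],
}

def find_emotion_image(recognized_emotion, rng=random.Random(0)):
    """Return one emotion feature image for the recognized emotion,
    or None when no image set reflects that emotion. The RNG is seeded
    only to keep this sketch reproducible."""
    candidates = EMOTION_IMAGE_LIBRARY.get(recognized_emotion)
    if not candidates:
        return None
    return rng.choice(candidates)
```

A real implementation would load the library from local storage or a server rather than hard-coding it.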
S1206: Obtain the placement of the emotion feature image in the currently played image frame.

Here, the placement of the emotion feature image in the currently played image frame is the region that the emotion feature image occupies in that frame. The placement may be expressed as the coordinates of that region in the coordinate system of the currently played image frame.
In an embodiment, the terminal may obtain the placement of the emotion feature image at the same time as it searches for the image. Specifically, the terminal may locally obtain the drawing mode corresponding to the found emotion feature image and determine the image's placement according to the obtained drawing mode.

Further, the drawing mode of an emotion feature image may be dynamic following of a reference object. Specifically, the terminal may determine the display position, in the currently played image frame, of the reference object that the found emotion feature image needs to follow, and then determine the placement of the emotion feature image in that frame according to the reference object's display position.

The drawing mode of an emotion feature image may also be static display. Specifically, for a statically displayed emotion feature image, the terminal may directly set in advance the display region of that image in the currently played image frame, and obtain it directly when the image needs to be drawn.
S1208: Render the emotion feature image in the currently played image frame according to the placement.

Specifically, the terminal may render the emotion feature image at the obtained placement in the currently played image frame. The terminal may obtain the style data corresponding to the emotion feature image and render the image in the played frame according to the style data and the obtained placement. In an embodiment, the emotion feature image is a dynamic image including a sequence of image frames; the terminal may render the frames of the dynamic image one by one according to the dynamic image's frame rate and the placement.

In an embodiment, the placement may be the position of the emotion feature image relative to a specific region in the currently played image frame. The terminal may track that specific region across the played frames and, according to the placement and the tracked region, determine the position of the emotion feature image relative to the tracked region in the currently played frame, then render the emotion feature image at the determined position. The specific region is a region of the image that represents a particular area of the real scene, for example a face region.
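Converting a placement defined relative to a tracked region (such as a face rectangle) into absolute frame coordinates, as the tracking step above describes, can be sketched as follows. The coordinate conventions (top-left origin, fractional offsets) are assumptions of this example.

```python
# Sketch: resolve a placement expressed relative to a tracked region into
# absolute coordinates in the currently played frame. Conventions assumed:
# rectangles are (x, y, width, height) with a top-left origin.
def absolute_placement(region, rel_offset, size):
    """region: (x, y, w, h) of the tracked area (e.g. a face) in the frame.
    rel_offset: (ox, oy) offset of the emotion image's top-left corner,
    expressed as fractions of the region's width and height.
    size: (w, h) of the emotion feature image in pixels."""
    rx, ry, rw, rh = region
    ox, oy = rel_offset
    x = rx + int(ox * rw)   # shift horizontally by a fraction of the region width
    y = ry + int(oy * rh)   # shift vertically by a fraction of the region height
    return (x, y, size[0], size[1])

# E.g. place a 64x64 image slightly above-right of a tracked face at (100, 50):
print(absolute_placement((100, 50, 200, 200), (0.25, -0.1), (64, 64)))
```

As the tracked region moves between frames, re-evaluating this function each frame makes the rendered image follow the region.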
In the foregoing user-generated content processing method, image frames reflecting a real scene are played so that the played frames reflect that scene. By obtaining the facial emotion feature recognition result produced by recognizing the face image included in an image frame, the emotional state of a person in the real scene can be determined automatically. After the placement of the emotion feature image in the currently played frame is obtained, rendering the emotion feature image at that placement automatically combines the virtual emotion feature image with the person in the real scene and reflects the person's emotional state. Because the tedious manual operations are avoided, image processing efficiency is greatly improved.
In an embodiment, step S1202 specifically includes: adjusting the size of the image frame to a preset size; rotating the adjusted image frame to a direction that meets the emotion feature recognition condition; sending the rotated frame to the server; and receiving the facial emotion feature recognition result that the server returns for the sent frame.

Here, the preset size is a predefined image frame size, and a direction meeting the emotion feature recognition condition is a direction in which emotion feature recognition can be performed on the image frame.

In an embodiment, the terminal may pull from the server the preset image characteristics of an image frame including a face image, that is, the characteristics an image frame should have for expression recognition, such as the frame's size or direction.
Specifically, after obtaining the image frames captured from the real scene and picking out the frames that include a face image, the terminal may detect whether the size of each selected frame matches the preset size and, if not, resize the frame.

After confirming that a selected frame's size matches the preset size, or after resizing a non-matching frame, the terminal may detect the frame's current direction. If the current direction does not meet the emotion feature recognition condition, the terminal rotates the frame to a direction that does.

The terminal may send the image frame to the server once the frame's direction meets the emotion feature recognition condition, or after rotating a non-conforming frame. Upon receiving the frame, the server extracts the expression feature data included in it, performs expression recognition on the face image in the received frame according to the extracted data to obtain a facial emotion feature recognition result, and feeds the result back to the terminal.
In an embodiment, after obtaining the image frames captured from the real scene and picking out those that include a face image, the terminal may downscale each such frame and save the result in JPEG (Joint Photographic Experts Group) format. The terminal may then detect the direction of the face image included in the frame and, when that direction does not meet the emotion feature recognition condition, rotate the frame.

Here, the JPEG format is an image format compressed according to the international image compression standard. A direction meeting the emotion feature recognition condition may specifically be one in which the angle between the central axis of the face image in the frame and the vertical direction is no more than 45 degrees.
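The 45-degree condition above can be met by rotating the frame in 90-degree steps (the usual camera orientation corrections) until the face's central axis is close enough to vertical. The step-wise rotation policy is an assumption of this sketch; the description only requires that the final direction satisfy the condition.

```python
# Sketch: choose a clockwise rotation (0/90/180/270 degrees) that brings the
# face's central axis to within 45 degrees of vertical, per the condition
# described above. The 90-degree granularity is an assumed policy.
def required_rotation(face_axis_angle: float) -> int:
    """face_axis_angle: angle of the face's central axis measured in degrees
    from vertical (any real value). Returns the rotation to apply."""
    a = face_axis_angle % 360
    for rot in (0, 90, 180, 270):
        residual = (a + rot) % 360
        # distance of the rotated axis from vertical (0 degrees / 360 degrees)
        if min(residual, 360 - residual) <= 45:
            return rot
    return 0  # unreachable with 90-degree steps, kept for safety

print(required_rotation(30))   # already within 45 degrees -> 0
print(required_rotation(90))   # face lying sideways -> rotate 270
```

Resizing to the preset size would be applied before this step, e.g. with an image library's scaling routine.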
In the foregoing embodiment, before the server performs expression recognition on the face image in an image frame, the size and direction of the frame are adjusted so that the frame meets the conditions for expression recognition. This improves the speed and accuracy of expression recognition and also reduces hardware resource consumption.
In an embodiment, the image processing method further includes: extracting the voice data recorded when the image frame was captured, and obtaining a speech emotion feature recognition result produced by recognizing that voice data. This step may be performed before S1204. Step S1204 then specifically includes: searching for the corresponding emotion feature image according to both the facial emotion feature recognition result and the speech emotion feature recognition result.

Specifically, when capturing image frames from the real scene, the terminal may simultaneously record the voice data of the real scene and play the recorded voice data in synchronization when the captured frames are played. The terminal may invoke a sound collection apparatus to collect the voice data formed by ambient sound and store the voice data in a buffer indexed by collection time.

When performing expression recognition on the face image included in a captured frame, the terminal may obtain the capture time of that frame and cut, from the buffered voice data, a voice data segment of a preset duration whose collection interval covers that capture time. The cut segment is the voice data recorded when the frame was captured. The preset duration is a predefined length of the intercepted voice data segment, for example 5 seconds or 10 seconds.
In an embodiment, the terminal may cut from the buffered voice data a segment of the preset duration centered on the obtained capture time. For example, if the capture time of the frame currently undergoing expression recognition is 18:30:15 on October 1, 2016 and the preset duration is 5 seconds, the terminal may take 18:30:15 on October 1, 2016 as the midpoint and cut the voice data segment whose collection interval runs from 18:30:13 on October 1, 2016 to 18:30:17 on October 1, 2016.
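The midpoint interception above amounts to simple time arithmetic: the window extends half the preset duration on each side of the frame's capture time. A minimal sketch (the exact-half window here is the idealized form of the rounded example in the text):

```python
# Sketch: compute the time interval of voice data to cut from the buffer,
# centered on the capture time of the frame being recognized.
from datetime import datetime, timedelta

def speech_window(capture_time: datetime, length_seconds: float = 5.0):
    """Return (start, end) of a segment of the given length centered on
    capture_time. 5 seconds matches the example in the text."""
    half = timedelta(seconds=length_seconds / 2)
    return (capture_time - half, capture_time + half)

start, end = speech_window(datetime(2016, 10, 1, 18, 30, 15))
print(start, end)  # a 5-second interval centered on 18:30:15
```

The terminal would then extract, from the buffer, all audio whose collection timestamps fall inside this interval.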
In an embodiment, when receiving image frames captured from a real scene and sent by another terminal, the terminal may also receive the voice data that the other terminal recorded while capturing those frames. The terminal may store the received voice data in a buffer and, when playing the frames in capture order, retrieve the voice data and play it synchronously.

When performing expression recognition on the face image included in a received frame, the terminal may obtain the capture time of that frame and cut, from the buffered voice data, a segment of the preset duration whose collection interval covers that capture time. The cut segment is the voice data recorded when the frame was captured.

After obtaining the voice data recorded when the frame currently undergoing expression recognition was captured, the terminal recognizes the obtained voice data to obtain a speech emotion feature recognition result.
In an embodiment, the step of obtaining the speech emotion feature recognition result by recognizing the voice data specifically includes: recognizing the extracted voice data as text; searching the text for emotion feature keywords; and obtaining, according to the found emotion feature keywords, the speech emotion feature recognition result corresponding to the voice data.

Specifically, the terminal may perform feature extraction on the voice data to obtain the speech feature data to be recognized, perform framed speech processing on that data based on an acoustic model to obtain multiple phonemes, convert the phonemes into a character sequence according to the correspondence between candidate characters and phonemes in a candidate character library, and then adjust the converted character sequence with a language model to obtain text conforming to natural language patterns.

Here, the text is the character representation of the voice data. Acoustic models include, for example, a GMM (Gaussian Mixture Model) or a DNN (Deep Neural Network). The candidate character library includes candidate characters and the phonemes corresponding to them. The language model, for example an N-gram model such as a CLM (Chinese Language Model), adjusts the character sequence recognized by the acoustic model according to natural language patterns.
The terminal may set up an emotion feature keyword library in advance. The library includes a number of emotion feature keywords, and the keywords that reflect the same emotion feature are mapped to that emotion feature. The library may be stored in a file, a database, or a cache and retrieved from there when needed. After recognizing the extracted voice data as text, the terminal compares the characters in the recognized text with the emotion feature keywords in the library. When a character string in the text matches an emotion feature keyword, the terminal obtains the matched keyword and takes the emotion feature that the keyword maps to as the speech emotion feature recognition result.

For example, suppose the text obtained by recognizing the voice data is "我今天很开心" ("I am very happy today"), which includes the emotion feature keyword "开心" ("happy") mapped to the emotion feature "happy"; the speech emotion feature recognition result is then "happy". Suppose the recognized text is "我非常高兴" ("I am very glad"), which includes the keyword "高兴" ("glad") also mapped to "happy"; the speech emotion feature recognition result is likewise "happy".
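The keyword lookup just illustrated can be sketched as a dictionary from keyword to emotion feature, scanned against the recognized text. The library below contains only the two examples from the text; a real library would be far larger.

```python
# Sketch of the emotion feature keyword library: keywords reflecting the
# same emotion feature map to that feature. Contents taken from the
# examples in the text.
EMOTION_KEYWORDS = {
    "开心": "开心",  # "happy" maps to the emotion feature "happy"
    "高兴": "开心",  # "glad" also maps to "happy"
}

def speech_emotion_from_text(text: str):
    """Return the emotion feature of the first matching keyword found in
    the recognized text, or None when no keyword matches."""
    for keyword, emotion in EMOTION_KEYWORDS.items():
        if keyword in text:
            return emotion
    return None

print(speech_emotion_from_text("我今天很开心"))  # 开心
print(speech_emotion_from_text("我非常高兴"))    # 开心
```

When several keywords match, a real implementation would need a tie-breaking policy (e.g. longest match or majority emotion), which the description leaves open.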
In the foregoing embodiment, text recognition is performed on the recorded voice data, and the speech emotion feature recognition result is obtained from the characters in the text that express emotion features, which improves the accuracy of the speech emotion feature recognition result.
In an embodiment, the terminal may also obtain the speech emotion feature recognition result from the acoustic features of the voice data. Specifically, the terminal may perform acoustic feature extraction on the voice data and obtain the corresponding emotion feature, and thereby the speech emotion feature recognition result, according to a pre-established correspondence between acoustic features and emotion features.

In an embodiment, the acoustic features include timbre and prosodic features. Timbre is the characteristic quality of the sound produced by a sounding body; different sounding bodies produce different timbres because of their different materials and structures, and timbre is characterized physically by spectral parameters. Prosodic features are the basic pitch and rhythm of the sound produced by a sounding body and are characterized physically by fundamental frequency parameters, duration distribution, and signal strength.

For example, when people feel happy, their speech prosody tends to sound lively; if the prosodic features that the terminal extracts from the voice data show a higher basic pitch and a faster rhythm, this may indicate that the emotion feature reflected by the voice data is happiness.

In this embodiment, acoustic feature extraction is performed on the recorded voice data, and the speech emotion feature recognition result is obtained from the parameters of the acoustic features that express emotion features, which improves the accuracy of the speech emotion feature recognition result.
In an embodiment, the step of searching for the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result may specifically include: when the facial emotion feature recognition result matches the speech emotion feature recognition result, searching for the corresponding emotion feature image according to the facial emotion feature recognition result.

Specifically, after obtaining the facial emotion feature recognition result produced by expression recognition of the face image included in the frame, and the speech emotion feature recognition result produced from the voice data recorded when the frame was captured, the terminal compares the two results. When the facial emotion feature recognition result matches the speech emotion feature recognition result, the terminal searches for the corresponding emotion feature image according to the facial emotion feature recognition result.
In an embodiment, searching for the corresponding emotion feature image according to the facial emotion feature recognition result includes: extracting the emotion feature type and the recognition result confidence included in the result; looking up the emotion feature image set corresponding to the emotion feature type; and selecting, from that set, the emotion feature image corresponding to the recognition result confidence.

Here, the emotion feature type is the type of emotion feature reflected by the face, for example "happy", "sad", or "angry". The recognition result confidence indicates how credible it is that the facial emotion feature recognition result reflects the face's true emotion feature; the higher the confidence, the more likely the result reflects the true emotion.

Specifically, the pre-built emotion feature image library may include multiple emotion feature image sets, each reflecting one emotion feature type, and the terminal may map each facial emotion feature recognition result confidence level one-to-one to an emotion feature image. After obtaining the facial emotion feature recognition result, the terminal looks up the image set whose reflected emotion feature is consistent with the emotion feature type included in the result and selects, from the found set, the emotion feature image corresponding to the recognition result confidence included in the result.
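One simple way to realize the confidence-to-image mapping above is to order each emotion's image set from mild to strong and bucket the confidence over that ordering. The bucketing scheme and file names below are assumptions of this sketch; the description only requires some one-to-one mapping between confidence levels and images.

```python
# Sketch: select an emotion feature image from the set for the recognized
# emotion feature type, according to the recognition result confidence.
# Image sets (ordered mild -> strong) and the even bucketing are invented.
IMAGE_SETS = {
    "happy": ["slight_smile.png", "big_smile.png", "laughing.png"],
}

def image_for_result(emotion_type, confidence):
    """emotion_type: emotion feature type from the recognition result.
    confidence: recognition result confidence in (0, 1]."""
    images = IMAGE_SETS.get(emotion_type)
    if not images:
        return None  # no image set reflects this emotion feature type
    # bucket the confidence evenly over the ordered image list
    index = min(int(confidence * len(images)), len(images) - 1)
    return images[index]

print(image_for_result("happy", 0.95))  # strongest image for high confidence
print(image_for_result("happy", 0.2))   # mildest image for low confidence
```

A stronger recognition confidence thus yields a more emphatic image, visualizing the credibility of the result as the embodiment describes.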
In the foregoing embodiment, a corresponding emotion feature image is set for each recognition result confidence level included in the facial emotion feature recognition results, so that the credibility of the facial emotion feature recognition result is visualized through the emotion feature image and the image processing result is more accurate.

In an embodiment, when the facial emotion feature recognition result matches the speech emotion feature recognition result, the terminal may also randomly select one emotion feature image from the found image set whose reflected emotion feature is consistent with the emotion feature type included in the facial emotion feature recognition result.

In this embodiment, when the facial emotion feature recognition result matches the speech emotion feature recognition result, the corresponding emotion feature image is found according to the facial emotion feature recognition result. With the speech emotion feature recognition result as corroboration, performing image processing according to the facial emotion feature recognition result makes the image processing result more accurate.
在一个实施例中,图像处理方法中根据人脸情感特征识别结果和语音情 感特征识别结果,查找相应的情感特征图像的步骤可具体包括:当人脸情感特征识别结果与语音情感特征识别结果不匹配时,按照语音情感特征识别结果查找相应的情感特征图像。In an embodiment, the step of searching for the corresponding emotion feature image according to the face emotion feature recognition result and the voice emotion feature recognition result in the image processing method may specifically include: when the face emotion feature recognition result and the voice emotion feature recognition result are not When matching, the corresponding emotional feature image is searched according to the speech emotion feature recognition result.
具体地,终端在获取根据图像帧中包括的人脸图像的表情识别得到的人脸情感特征识别结果,以及根据采集图像帧时录制的语音数据识别得到的语音情感特征识别结果后,将人脸情感特征识别结果与语音情感特征识别结果进行对比,当人脸情感特征识别结果与语音情感特征识别结果不匹配时,按照语音情感特征识别结果查找相应的情感特征图像。Specifically, the terminal acquires the face emotion feature recognition result obtained according to the face expression of the face image included in the image frame, and the voice emotion feature recognition result obtained according to the voice data recorded when the image frame is collected, and then the face is The emotional feature recognition result is compared with the speech emotion feature recognition result. When the facial emotion feature recognition result does not match the speech emotion feature recognition result, the corresponding emotional feature image is searched according to the speech emotion feature recognition result.
在一个实施例中,终端还可以获取语音数据识别得到的文本中包括的程度副词。程度副词用于表示情感的强烈程度,比如:“很”、“非常”或者“及其”等。终端对语音数据识别得到的语音情感特征识别结果具体可包括情感特征类型和情感强烈程度。In one embodiment, the terminal may also acquire an adverb of degree included in the text identified by the voice data. Degree adverbs are used to indicate the intensity of emotions, such as "very", "very" or "and". The speech emotion feature recognition result obtained by the terminal for the voice data identification may specifically include the emotion feature type and the emotion intensity degree.
具体地,终端事先建立的情感特征图像库可以包括多个情感特征图像集合,每个情感特征图像集合反映一种情感特征类型。终端可对应于情感强烈程度一一映射一张情感特征图像。终端在获取到语音情感特征识别结果后,查找情感特征图像库中反映的情感特征与语音情感特征识别结果包括的情感特征类型一致的情感特征图像集合,从查找到的情感特征图像集合中选取与语音情感特征识别结果包括的情感强烈程度相对应的情感特征图像。Specifically, the emotional feature image library established in advance by the terminal may include a plurality of emotional feature image sets, and each of the emotional feature image sets reflects an emotional feature type. The terminal may map an emotional feature image one by one corresponding to the intensity of the emotion. After obtaining the speech emotion feature recognition result, the terminal searches for the emotional feature image set that is reflected in the emotional feature image database and the emotional feature type included in the speech emotional feature recognition result, and selects from the found emotional feature image set. The speech emotion feature recognition result includes an emotional feature image corresponding to the emotional intensity.
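As a sketch of the intensity mapping described above, assuming a small adverb table and per-type image lists; every name and table entry is an illustrative assumption:

```python
# Hypothetical mapping from degree adverbs in the recognized text
# to an emotion intensity level (higher means stronger emotion).
DEGREE_ADVERBS = {"很": 1, "非常": 2, "极其": 3}

# Hypothetical per-type image lists indexed by intensity level.
INTENSITY_IMAGES = {
    "sad": {0: "sad_mild.png", 1: "sad_strong.png", 2: "sad_very.png", 3: "sad_extreme.png"},
}

def speech_emotion_result(text, emotion_type):
    """Derive (type, intensity) from recognized text; intensity 0 if no adverb appears."""
    intensity = max((lvl for adv, lvl in DEGREE_ADVERBS.items() if adv in text), default=0)
    return emotion_type, intensity

def select_by_intensity(emotion_type, intensity):
    """Select the image mapped to this intensity within the type's image set."""
    return INTENSITY_IMAGES.get(emotion_type, {}).get(intensity)
```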
In this embodiment, when the facial emotion feature recognition result does not match the speech emotion feature recognition result, the corresponding emotion feature image is looked up according to the speech result; performing image processing according to an emotion feature recognition result expressed by real voice data makes the processing result more accurate.

In the above embodiments, the facial emotion feature recognition result and the speech emotion feature recognition result are considered together to find the emotion feature image reflecting the emotion feature expressed in the image frame, making the image processing result more accurate.

In one embodiment, step S1206 specifically includes: determining the display position of the face image in the currently played image frame; querying the relative position of the emotion feature image with respect to the face image; and determining the presentation position of the emotion feature image in the currently played image frame according to the display position and the relative position.

In this embodiment, the presentation position of the emotion feature image in the currently played image frame refers to the physical location at which the emotion feature image is displayed in that frame. When looking up the emotion feature image, the terminal may obtain the reference object against which the found emotion feature image is drawn. The reference object may specifically be the face image included in the image frame.

Specifically, the terminal may obtain the display position of the reference object in the currently played image frame and the relative position of the emotion feature image with respect to the reference object, and then determine the presentation position of the emotion feature image in the frame from these two positions. The presentation position may specifically be a pixel coordinate interval or a coordinate interval in another preset positioning scheme. A pixel is the smallest unit that can be displayed on a computer screen; in this embodiment, pixels may be logical pixels or physical pixels.
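The position computation can be sketched as follows, assuming rectangles in pixel coordinates and a configured top-left offset (both illustrative assumptions; the source only requires some display position plus a relative position):

```python
def presentation_rect(face_rect, relative_offset, image_size):
    """Compute the pixel rectangle where the emotion feature image is drawn.

    face_rect: (x, y, w, h) of the reference face in the current frame.
    relative_offset: (dx, dy) of the image's top-left from the face's
    top-left, as configured for the emotion image (an assumption here).
    image_size: (width, height) of the emotion feature image.
    """
    fx, fy, _, _ = face_rect
    dx, dy = relative_offset
    iw, ih = image_size
    return (fx + dx, fy + dy, iw, ih)
```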
In the above embodiment, by setting the relative position of the emotion feature image with respect to the face image, the emotion feature image is displayed relative to the position of the face image, making its display position more reasonable.

In one embodiment, the image processing method further includes: tracking the motion trajectory of the face image in the played image frames; and moving the emotion feature image to follow the face image in the played frames according to the tracked trajectory. These steps may specifically be performed after S1208.

Here, the motion trajectory of the face image refers to the trajectory formed by the face image across continuously played image frames. Specifically, the presentation position of the emotion feature image may be its position relative to the face image in the currently played frame; the terminal may track that face image across the played frames, determine the position of the emotion feature image relative to the tracked face according to the presentation position, and render the emotion feature image at the determined position.
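Tracking the face from frame to frame can be sketched with a simple nearest-centroid matcher; the source does not specify a tracking algorithm, so this matcher is purely an illustrative assumption:

```python
def nearest_face(prev_center, candidates):
    """Match the face tracked in the previous frame to the closest face
    detection in the current frame (a simple nearest-centroid tracker).

    prev_center: (x, y) centroid of the face in the previous frame.
    candidates: list of (x, y, w, h) face rectangles detected now.
    """
    def center(rect):
        x, y, w, h = rect
        return (x + w / 2.0, y + h / 2.0)

    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    return min(candidates, key=lambda r: dist2(center(r), prev_center))
```

Once the face is re-located, the emotion image is redrawn at the same configured offset from it, so it follows the face's trajectory.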
In the above embodiment, the emotion feature image is displayed following the face image, intelligently associating the emotion feature image with the face in the real scene and providing a new mode of interaction.

As shown in FIG. 13, in a specific embodiment, the user generated content processing method includes:

S1302: Acquire image frames from the real world.

S1303: Play the acquired image frames frame by frame in the order of acquisition.

S1304: Select an image frame from the acquired image frames.

S1305: Determine whether the selected image frame includes a face image; if so, proceed to step S1306; otherwise, proceed to step S1314.

S1306: Adjust the size of the image frame to a preset size; rotate the adjusted image frame to an orientation that meets the emotion feature recognition condition; send the rotated frame to the server; and receive the facial emotion feature recognition result returned by the server.
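The preprocessing in S1306 can be sketched on a plain row-major 2-D pixel grid; the preset size and the 90-degree orientation rule are assumptions, since the source fixes neither:

```python
def resize_nearest(grid, out_w, out_h):
    """Nearest-neighbor resize of a row-major 2-D pixel grid to out_w x out_h."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def rotate_90_cw(grid):
    """Rotate the grid 90 degrees clockwise, e.g. to bring a landscape frame
    into the upright orientation a recognizer expects (an assumed rule)."""
    return [list(row) for row in zip(*grid[::-1])]
```

A real terminal would use its imaging library's resize and rotate; this sketch only shows the two adjustments the step performs before the frame is sent to the server.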
S1307: Extract the voice data recorded while the image frame was acquired; obtain the speech emotion feature recognition result derived from recognizing the voice data.

S1308: Determine whether the facial emotion feature recognition result matches the speech emotion feature recognition result; if so, proceed to step S1309; otherwise, proceed to step S1310.

S1309: Extract the emotion feature type and the recognition confidence included in the facial emotion feature recognition result; look up the emotion feature image set corresponding to the emotion feature type; and select from that set the emotion feature image corresponding to the recognition confidence.

S1310: Look up the corresponding emotion feature image according to the speech emotion feature recognition result.

S1311: Determine the display position of the face image in the currently played image frame; query the relative position of the emotion feature image with respect to the face image; and determine the presentation position of the emotion feature image in the currently played frame according to the display position and the relative position.

S1312: Render the emotion feature image in the currently played image frame according to the presentation position.

S1313: Track the motion trajectory of the face image in the played image frames; move the emotion feature image to follow the face image in the played frames according to the tracked trajectory.

S1314: Obtain the user generated content associated with the template image that matches the selected image frame.

S1315: Obtain the presentation position of the user generated content in the matched template image.

S1316: Render the user generated content in the played image frame according to the presentation position.
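The branching of steps S1305 to S1316 can be sketched as one dispatch function; the helper names and result dictionaries are placeholders for the operations the steps describe, not APIs from the source:

```python
def process_frame(frame, has_face, face_result, speech_result):
    """Dispatch one selected frame per steps S1305-S1316 (helpers are stubs).

    face_result / speech_result: dicts like {"type": ..., "confidence": ...},
    a hypothetical shape for the two recognition results.
    """
    if not has_face:                                      # S1305 "no" branch
        return ("render_ugc", frame)                      # S1314-S1316
    if face_result["type"] == speech_result["type"]:      # S1308: results match
        image = ("by_face", face_result["type"], face_result["confidence"])  # S1309
    else:
        image = ("by_speech", speech_result["type"])      # S1310
    return ("render_emotion", image)                      # S1311-S1313
```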
In this embodiment, image frames are acquired from a real scene and played in the order of acquisition. When an image frame includes a face image, the emotion feature image reflecting the emotional state of the person in that face image can be determined and presented from the facial emotion feature recognition result of the face image included in the acquired frame. Presenting emotion feature images immediately and directly from frames acquired in the real scene avoids the workload of manually selecting and manually adjusting emotion feature images for presentation, improves image processing efficiency, and gives the processing strong real-time performance.

Moreover, when an image frame does not include a face image, the user generated content associated with the template image matched by that frame is determined and presented. User generated content can be located and presented through image frames captured in the real world, without relying on social relationships, which extends the ways in which user generated content can spread. Furthermore, by tracking and rendering the user generated content in the played image frames according to its presentation position in the matched template image, the user generated content of the virtual world is fused with the real world reflected by the played frames, providing a new mode of interaction with user generated content.

In one embodiment, after recognizing text from the voice data, the terminal may also display the recognized text in the currently played image frame. Specifically, the terminal may draw a component for displaying text content in the currently played frame and display the recognized text in that component. Displaying the recognized text in the currently played frame can overcome barriers to interaction for deaf and mute users, improving the practicality of the image processing.

It should be understood that the steps in the embodiments of the present application are not necessarily performed in the order indicated by the step numbers. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and the steps may be performed in other orders. Moreover, at least some of the steps in the embodiments may include multiple sub-steps or stages that are not necessarily completed at the same time but may be executed at different times; the execution order of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
FIG. 14 is a comparison of the interface before and after an emotion feature image is drawn, in one embodiment. The left side of FIG. 14 shows the interface before drawing, which includes a face image 1410; the right side shows the interface after drawing, which includes the face image 1410 and emotion feature images 1420, comprising an emotion feature image 1421 indicating the emotion feature "happy" and an emotion feature image 1422 indicating the emotion feature "sad".

After looking up the corresponding emotion feature image from the facial emotion feature recognition result obtained by expression recognition on the face image 1410 in the interface before drawing, and from the speech emotion feature recognition result obtained by recognizing the recorded voice data: if the terminal determines that the emotion feature reflected by the face image 1410 on the left of FIG. 14 is happy, it tracks the face image 1410 in the currently played frame and draws the emotion feature image 1421 indicating "happy" at the corresponding position; if it determines that the emotion feature is sad, it tracks the face image 1410 and draws the emotion feature image 1422 indicating "sad" at the corresponding position.

FIG. 15 is a comparison of the interface before and after displaying text recognized from voice data, in one embodiment. The left side of FIG. 15 shows the interface before the text is displayed, which includes a face image 1510; the right side shows the interface after, which includes the face image 1510, an emotion feature image 1520, and text 1530. The text 1530 is recognized by the terminal from the voice data recorded while the frame was acquired, for example "我今天好难过" ("I am so sad today"), and reflects the emotion feature "sad"; the terminal may track the face image 1510 in the currently played frame, display the recognized text 1530 at the corresponding position, and also draw the emotion feature image 1520 indicating "sad" at the corresponding position.
FIG. 16 is a structural block diagram of the terminal 1600 in one embodiment. The internal structure of the terminal 1600 may refer to the structure shown in FIG. 2. Each of the modules described below may be implemented in whole or in part by software, hardware, or a combination thereof. Referring to FIG. 16, the terminal 1600 includes: an acquisition module 1601, a play module 1602, a selection module 1603, a data acquisition module 1604, and a rendering module 1605.

The acquisition module 1601 is configured to acquire image frames from the real world.

The play module 1602 is configured to play the acquired image frames frame by frame in the order of acquisition.

The selection module 1603 is configured to select an image frame from the acquired image frames.

The data acquisition module 1604 is configured to obtain the user generated content associated with the template image that matches the selected image frame, and to obtain the presentation position of the user generated content in the matched template image.

The rendering module 1605 is configured to render the user generated content in the played image frame according to the presentation position.
In one embodiment, the selection module 1603 is further configured to determine whether the features of the selected image frame conform to preset template image features; when they do, notify the data acquisition module so that it operates; when they do not, continue selecting image frames from the acquired frames.

In one embodiment, the data acquisition module 1604 is further configured to upload the selected image frame to the server; receive a first notification fed back by the server indicating that a template image matching the uploaded frame has been found; and, according to the first notification, obtain the user generated content associated with the template image.

In one embodiment, the data acquisition module 1604 is further configured to upload the selected image frame to the server; receive a second notification fed back by the server indicating that no template image matching the uploaded frame has been found; present a content creation entry according to the second notification; create user generated content according to operations on the content creation entry; and upload the created user generated content to the server, so that the server stores the uploaded content in association with a template image registered from the uploaded frame.

In one embodiment, the data acquisition module 1604 is further configured to obtain the stereoscopic rotation parameters configured when the user generated content was created. The rendering module 1605 is further configured to render, at the presentation position in the played image frame, the user generated content rotated according to the stereoscopic rotation parameters.
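As an illustration of applying a configured rotation parameter before rendering, a planar sketch; the source's stereoscopic rotation parameters are not specified, so a single in-plane angle about the origin is assumed:

```python
import math

def rotate_points(points, angle_deg):
    """Rotate 2-D content vertices about the origin by the configured angle
    (a stand-in for the full stereoscopic rotation, which is not specified)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(round(x * c - y * s, 6), round(x * s + y * c, 6)) for x, y in points]
```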
In one embodiment, the rendering module 1605 is further configured to track the object region of the template image in the played image frames; detect morphological changes of the tracked object region relative to the object region in the template image; determine a parameter representing the viewing direction according to the morphological changes; and render, at the presentation position in the played image frame, the user generated content deformed according to the parameter representing the viewing direction.

In one embodiment, the data acquisition module 1604 is further configured to obtain information on a plurality of content creators, and the corresponding user generated content, associated with the template image that matches the selected image frame. The rendering module 1605 is further configured to present the information on the plurality of content creators; select one of them; and render, in the played image frame, the user generated content corresponding to the selected content creator at the corresponding presentation position.

The above terminal 1600 acquires image frames from the real world and plays them in the order of acquisition; through an image frame selected from the acquired frames, the user generated content associated with the template image matched by that frame can be determined and presented. User generated content can be located and presented through image frames captured in the real world, without relying on social relationships, which extends the ways in which user generated content can spread. Furthermore, by tracking and rendering the user generated content in the played frames according to its presentation position in the matched template image, the user generated content of the virtual world is fused with the real world reflected by the played frames, providing a new mode of interaction with user generated content.
As shown in FIG. 17, in one embodiment, the terminal 1600 further includes: a recognition result acquisition module 1703, a lookup module 1704, and a presentation position acquisition module 1705.

The recognition result acquisition module 1703 is configured to, when the selected image frame includes a face image, obtain the facial emotion feature recognition result derived from recognizing the face image included in the frame.

The lookup module 1704 is configured to look up the corresponding emotion feature image according to the facial emotion feature recognition result.

The presentation position acquisition module 1705 is configured to obtain the presentation position of the emotion feature image in the currently played image frame.

The rendering module 1605 is further configured to render the emotion feature image in the currently played image frame according to the presentation position.

The above terminal 1600 plays image frames that reflect the real scene. By obtaining the facial emotion feature recognition result derived from the face image included in a frame, the emotional state of the person in the real scene can be determined automatically. After obtaining the presentation position of the emotion feature image in the currently played frame and rendering the emotion feature image there, the virtual emotion feature image is automatically combined with the person in the real scene to reflect that person's emotional state. Because the cumbersome steps of manual operation are avoided, image processing efficiency is greatly improved.
In one embodiment, the recognition result acquisition module 1703 is further configured to adjust the size of the image frame to a preset size; rotate the adjusted frame to an orientation that meets the emotion feature recognition condition; send the rotated frame to the server; and receive the facial emotion feature recognition result returned by the server for the sent frame.

In this embodiment, before expression recognition is performed on the face image by the server, the size and orientation of the image frame are adjusted so that the frame meets the conditions for expression recognition, which can improve the speed and accuracy of expression recognition and reduce hardware resource consumption.

In one embodiment, the recognition result acquisition module 1703 is further configured to extract the voice data recorded while the image frame was acquired, and obtain the speech emotion feature recognition result derived from recognizing the voice data. The lookup module 1704 is further configured to look up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result.

In this embodiment, the facial emotion feature recognition result and the speech emotion feature recognition result are considered together to find the emotion feature image reflecting the emotion feature expressed in the image frame, making the image processing result more accurate.

In one embodiment, the recognition result acquisition module 1703 is further configured to recognize the extracted voice data as text; look up the emotion feature keywords included in the text; and obtain the speech emotion feature recognition result corresponding to the voice data according to the found keywords.

In this embodiment, by performing text recognition on the recorded voice data and deriving the speech emotion feature recognition result from the characters representing emotion features included in the text, the accuracy of the speech emotion feature recognition result is improved.
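A minimal sketch of the keyword lookup described above; the keyword-to-emotion table is an illustrative assumption:

```python
# Hypothetical table mapping emotion feature keywords in the recognized
# text to an emotion feature type.
EMOTION_KEYWORDS = {"难过": "sad", "伤心": "sad", "开心": "happy", "高兴": "happy"}

def speech_emotion_from_text(text):
    """Return the emotion type of the first emotion keyword found, else None."""
    for keyword, emotion_type in EMOTION_KEYWORDS.items():
        if keyword in text:
            return emotion_type
    return None
```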
在一个实施例中,查找模块1704还用于当人脸情感特征识别结果与语音情感特征识别结果匹配时,按照人脸情感特征识别结果查找相应的情感特征图像。In an embodiment, the searching module 1704 is further configured to search for a corresponding emotional feature image according to the facial emotion feature recognition result when the facial emotion feature recognition result matches the voice emotion feature recognition result.
在本实施例中,对不同的人脸情感特征识别结果包括的识别结果置信度分别设置相对应的情感特征图像,通过情感特征图像来可视化反映人脸情感特征识别结果的可信度,使得图像处理结果更准确。In this embodiment, the corresponding emotion feature images are respectively set for the confidence of the recognition result included in the different facial emotion feature recognition results, and the credibility of the face emotion feature recognition result is visually reflected by the emotion feature image, so that the image is made The processing results are more accurate.
在一个实施例中,查找模块1704还用于提取人脸情感特征识别结果包括的情感特征类型和识别结果置信度;查找与情感特征类型对应的情感特征图像集合;从情感特征图像集合中,挑选出与识别结果置信度相对应的情感特征图像。In an embodiment, the searching module 1704 is further configured to extract the sentiment feature type and the recognition result confidence included in the facial emotion feature recognition result; search for the emotional feature image set corresponding to the emotional feature type; and select from the emotional feature image set An emotional feature image corresponding to the confidence of the recognition result.
在本实施例中,对不同的人脸情感特征识别结果包括的识别结果置信度分别设置相对应的情感特征图像,通过情感特征图像来可视化反映人脸情感 特征识别结果的可信度,使得图像处理结果更准确。In this embodiment, the corresponding emotion feature images are respectively set for the confidence of the recognition result included in the different facial emotion feature recognition results, and the credibility of the face emotion feature recognition result is visually reflected by the emotion feature image, so that the image is made The processing results are more accurate.
在一个实施例中,查找模块1704还用于当人脸情感特征识别结果与语音情感特征识别结果不匹配时,按照语音情感特征识别结果查找相应的情感特征图像。In an embodiment, the searching module 1704 is further configured to: when the facial emotion feature recognition result does not match the voice emotion feature recognition result, search for the corresponding emotional feature image according to the voice sentiment feature recognition result.
在本实施例中,在人脸情感特征识别结果与语音情感特征识别结果不匹配时,按照语音情感特征识别结果查找相应的情感特征图像,这种以真实的语音数据表达的情感特征识别结果来进行图像处理,使得图像处理结果更准确。In this embodiment, when the facial emotion feature recognition result does not match the voice emotion feature recognition result, the corresponding emotion feature image is searched according to the voice emotion feature recognition result, and the emotion feature recognition result expressed by the real voice data is Image processing is performed to make the image processing result more accurate.
In one embodiment, the placement acquisition module 1705 is further configured to determine the display position of the face image in the currently played image frame, query the relative position between the emotion feature image and the face image, and determine, according to the display position and the relative position, the placement of the emotion feature image in the currently played image frame.
In this embodiment, by setting the relative position between the emotion feature image and the face image, the emotion feature image is displayed relative to the position of the face image, making the placement of the emotion feature image more reasonable.
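The placement computation described above can be sketched as adding a configured offset to the face's display position. This is a minimal illustrative sketch, not the patent's implementation; the function name and the (dx, dy) offset convention are assumptions.

```python
# Hypothetical sketch: the emotion feature image's placement is derived
# from the face's display position plus a configured relative position.

def compute_placement(face_position, relative_position):
    """face_position: (x, y) of the face image in the played frame.
    relative_position: configured (dx, dy) offset of the emotion
    feature image relative to the face image."""
    fx, fy = face_position
    dx, dy = relative_position
    return (fx + dx, fy + dy)

# Example: an emotion sticker configured to sit 20 px above the face.
placement = compute_placement((100, 80), (0, -20))
```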
As shown in FIG. 18, in one embodiment, the terminal 1600 further includes a render-following module 1707.
The render-following module 1707 is configured to track the motion trajectory of the face image in the played image frames, and to move the emotion feature image along with the face image in the played image frames according to the tracked motion trajectory.
In this embodiment, the emotion feature image is displayed following the face image, thereby intelligently associating the emotion feature image with a face in the real scene and providing a new mode of interaction.
A person of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be implemented by a computer program instructing related hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (31)

  1. A method for processing user-generated content, comprising:
    a terminal acquiring image frames from the real world;
    the terminal playing the acquired image frames frame by frame in the order of acquisition;
    the terminal selecting an image frame from the acquired image frames;
    the terminal obtaining user-generated content associated with a template image that matches the selected image frame;
    the terminal obtaining a placement of the user-generated content in the matched template image; and
    the terminal rendering the user-generated content in the played image frames according to the placement.
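The claimed steps can be sketched as a simple frame-processing loop. This is an illustrative sketch under stated assumptions only — the frame representation, the template lookup, and the helper names are invented for the example, not taken from the patent.

```python
# Minimal sketch of the claim-1 pipeline: play acquired frames in order,
# match each candidate frame against registered template images, and
# record the UGC to render at the template's placement.

def process_frames(frames, templates):
    """frames: frame ids in acquisition (playback) order.
    templates: dict mapping a matching frame id to (ugc, placement)."""
    rendered = []
    for frame in frames:              # played frame by frame, in order
        match = templates.get(frame)  # template image matched to frame
        if match is None:
            continue                  # no associated UGC for this frame
        ugc, placement = match
        rendered.append((frame, ugc, placement))  # render at placement
    return rendered
```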
  2. The method according to claim 1, further comprising:
    the terminal determining whether a feature of the selected image frame conforms to a preset template image feature;
    when the feature of the selected image frame conforms to the template image feature, the terminal performing the step of obtaining user-generated content associated with the template image that matches the selected image frame; and
    when the feature of the selected image frame does not conform to the template image feature, returning to the step of selecting an image frame from the acquired image frames.
  3. The method according to claim 2, wherein the terminal determining whether the feature of the selected image frame conforms to the preset template image feature comprises:
    the terminal extracting feature points of the selected image frame and determining whether the number of extracted feature points reaches a preset template image feature point threshold; and/or
    the terminal obtaining a resolution of the selected image frame and determining whether the resolution reaches a preset template image resolution threshold; and/or
    the terminal obtaining a sharpness of the selected image frame and determining whether the sharpness reaches a preset template image sharpness threshold; and/or
    the terminal obtaining a proportion of an object region in the selected image frame relative to the selected image frame and determining whether the proportion reaches a preset template image object proportion.
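The suitability checks in the claim above can be sketched as threshold tests on a candidate frame. A minimal sketch under assumptions: the field names and threshold values are invented for illustration, and all four checks are combined with "and" here, whereas the claim permits any and/or combination.

```python
# Hypothetical claim-3 quality gate: a selected frame qualifies as a
# template image only if it passes the configured thresholds.

def frame_is_template_quality(frame,
                              min_feature_points=50,
                              min_resolution=(640, 480),
                              min_sharpness=0.5,
                              min_object_ratio=0.2):
    """frame: dict with 'feature_points', 'resolution', 'sharpness',
    and 'object_ratio' fields (all assumed names)."""
    w, h = frame["resolution"]
    return (frame["feature_points"] >= min_feature_points
            and w >= min_resolution[0] and h >= min_resolution[1]
            and frame["sharpness"] >= min_sharpness
            and frame["object_ratio"] >= min_object_ratio)
```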
  4. The method according to claim 1, wherein the terminal obtaining the user-generated content associated with the template image that matches the selected image frame comprises:
    the terminal uploading the selected image frame to a server;
    the terminal receiving a first notification fed back by the server indicating that a template image matching the uploaded image frame has been found; and
    the terminal obtaining, according to the first notification, the user-generated content associated with the template image.
  5. The method according to claim 1, wherein the obtaining user-generated content associated with the template image that matches the selected image frame comprises:
    the terminal uploading the selected image frame to a server;
    the terminal receiving a second notification fed back by the server indicating that no template image matching the uploaded image frame has been found;
    the terminal displaying a content creation entry according to the second notification;
    the terminal creating user-generated content according to an operation on the content creation entry; and
    the terminal uploading the created user-generated content to the server, so that the server stores the uploaded user-generated content in association with a template image registered from the uploaded image frame.
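The fallback flow in the claim above (no match found, so the frame is registered as a new template and the freshly created UGC is stored against it) can be sketched with an in-memory store. This is an assumed stand-in for the server-side storage; the class and method names are invented for the sketch.

```python
# Hypothetical in-memory stand-in for the server's template/UGC store.

class UgcStore:
    def __init__(self):
        self.templates = {}  # template id -> list of associated UGC

    def lookup(self, frame_id):
        """First/second notification: does a matching template exist?"""
        return frame_id in self.templates

    def register(self, frame_id, ugc):
        """Register the uploaded frame as a template image and store
        the created UGC in association with it."""
        self.templates.setdefault(frame_id, []).append(ugc)

store = UgcStore()
if not store.lookup("frame-42"):         # second-notification case
    store.register("frame-42", "hello")  # create UGC and upload it
```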
  6. The method according to claim 1, further comprising:
    the terminal obtaining a stereoscopic rotation parameter configured when the user-generated content was created;
    wherein the terminal rendering the user-generated content in the played image frames according to the placement comprises:
    the terminal rendering, in the played image frames according to the placement, the user-generated content rotated according to the stereoscopic rotation parameter.
  7. The method according to claim 1, wherein the terminal rendering the user-generated content in the played image frames according to the placement comprises:
    the terminal tracking an object region of the template image in the played image frames;
    the terminal determining a tracking render position according to the placement and the tracked object region; and the terminal rendering the user-generated content in the played image frames according to the tracking render position.
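The mapping in the claim above — from a placement defined inside the template image to a render position that follows the tracked object region — can be sketched as a proportional coordinate transform. The rectangle representation and the linear mapping are assumptions made for the sketch, not the patent's method.

```python
# Hypothetical claim-7 tracking render position: express the placement
# relative to the template's object region, then map it into the object
# region tracked in the current played frame.

def tracked_render_position(placement, template_region, tracked_region):
    """placement: (x, y) of the UGC inside the template image.
    template_region / tracked_region: (x, y, w, h) of the object region
    in the template image and in the current played frame."""
    px, py = placement
    tx, ty, tw, th = template_region
    cx, cy, cw, ch = tracked_region
    rx = (px - tx) / tw   # normalized offset within the template region
    ry = (py - ty) / th
    return (cx + rx * cw, cy + ry * ch)
```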
  8. The method according to claim 1, wherein the terminal obtaining the user-generated content associated with the template image that matches the selected image frame comprises:
    the terminal obtaining information on a plurality of content creators associated with the template image that matches the selected image frame, together with the corresponding user-generated content;
    wherein the terminal rendering the user-generated content in the played image frames according to the placement comprises:
    the terminal displaying the information on the plurality of content creators;
    the terminal selecting one of the plurality of content creators; and
    the terminal rendering, in the played image frames, the corresponding user-generated content according to the placement corresponding to the selected content creator.
  9. The method according to claim 1, further comprising:
    when the selected image frame includes a face image, the terminal obtaining a facial emotion feature recognition result obtained by recognizing the face image included in the image frame;
    the terminal looking up a corresponding emotion feature image according to the facial emotion feature recognition result;
    the terminal obtaining a placement of the emotion feature image in the currently played image frame; and
    the terminal rendering the emotion feature image in the currently played image frame according to the placement.
  10. The method according to claim 9, wherein the terminal obtaining the facial emotion feature recognition result obtained by recognizing the face image included in the image frame comprises:
    the terminal adjusting the size of the image frame to a preset size;
    the terminal rotating the direction of the adjusted image frame to a direction that satisfies an emotion feature recognition condition;
    the terminal sending the rotated image frame to a server; and
    the terminal receiving, from the server, the facial emotion feature recognition result for the sent image frame.
  11. The method according to claim 9, further comprising:
    the terminal extracting speech data recorded when the image frame was acquired; and
    the terminal obtaining a speech emotion feature recognition result obtained by recognizing the speech data;
    wherein the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result comprises:
    the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result.
  12. The method according to claim 11, wherein the terminal obtaining the speech emotion feature recognition result obtained by recognizing the speech data comprises:
    the terminal recognizing the extracted speech data as text;
    the terminal looking up emotion feature keywords included in the text; and
    the terminal obtaining, according to the found emotion feature keywords, the speech emotion feature recognition result corresponding to the speech data.
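The keyword step in the claim above can be sketched as a table lookup over the transcript. The keyword table and the first-match policy are invented for the sketch; real speech recognition and keyword matching would of course be richer.

```python
# Hypothetical claim-12 sketch: transcript text -> emotion feature
# keyword -> speech emotion recognition result.

EMOTION_KEYWORDS = {  # assumed example table
    "happy": "joy",
    "great": "joy",
    "sad": "sadness",
    "angry": "anger",
}

def speech_emotion_from_text(transcript):
    """Return the emotion for the first keyword found in the
    transcript, or None when no emotion keyword is present."""
    for word in transcript.lower().split():
        if word in EMOTION_KEYWORDS:
            return EMOTION_KEYWORDS[word]
    return None
```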
  13. The method according to claim 11, wherein the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result comprises:
    when the facial emotion feature recognition result matches the speech emotion feature recognition result, the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result.
  14. The method according to claim 13, wherein the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result comprises:
    the terminal extracting the emotion feature type and the recognition result confidence included in the facial emotion feature recognition result;
    the terminal looking up an emotion feature image set corresponding to the emotion feature type; and
    the terminal selecting, from the emotion feature image set, the emotion feature image corresponding to the recognition result confidence.
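The confidence-based selection in the claim above can be sketched with per-type image sets whose entries each cover a confidence band. The bands and image names are assumptions invented for this sketch.

```python
# Hypothetical claim-14 sketch: each emotion feature type maps to a set
# of candidate images; the one whose [low, high) confidence band
# contains the recognition result confidence is selected.

EMOTION_IMAGE_SETS = {  # assumed example data
    "joy": [
        (0.0, 0.5, "slight_smile.png"),
        (0.5, 0.8, "smile.png"),
        (0.8, 1.01, "laugh.png"),
    ],
}

def pick_emotion_image(emotion_type, confidence):
    """Return the image in the type's set whose confidence band
    contains the recognition result confidence, or None."""
    for low, high, image in EMOTION_IMAGE_SETS[emotion_type]:
        if low <= confidence < high:
            return image
    return None
```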
  15. The method according to claim 11, wherein the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result comprises:
    when the facial emotion feature recognition result does not match the speech emotion feature recognition result, the terminal looking up the corresponding emotion feature image according to the speech emotion feature recognition result.
  16. The method according to claim 9, wherein the terminal obtaining the placement of the emotion feature image in the currently played image frame comprises:
    the terminal determining a display position of the face image in the currently played image frame;
    the terminal querying a relative position between the emotion feature image and the face image; and
    the terminal determining, according to the display position and the relative position, the placement of the emotion feature image in the currently played image frame.
  17. The method according to claim 16, further comprising:
    the terminal tracking a motion trajectory of the face image in the played image frames; and
    the terminal moving the emotion feature image along with the face image in the played image frames according to the tracked motion trajectory.
  18. One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring image frames from the real world;
    playing the acquired image frames frame by frame in the order of acquisition;
    selecting an image frame from the acquired image frames;
    obtaining user-generated content associated with a template image that matches the selected image frame;
    obtaining a placement of the user-generated content in the matched template image; and
    rendering the user-generated content in the played image frames according to the placement.
  19. The storage medium according to claim 18, wherein the computer-readable instructions further cause the one or more processors to perform the following steps:
    determining whether a feature of the selected image frame conforms to a preset template image feature;
    when the feature of the selected image frame conforms to the template image feature, performing the step of obtaining user-generated content associated with the template image that matches the selected image frame; and
    when the feature of the selected image frame does not conform to the template image feature, returning to the step of selecting an image frame from the acquired image frames.
  20. The storage medium according to claim 18, wherein the obtaining user-generated content associated with the template image that matches the selected image frame comprises:
    obtaining a stereoscopic rotation parameter configured when the user-generated content was created;
    and the rendering the user-generated content in the played image frames according to the placement comprises:
    rendering, in the played image frames according to the placement, the user-generated content rotated according to the stereoscopic rotation parameter.
  21. The storage medium according to claim 18, wherein the rendering the user-generated content in the played image frames according to the placement comprises:
    tracking an object region of the template image in the played image frames;
    determining a tracking render position according to the placement and the tracked object region; and
    rendering the user-generated content in the played image frames according to the tracking render position.
  22. The storage medium according to claim 18, wherein the computer-readable instructions further cause the one or more processors to perform the following steps:
    when the selected image frame includes a face image, obtaining a facial emotion feature recognition result obtained by recognizing the face image included in the image frame;
    looking up a corresponding emotion feature image according to the facial emotion feature recognition result;
    obtaining a placement of the emotion feature image in the currently played image frame; and
    rendering the emotion feature image in the currently played image frame according to the placement.
  23. The storage medium according to claim 22, wherein the computer-readable instructions further cause the one or more processors to perform the following steps:
    extracting speech data recorded when the image frame was acquired; and
    obtaining a speech emotion feature recognition result obtained by recognizing the speech data;
    wherein the looking up the corresponding emotion feature image according to the facial emotion feature recognition result comprises:
    looking up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result.
  24. The storage medium according to claim 22, wherein the obtaining the placement of the emotion feature image in the currently played image frame comprises:
    determining a display position of the face image in the currently played image frame;
    querying a relative position between the emotion feature image and the face image; and
    determining, according to the display position and the relative position, the placement of the emotion feature image in the currently played image frame.
  25. A terminal comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
    acquiring image frames from the real world;
    playing the acquired image frames frame by frame in the order of acquisition;
    selecting an image frame from the acquired image frames;
    obtaining user-generated content associated with a template image that matches the selected image frame;
    obtaining a placement of the user-generated content in the matched template image; and
    rendering the user-generated content in the played image frames according to the placement.
  26. The terminal according to claim 25, wherein the computer-readable instructions further cause the processor to perform the following steps:
    determining whether a feature of the selected image frame conforms to a preset template image feature;
    when the feature of the selected image frame conforms to the template image feature, performing the step of obtaining user-generated content associated with the template image that matches the selected image frame; and
    when the feature of the selected image frame does not conform to the template image feature, returning to the step of selecting an image frame from the acquired image frames.
  27. The terminal according to claim 25, wherein the obtaining user-generated content associated with the template image that matches the selected image frame comprises:
    obtaining a stereoscopic rotation parameter configured when the user-generated content was created;
    and the rendering the user-generated content in the played image frames according to the placement comprises:
    rendering, in the played image frames according to the placement, the user-generated content rotated according to the stereoscopic rotation parameter.
  28. The terminal according to claim 25, wherein the rendering the user-generated content in the played image frames according to the placement comprises:
    tracking an object region of the template image in the played image frames;
    determining a tracking render position according to the placement and the tracked object region; and
    rendering the user-generated content in the played image frames according to the tracking render position.
  29. The terminal according to claim 25, wherein the computer-readable instructions further cause the processor to perform the following steps:
    when the selected image frame includes a face image, obtaining a facial emotion feature recognition result obtained by recognizing the face image included in the image frame;
    looking up a corresponding emotion feature image according to the facial emotion feature recognition result;
    obtaining a placement of the emotion feature image in the currently played image frame; and
    rendering the emotion feature image in the currently played image frame according to the placement.
  30. The terminal according to claim 29, wherein the computer-readable instructions further cause the processor to perform the following steps:
    extracting speech data recorded when the image frame was acquired; and
    obtaining a speech emotion feature recognition result obtained by recognizing the speech data;
    wherein the looking up the corresponding emotion feature image according to the facial emotion feature recognition result comprises:
    looking up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result.
  31. The terminal according to claim 29, wherein the obtaining the placement of the emotion feature image in the currently played image frame comprises:
    determining a display position of the face image in the currently played image frame;
    querying a relative position between the emotion feature image and the face image; and
    determining, according to the display position and the relative position, the placement of the emotion feature image in the currently played image frame.
PCT/CN2018/079228 2017-03-29 2018-03-16 Method for processing user-generated content, storage medium and terminal WO2018177134A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201710199078.4A CN107168619B (en) 2017-03-29 2017-03-29 User generated content processing method and device
CN201710199078.4 2017-03-29
CN201710282661.1A CN108334806B (en) 2017-04-26 2017-04-26 Image processing method and device and electronic equipment
CN201710282661.1 2017-04-26

Publications (1)

Publication Number Publication Date
WO2018177134A1 true WO2018177134A1 (en) 2018-10-04

Family

ID=63674198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079228 WO2018177134A1 (en) 2017-03-29 2018-03-16 Method for processing user-generated content, storage medium and terminal

Country Status (1)

Country Link
WO (1) WO2018177134A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110321082A1 (en) * 2010-06-29 2011-12-29 At&T Intellectual Property I, L.P. User-Defined Modification of Video Content
CN103426003A (en) * 2012-05-22 2013-12-04 腾讯科技(深圳)有限公司 Implementation method and system for enhancing real interaction
CN104219559A (en) * 2013-05-31 2014-12-17 奥多比公司 Placing unobtrusive overlays in video content
CN107168619A (en) * 2017-03-29 2017-09-15 腾讯科技(深圳)有限公司 User-generated content treating method and apparatus

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522799A (en) * 2018-10-16 2019-03-26 深圳壹账通智能科技有限公司 Information cuing method, device, computer equipment and storage medium
CN109670285A (en) * 2018-11-13 2019-04-23 平安科技(深圳)有限公司 Face recognition login method, device, computer equipment and storage medium
CN109840491A (en) * 2019-01-25 2019-06-04 平安科技(深圳)有限公司 Video stream playing method, system, computer installation and readable storage medium storing program for executing
US11379683B2 (en) 2019-02-28 2022-07-05 Stats Llc System and method for generating trackable video frames from broadcast video
US11586840B2 (en) 2019-02-28 2023-02-21 Stats Llc System and method for player reidentification in broadcast video
US11830202B2 (en) 2019-02-28 2023-11-28 Stats Llc System and method for generating player tracking data from broadcast video
US11861848B2 (en) 2019-02-28 2024-01-02 Stats Llc System and method for generating trackable video frames from broadcast video
US11861850B2 (en) 2019-02-28 2024-01-02 Stats Llc System and method for player reidentification in broadcast video
US11935247B2 (en) 2019-02-28 2024-03-19 Stats Llc System and method for calibrating moving cameras capturing broadcast video

Similar Documents

Publication Publication Date Title
EP4128672B1 (en) Combining first user interface content into second user interface
US20240361881A1 (en) Updating avatar clothing for a user of a messaging system
US11094131B2 (en) Augmented reality apparatus and method
WO2017157272A1 (en) Information processing method and terminal
WO2021109678A1 (en) Video generation method and apparatus, electronic device, and storage medium
CN107168619B (en) User generated content processing method and device
US12020383B2 (en) Facial synthesis in augmented reality content for third party applications
US11680814B2 (en) Augmented reality-based translations associated with travel
US11769500B2 (en) Augmented reality-based translation of speech in association with travel
WO2018177134A1 (en) Method for processing user-generated content, storage medium and terminal
US11816926B2 (en) Interactive augmented reality content including facial synthesis
US20210304451A1 (en) Speech-based selection of augmented reality content for detected objects
WO2021195404A1 (en) Speech-based selection of augmented reality content for detected objects
US20240267485A1 (en) Facial synthesis in overlaid augmented reality content
CN113709545A (en) Video processing method and device, computer equipment and storage medium
CN115579023A (en) Video processing method, video processing device and electronic equipment
CN108334806B (en) Image processing method and device and electronic equipment
US12148064B2 (en) Facial synthesis in augmented reality content for advertisements
US12148244B2 (en) Interactive augmented reality content including facial synthesis
US20220319060A1 (en) Facial synthesis in augmented reality content for advertisements
US20230326094A1 (en) Integrating overlaid content into displayed data via graphics processing circuitry and processing circuitry using a computing memory and an operating system memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18775573

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18775573

Country of ref document: EP

Kind code of ref document: A1