WO2018177134A1 - Method for processing user-generated content, storage medium and terminal - Google Patents

Method for processing user-generated content, storage medium and terminal

Info

Publication number
WO2018177134A1
WO2018177134A1 (PCT/CN2018/079228)
Authority
WO
WIPO (PCT)
Prior art keywords
image
image frame
terminal
feature
generated content
Prior art date
Application number
PCT/CN2018/079228
Other languages
French (fr)
Chinese (zh)
Inventor
杨田从雨
陈宇
张浩
华有为
薛丰
肖鸿志
冯绪
吴昊
张振伟
欧义挺
董晓龙
戚广全
谢俊驰
谢斯豪
梁雪
段韧
张新磊
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201710199078.4A external-priority patent/CN107168619B/en
Priority claimed from CN201710282661.1A external-priority patent/CN108334806B/en
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018177134A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism

Definitions

  • the terminal can convert the selected image frame into a grayscale image, detect edges in the grayscale image, determine the gray-level change rate at the edges, and determine the sharpness according to that gray-level change rate.
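The sharpness check in this definition (grayscale conversion, edge detection, gray-level change rate) can be sketched as follows; the luma weights, gradient-based edge detector, and threshold are illustrative assumptions of this sketch, not details from the patent:

```python
import numpy as np

def sharpness_score(rgb, edge_thresh=30.0):
    """Estimate sharpness: convert to grayscale, find edges via the
    gray-level gradient, and use the mean gradient magnitude at edge
    pixels (the gray-level change rate) as the score."""
    gray = rgb @ np.array([0.299, 0.587, 0.114])  # luma grayscale (assumed weights)
    gy, gx = np.gradient(gray)                    # gray-level change rate per axis
    mag = np.hypot(gx, gy)
    edges = mag > edge_thresh                     # simple threshold edge detector
    if not edges.any():
        return 0.0
    return float(mag[edges].mean())
```

A frame with a hard step edge scores higher than one containing only a gentle ramp, so a terminal could keep the highest-scoring candidate frame.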
  • access rights can be set when user-generated content is created. For example, if the content creator sets the access right to visible-to-friends-only when creating the user-generated content, then a user account that uploads a matching image frame has access to the user-generated content only when it has a friend relationship with the creator's user account. If the content creator sets the access right to visible-to-everyone when creating the user-generated content, any legitimate user account has access to the user-generated content.
  • the terminal uploads the selected image frame to the server, and the server queries the template image that matches the selected image frame.
  • the server queries the matching template image
  • the server feeds back the first notification to the terminal;
  • the server does not query the matching template image
  • the server registers the uploaded image frame as a template image, and feeds back the second notification to the terminal.
  • the terminal displays the content creation entry.
  • when people are happy, the corners of the mouth rise. If the expression data that the terminal extracts from the facial feature data in the image frame indicates raised mouth corners, it can indicate that the emotional feature reflected by the face in the image frame is "happy". When people are surprised, the mouth opens wide. If the expression data extracted from the facial feature data in the image frame indicates a wide mouth opening, it can indicate that the emotional feature reflected by the face in the image frame is "surprised".
  • the JPEG format refers to an image format compressed according to the international image compression standard.
  • the direction conforming to the emotional feature recognition condition may specifically be a direction in which the angle between the central axis of the face image in the image frame and the vertical direction is not more than 45 degrees.
  • the voice emotional feature recognition result is “happy”.
  • the text the terminal obtains by recognizing the voice data is "I am very happy today", which includes the emotional feature keyword "happy"; the emotional feature mapped to "happy" is "happy", so the speech emotional feature recognition result is "happy".
  • the text the terminal obtains by recognizing the voice data is "I am very happy", which likewise includes the emotional feature keyword "happy"; the emotional feature mapped to "happy" is "happy", so the speech emotional feature recognition result is also "happy".
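The keyword-to-emotion mapping illustrated by these examples can be sketched as follows; the keyword table is a hypothetical stand-in for whatever lexicon an implementation would actually use:

```python
# Hypothetical keyword-to-emotion table; a real system would use a
# larger lexicon or a trained classifier.
EMOTION_KEYWORDS = {
    "happy": "happy",
    "glad": "happy",
    "sad": "sad",
    "angry": "angry",
}

def speech_emotion_from_text(text):
    """Map recognized speech text to an emotion label by scanning it
    for emotional feature keywords, as in the examples above."""
    lowered = text.lower()
    for keyword, emotion in EMOTION_KEYWORDS.items():
        if keyword in lowered:
            return emotion
    return None  # no emotional feature keyword found
```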
  • the acoustic features include timbre and prosodic features.
  • timbre refers to the characteristic quality of the sound produced by a sounding body; different sounding bodies produce different sounds because of their different materials and structures.
  • timbre is physically characterized by spectral parameters.
  • a prosodic feature refers to the basic pitch and rhythm of the sound emitted by the sounding body; prosodic features are characterized by the fundamental frequency parameter, the duration distribution, and the signal intensity.
  • the emotional feature type refers to the type of emotional feature reflected by the face, such as "happy", "sad", or "angry".
  • the confidence of the recognition result indicates how credible it is that the facial emotion feature recognition result is the real emotional feature of the face; the higher the confidence, the higher the possibility that the recognition result is the face's real emotional feature.
  • the emotional feature image library established in advance by the terminal may include multiple emotional feature image sets, each reflecting one emotional feature type.
  • the terminal may map emotional feature images one-to-one to emotional intensities.
  • the terminal searches the emotional feature image library for the emotional feature image set whose reflected emotional feature type matches the type included in the speech emotion feature recognition result, and selects, from the found set, the emotional feature image corresponding to the emotional intensity included in the speech emotion feature recognition result.
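Selecting an emotional feature image by emotion type and intensity, as described above, can be sketched as follows; the library contents, file names, and integer intensity levels are hypothetical:

```python
# Hypothetical emotional feature image library: one image set per
# emotion type, each set mapping an intensity level to an image id.
EMOTION_IMAGE_LIBRARY = {
    "happy": {1: "happy_mild.png", 2: "happy_medium.png", 3: "happy_strong.png"},
    "sad":   {1: "sad_mild.png",   2: "sad_medium.png",   3: "sad_strong.png"},
}

def pick_emotion_image(emotion_type, intensity):
    """Find the image set whose emotion type matches the recognition
    result, then select the image mapped to the given intensity."""
    image_set = EMOTION_IMAGE_LIBRARY.get(emotion_type)
    if image_set is None:
        return None
    # Fall back to the closest available intensity level.
    level = min(image_set, key=lambda k: abs(k - intensity))
    return image_set[level]
```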
  • in step S1308, it is determined whether the facial emotion feature recognition result matches the voice emotion feature recognition result; if yes, the process goes to step S1309; if not, the process goes to step S1310.
  • S1316 Render user generated content in the played image frame according to the placement position.
  • the recognition result obtaining module 1703 is further configured to adjust the size of the image frame to a preset size, rotate the adjusted image frame to a direction that conforms to the emotional feature recognition condition, send the rotated image frame to the server, and receive the facial emotion feature recognition result returned by the server for the sent image frame.
  • RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method for processing user-generated content, comprising: collecting image frames from the real world; playing the collected image frames frame by frame according to the time sequence of collection; selecting an image frame from the collected image frames; acquiring user-generated content associated with a template image matching the selected image frame; acquiring a presentation position of the user-generated content in the matched template image; and according to the presentation position, rendering the user-generated content in the played image frame.

Description

User-generated content processing method, storage medium, and terminal
This application claims priority to Chinese Patent Application No. 201710199078.4, entitled "User-generated content processing method and apparatus", filed with the China Patent Office on March 29, 2017, and to Chinese Patent Application No. 201710282661.1, entitled "Image processing method, apparatus, and electronic device", filed with the China Patent Office on April 26, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a user-generated content processing method, a storage medium, and a terminal.
Background
Social applications are among the most widely used applications today. Through a social application, users can establish social relationships based on a social network and then interact on the basis of those relationships, for example by sending instant messages, making voice calls and video calls, and holding online meetings, which greatly facilitates people's lives and work. Currently, social applications can display user-generated content (UGC).
At present, users can find each other's personal home page, or appear on each other's friend-sharing page, only after establishing a social relationship; only then is their user-generated content displayed on the personal home page or the friend-sharing page. The display of user-generated content therefore depends on social relationships, which limits its spread.
Summary
According to various embodiments provided in this application, a user-generated content processing method, a storage medium, and a terminal are provided.
A user-generated content processing method includes:
a terminal collecting image frames from the real world;
the terminal playing the collected image frames frame by frame in the order of collection;
the terminal selecting an image frame from the collected image frames;
the terminal acquiring user-generated content associated with a template image that matches the selected image frame;
the terminal acquiring a display position of the user-generated content in the matched template image; and
the terminal rendering the user-generated content in the played image frames according to the display position.
One or more non-volatile computer-readable storage media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
collecting image frames from the real world;
playing the collected image frames frame by frame in the order of collection;
selecting an image frame from the collected image frames;
acquiring user-generated content associated with a template image that matches the selected image frame;
acquiring a display position of the user-generated content in the matched template image; and
rendering the user-generated content in the played image frames according to the display position.
A terminal including a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
collecting image frames from the real world;
playing the collected image frames frame by frame in the order of collection;
selecting an image frame from the collected image frames;
acquiring user-generated content associated with a template image that matches the selected image frame;
acquiring a display position of the user-generated content in the matched template image; and
rendering the user-generated content in the played image frames according to the display position.
Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of this application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings in the following description show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of the application environment of a user-generated content processing method in an embodiment;
FIG. 2 is a schematic diagram of the internal structure of a terminal in an embodiment;
FIG. 3 is a schematic flowchart of a user-generated content processing method in an embodiment;
FIG. 4 is a schematic flowchart of a user-generated content processing method in a specific application scenario;
FIG. 5 is a schematic diagram of the main page of a social application in an embodiment;
FIG. 6 is a schematic diagram of a tool menu displayed on the main page in an embodiment;
FIG. 7 is a comparison of a virtual-world page entered through a function entry and a real-world object in an embodiment;
FIG. 8 is a comparison of a virtual-world page showing a list of content-creator avatars and a real-world object in an embodiment;
FIG. 9 is a comparison of a virtual-world page with a comment page and a real-world object in an embodiment;
FIG. 10 is a comparison of a virtual-world page with a content creation entry and a real-world object in an embodiment;
FIG. 11 is a comparison of a virtual-world page with a picture editing page and a real-world object in an embodiment;
FIG. 12 is a schematic flowchart of a user-generated content processing method in another embodiment;
FIG. 13 is a schematic flowchart of a user-generated content processing method in another embodiment;
FIG. 14 is a schematic comparison of the interface before and after an emotional feature image is drawn in an embodiment;
FIG. 15 is a schematic comparison of the interface before and after text recognized from voice data is displayed in an embodiment;
FIG. 16 is a structural block diagram of a terminal in an embodiment;
FIG. 17 is a structural block diagram of a terminal in another embodiment; and
FIG. 18 is a structural block diagram of a terminal in another embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain this application and are not intended to limit it.
FIG. 1 is a diagram of the application environment of a user-generated content processing method in an embodiment. Referring to FIG. 1, the application environment includes a terminal 110 and a server 120; the terminal 110 can communicate with the server 120 over a network. The terminal 110 can be used to collect image frames from the real world; play the collected image frames frame by frame in the order of collection; select an image frame from the collected image frames; pull from the server 120 the user-generated content associated with a template image that matches the selected image frame, together with the display position of the user-generated content in the matched template image; and render the user-generated content in the played image frames according to the display position. The server 120 can be used to store template images, user-generated content, and the correspondence between user-generated content and its display position in the matched template image.
FIG. 2 is a schematic diagram of the internal structure of a terminal in an embodiment. The terminal may specifically be the terminal 110 shown in FIG. 1. Referring to FIG. 2, the terminal includes a processor, a memory, a network interface, a display screen, a camera, and an input device connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the terminal stores an operating system and computer-readable instructions which, when executed, can cause the processor to perform a user-generated content processing method. The processor of the terminal provides computing and control capabilities to support the operation of the entire terminal. The internal memory of the terminal may store computer-readable instructions which, when executed by the processor, can cause the processor to perform a user-generated content processing method. The network interface of the terminal is used for network communication with the server 120, for example to upload image frames, upload created user-generated content, or pull user-generated content. The camera of the terminal is used to collect image frames. The display screen of the terminal may be a liquid crystal display or an electronic ink display; the input device of the terminal may be a touch layer covering the display screen, a key, trackball, or touchpad provided on the terminal housing, or an external keyboard, touchpad, or mouse. The terminal includes fixed terminals and mobile terminals; mobile terminals include one or a combination of a mobile phone, a tablet computer, a personal digital assistant, and a wearable device. A person skilled in the art will understand that the structure shown in FIG. 2 is only a block diagram of part of the structure related to the solution of this application and does not limit the terminal to which the solution is applied; a specific terminal may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
As shown in FIG. 3, in one embodiment, a user-generated content processing method is provided. This embodiment is described mainly by applying the method to the terminal 110 in FIG. 1. Referring to FIG. 3, the user-generated content processing method specifically includes the following steps:
S302: Collect image frames from the real world.
The real world is the naturally existing world in which humans live. An image frame is a unit in a sequence of image frames capable of forming a dynamic picture, and records objects in the real world at a certain moment.
In one embodiment, the terminal may collect image frames from the real world at a fixed or dynamic frame rate. A fixed or dynamic frame rate is one at which the image frames form a continuous dynamic picture when played back at that rate.
In one embodiment, the terminal may collect real-world image frames through a camera, within the camera's current field of view. The field of view of the camera may change with the posture and position of the terminal.
In one embodiment, the terminal may provide an AR (Augmented Reality) shooting mode through a social application and, after the AR shooting mode is selected, collect image frames from the real world. A social application is an application that supports network-based social interaction, and includes instant messaging applications, SNS (Social Network Service) applications, live streaming applications, and photo applications.
S304: Play the collected image frames frame by frame in the order of collection.
The order of collection is the chronological order in which the image frames were collected, which can be represented by the ordering of the timestamps recorded when the frames were collected. Frame-by-frame playback means playing one image frame at a time.
Specifically, the terminal may play the collected image frames one by one in ascending order of timestamp, at the frame rate at which they were collected. The terminal may play the collected image frames directly, or store them in a buffer in the order of collection and take them out of the buffer for playback in the same order.
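The buffered playback just described (store frames in capture order, then take them out in the same order) can be sketched minimally as follows; the class and method names are illustrative, not from the patent:

```python
from collections import deque

class FrameBuffer:
    """Buffer frames in capture order (by timestamp) and hand them
    back in the same order for playback."""
    def __init__(self):
        self.queue = deque()

    def capture(self, timestamp, frame):
        # Frames arrive in ascending timestamp order.
        self.queue.append((timestamp, frame))

    def next_for_playback(self):
        # Oldest frame first, preserving the order of collection.
        return self.queue.popleft() if self.queue else None
```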
S306: Select an image frame from the collected image frames.
The selected image frame may be a key frame among the collected image frames.
In one embodiment, the terminal may receive a user selection instruction and select an image frame from the collected image frames according to that instruction.
In one embodiment, when the played image frames satisfy a picture stabilization condition, the terminal may select the currently collected or currently playing image frame from the collected image frames. The picture stabilization condition may be that the differences between the played image frames within a preset duration fall within a set range.
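The picture stabilization condition above (frame differences within a set range over a window) can be sketched as follows; the mean-absolute-difference metric and threshold are assumptions of this sketch:

```python
import numpy as np

def is_stable(frames, max_mean_diff=5.0):
    """Picture stabilization check: the mean absolute pixel difference
    between consecutive frames in the window must stay within a set
    range (the threshold here is illustrative)."""
    for prev, cur in zip(frames, frames[1:]):
        diff = np.abs(cur.astype(float) - prev.astype(float)).mean()
        if diff > max_mean_diff:
            return False
    return True
```

A terminal could run this over the frames played in the last preset duration and select the current frame only when the check passes.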
S308: Acquire user-generated content associated with a template image that matches the selected image frame.
User-generated content is content produced by users, and may include at least one of text, pictures, audio, and video. It may be content published by a user, a user's comment on published content, or a user's reply to a comment.
A template image is associated with user-generated content and is used to mark it; the associated user-generated content can be located through the template image. One template image may be associated with one or more pieces of user-generated content, published by one or more users. A user who publishes user-generated content may be called a content creator.
In one embodiment, to determine whether the selected image frame matches a template image, the terminal may first compute the similarity between the selected image frame and the template image, and then determine whether the similarity is greater than or equal to a preset similarity: if so, they match; if not, they do not.
When computing the similarity between the selected image frame and the template image, the respective features of the two images may first be extracted and the difference between them computed: the greater the difference between the features, the lower the similarity; the smaller the difference, the higher the similarity. The features may be extracted through a trained neural network model, and may specifically be one or a combination of color features, texture features, and shape features. The similarity may be the cosine similarity, or the Hamming distance between the perceptual hash values of the two images.
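One of the similarity measures mentioned above, the Hamming distance between perceptual hash values, can be sketched as follows; this uses a simple average hash, which is just one illustrative choice of perceptual hash, and assumes image dimensions divisible by the hash size:

```python
import numpy as np

def average_hash(gray, hash_size=8):
    """Tiny average-hash sketch: downsample the grayscale image by
    block averaging, then threshold each block against the mean."""
    h, w = gray.shape
    blocks = gray.reshape(hash_size, h // hash_size, hash_size, w // hash_size)
    small = blocks.mean(axis=(1, 3))          # one mean per block
    return (small > small.mean()).flatten()   # boolean hash bits

def hamming_distance(hash_a, hash_b):
    """Number of differing hash bits; smaller means more similar."""
    return int(np.count_nonzero(hash_a != hash_b))

def frames_match(frame, template, max_distance=10):
    """Match when the Hamming distance is within a preset threshold,
    mirroring the 'similarity >= preset similarity' test above."""
    return hamming_distance(average_hash(frame), average_hash(template)) <= max_distance
```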
In one embodiment, the terminal may first query the local cache for a template image matching the selected image frame and, when a matching template image is found, pull the user-generated content associated with that template image from the local cache or from the server. When no matching template image is found in the local cache, the terminal may further query the server for a template image matching the selected image frame and, when one is found on the server, pull the associated user-generated content from the server. After the terminal obtains a matching template image from the server, it may store that template image in the local cache.
In one embodiment, the terminal may acquire user-generated content whose associated template image matches the selected image frame and whose corresponding geographic location satisfies a proximity condition with respect to the current geographic location. A proximity condition is a quantified condition indicating that two geographic locations are close, for example that the distance between them is less than or equal to a preset value. In this embodiment, combining the geographic location allows more accurate matching.
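The cache-first template lookup above can be sketched as follows; the `server.find_matching` interface and the matching predicate are assumptions of this sketch, and the geographic proximity filter is omitted:

```python
class TemplateLookup:
    """Cache-first template lookup: query the local cache, fall back
    to the server, and cache server hits for later queries."""
    def __init__(self, server):
        self.server = server   # any object with find_matching(frame)
        self.cache = []

    def find(self, frame, matches):
        # `matches` is the frame/template matching predicate.
        for template in self.cache:
            if matches(frame, template):
                return template
        template = self.server.find_matching(frame)
        if template is not None:
            self.cache.append(template)   # store server hits locally
        return template
```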
S310: Acquire the display position of the user-generated content in the matched template image.
The display position of the user-generated content in the matched template image represents the region the user-generated content occupies in the template image. The display position may be represented by the coordinates, in the coordinate system of the template image, of the region the content occupies.
In one embodiment, the terminal may acquire the display position of the user-generated content at the same time as it acquires the content itself, specifically from the local cache or from the server.
S312: Render the user-generated content in the played image frames according to the display position.
Specifically, the terminal may render the user-generated content at the acquired display position in the currently playing image frame. The terminal may acquire style data corresponding to the user-generated content, and render the content in the played image frame according to that style data and the acquired display position.
In one embodiment, the display position may be the position of the user-generated content relative to an object region in the template image. The terminal may track the template image's object region in the played image frames, determine from the display position and the tracked object region the position of the user-generated content relative to the tracked object region in the currently playing frame, and render the user-generated content at the determined position.
An object region is a region in an image that represents a real-world object. The object may be living or non-living: living objects include human bodies, animals, and plants; non-living objects include buildings, industrial products, and natural landscapes.
上述用户生成内容处理方法,从现实世界采集图像帧并按照采集的时序播放,通过从采集的图像帧中选取的图像帧,就能够确定该图像帧所匹配的模板图像所关联的用户生成内容,并进行展示。能够通过现实世界中拍摄的图像帧定位到用户生成内容并展示,可以不必依赖社交关系,扩展了用户生成内容的传播方式。而且,按照用户生成内容在匹配的模板图像中的展示位置,在播放的图像帧中追踪渲染用户生成内容,将虚拟世界中的用户生成内容与播放的视频帧所反映的现实世界融合,提供了用户生成内容的新互动方式。The user-generated content processing method collects image frames from the real world and plays them according to the collected timing. By selecting the image frames from the captured image frames, the user-generated content associated with the template image matched by the image frames can be determined. And show it. The ability to locate and display user-generated content through image frames captured in the real world can extend the way user-generated content is propagated without having to rely on social relationships. Moreover, according to the display position of the user-generated content in the matched template image, the user-generated content is tracked and rendered in the played image frame, and the user-generated content in the virtual world is merged with the real world reflected by the played video frame, and A new way of interacting with user-generated content.
In an embodiment, the user-generated content processing method further includes: determining whether a feature of the selected image frame conforms to a preset template image feature. This step may be performed after S306. When the feature of the selected image frame conforms to the template image feature, S308 is performed; when it does not, the process returns to S306.
A preset template image feature is a feature that an image should possess in order to serve as a template image. A template image should be well distinguishable, to avoid confusion between user-generated content associated with different template images.
In an embodiment, determining whether the feature of the selected image frame conforms to the preset template image feature includes: extracting feature points of the selected image frame, and determining whether the number of extracted feature points reaches a preset template-image feature-point threshold. In this embodiment, the preset template image feature is that the number of feature points reaches the preset feature-point threshold.
A feature point is a point in the selected image frame that has distinctive characteristics and effectively reflects essential features of the image; feature points have the ability to identify objects in the image frame. The feature-point threshold may be set as needed: the higher the threshold, the more distinguishable an image frame must be to serve as a template image.
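The feature-point check can be sketched as follows. This is an illustration, not the patent's algorithm: a deployed system would use a real corner detector such as FAST or ORB, whereas the toy detector below merely counts pixels whose intensity changes sharply in both directions. All names and the threshold value are assumptions.

```python
def count_feature_points(gray, grad_threshold=64):
    # Toy stand-in for a real corner detector (e.g. FAST/ORB): a pixel
    # counts as a feature point when intensity changes sharply in both
    # the horizontal and the vertical direction.
    h, w = len(gray), len(gray[0])
    count = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = abs(gray[y][x + 1] - gray[y][x - 1])
            dy = abs(gray[y + 1][x] - gray[y - 1][x])
            if dx > grad_threshold and dy > grad_threshold:
                count += 1
    return count

def has_enough_feature_points(gray, point_threshold):
    # The check from the text: the frame is a candidate template image
    # only when the feature-point count reaches the threshold.
    return count_feature_points(gray) >= point_threshold

flat = [[128] * 8 for _ in range(8)]                       # no structure
checker = [[255 if (x // 2 + y // 2) % 2 else 0
            for x in range(8)] for y in range(8)]           # strong corners
```

A featureless frame (`flat`) yields zero feature points and is rejected, while a textured frame (`checker`) passes, matching the intent that a template image must be distinguishable.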
In an embodiment, determining whether the feature of the selected image frame conforms to the preset template image feature includes: acquiring the resolution of the selected image frame, and determining whether the resolution reaches a preset template-image resolution threshold. In this embodiment, the preset template image feature is that the resolution reaches the preset resolution threshold.
The resolution of the selected image frame refers to its width and height, and the preset resolution threshold includes a preset template image width and template image height. Specifically, the terminal may acquire the width and height of the selected image frame and determine whether they reach the preset template image width and template image height, respectively.
In an embodiment, determining whether the feature of the selected image frame conforms to the preset template image feature includes: acquiring the sharpness of the selected image frame, and determining whether the sharpness reaches a preset template-image sharpness threshold. In this embodiment, the preset template image feature is that the sharpness reaches the preset sharpness threshold.
Sharpness, unlike resolution, refers to how clearly fine detail and its boundaries are rendered in the image frame. The terminal may convert the selected image frame into a grayscale image, detect edges in the grayscale image, evaluate the rate of gray-level change at the edges, and determine the sharpness from that rate. The faster the gray level changes at an edge, the higher the sharpness; the slower it changes, the lower the sharpness.
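A minimal sketch of the gray-level change-rate measure, assuming the frame is already grayscale: the mean squared horizontal difference is used as a sharpness proxy, so a hard edge scores higher than the same intensity range spread over a gradual ramp. The function names and the threshold are illustrative; production systems often use the variance of a Laplacian instead.

```python
def sharpness_score(gray):
    # Mean squared horizontal gray-level difference: fast gray-level
    # change at edges -> large differences -> high score.
    h, w = len(gray), len(gray[0])
    total = sum((gray[y][x] - gray[y][x - 1]) ** 2
                for y in range(h) for x in range(1, w))
    return total / (h * (w - 1))

def is_sharp_enough(gray, sharpness_threshold):
    return sharpness_score(gray) >= sharpness_threshold

# The same black-to-white transition, as a hard edge vs. a gradual ramp:
step = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
ramp = [[0, 51, 102, 153, 204, 255] for _ in range(4)]
```

Squaring the differences is what distinguishes the two cases: both images have the same total gray-level change, but the step concentrates it at one edge and therefore scores higher.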
In an embodiment, determining whether the feature of the selected image frame conforms to the preset template image feature includes: acquiring the proportion of the selected image frame occupied by an object region, and determining whether the proportion reaches a preset template-image object proportion. In this embodiment, the preset template image feature is that the proportion of the selected image frame occupied by the object region reaches the preset object proportion.
Specifically, the terminal may detect edges of the selected image frame, treat a closed region enclosed by the detected edges whose area reaches a preset area as the object region, and determine whether the area of the object region, as a proportion of the total area of the selected image frame, reaches the preset template-image object proportion. The area of an image or region may be represented by the number of pixels it contains.
The conditions used in the foregoing embodiments to determine whether the feature of the selected image frame conforms to the preset template image feature may be freely combined. When all of the combined conditions are satisfied, the frame is determined to conform to the preset template image feature; when at least one of them is not satisfied, the frame is determined not to conform.
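The free-combination rule can be sketched as a single AND over whichever conditions are enabled. The metric names and threshold values below are illustrative placeholders for the checks described above.

```python
def conforms_to_template_features(frame_metrics, thresholds):
    # Free combination: only the conditions named in `thresholds` are
    # applied, and every one of them must be satisfied (logical AND);
    # a single failing condition rejects the frame.
    return all(frame_metrics[name] >= threshold
               for name, threshold in thresholds.items())

metrics = {"feature_points": 120, "width": 640, "height": 480,
           "sharpness": 9.5}
```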
In the foregoing embodiments, the user-generated content associated with the matching template image is acquired only when the feature of the selected image frame conforms to the template image feature. Image frames that would be difficult to match to any template image are thus filtered out directly, improving processing efficiency.
In an embodiment, step S308 includes: uploading the selected image frame to a server; receiving a first notification fed back by the server indicating that a template image matching the uploaded image frame has been found; and acquiring, according to the first notification, the user-generated content associated with the template image.
Both the first notification and the second notification described below are notifications; "first" and "second" merely distinguish different notifications. A notification may be an independent message, or a message mixing several types of information.
Specifically, the terminal uploads the selected image frame to the server, and the server queries for a template image matching the uploaded image frame. When such a template image is found, the server returns to the terminal a first notification indicating that a template image matching the uploaded image frame has been found.
In an embodiment, the terminal may upload the user account used for local login together with the selected image frame to the server, and receive a first notification fed back by the server, the first notification indicating that a template image matching the uploaded image frame has been found and that the user-generated content associated with the template image grants access to the uploaded user account. The terminal may further acquire, according to the first notification, the user-generated content that is associated with the template image and accessible to the uploaded user account.
Access permissions may be set when the user-generated content is created. For example, if the content creator sets the content to be visible only to friends, the uploaded user account has access to the content only when it has a friend relationship with the creator's user account. If the content creator sets the content to be visible to everyone, any legitimate user account has access to it.
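The permission rule set at creation time can be sketched as a small predicate. The visibility values and the shape of the `friends` mapping (account to set of friend accounts) are illustrative names, not details from the patent.

```python
def has_access(viewer_account, creator_account, visibility, friends):
    # Permission rule fixed when the user-generated content is created.
    if visibility == "everyone":
        return True                       # any legitimate account
    if visibility == "friends_only":
        return viewer_account in friends.get(creator_account, set())
    return False                          # unknown setting: deny

friends = {"creator": {"alice"}}
```

With a "friends_only" setting, only accounts in the creator's friend set pass; with "everyone", any account passes.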
According to the first notification, the terminal may acquire the matched template image and cache it in a local buffer. The terminal may also acquire user information related to the user-generated content, such as a user account, a user avatar, or a user nickname.
In an embodiment, the terminal may acquire the user-generated content associated with the template image directly from the first notification, and may also acquire from the first notification the template image and/or the user information related to the user-generated content.
In an embodiment, the terminal may acquire the image number of the matched template image from the first notification, send the server a query request carrying that image number, and receive the user-generated content that the server finds and feeds back in association with the image number. The terminal may also query the server for the template image and/or user information corresponding to the image number.
In the foregoing embodiments, the server performs the matching between uploaded image frames and template images. With the server as the hub, every user can interact through user-generated content around the same or similar real-world scenes, realizing social interaction based on the real world, the virtual world, and the social network.
In an embodiment, step S308 includes: uploading the selected image frame to the server; receiving a second notification fed back by the server indicating that no template image matching the uploaded image frame has been found; displaying a content creation entry according to the second notification; creating user-generated content according to an operation on the content creation entry; and uploading the created user-generated content to the server, so that the server stores the uploaded user-generated content in association with a template image registered from the uploaded image frame.
That no matching template image has been found may mean that no template image matching the uploaded image frame exists on the server, or that such a template image exists but none of the user-generated content corresponding to it grants access to the user account that triggered the image frame upload.
The content creation entry is used to trigger the creation of user-generated content. It may be a visible control capable of triggering an event, such as an icon or a button. The content creation entry may specifically be an entry that triggers the creation of brand-new user-generated content, that is, content independent of any existing user-generated content. It may also be an entry that triggers the creation of user-generated content associated with existing user-generated content, such as a comment or a reply to a comment.
Specifically, the terminal uploads the selected image frame to the server, and the server queries for a template image matching the selected image frame. When a matching template image is found, the server feeds back the first notification to the terminal; when none is found, the server registers the uploaded image frame as a template image and feeds back the second notification. After receiving the second notification, the terminal displays the content creation entry.
Further, the terminal detects an operation on the content creation entry, acquires the content input by the user according to the detected operation, creates the user-generated content, and uploads it to the server; the server stores the uploaded user-generated content in association with the template image registered from the uploaded image frame. If the server receives no created user-generated content within a preset duration after registering the uploaded image frame as a template image, or receives a deregistration request fed back by the terminal, it cancels the registration of the uploaded image frame.
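The server-side register-then-confirm flow can be sketched as follows. This is a simplified in-memory model under stated assumptions: a registration is cancelled if no user-generated content arrives within `ttl` seconds, the clock is injected so the timeout is testable, and all class and method names are illustrative.

```python
class TemplateRegistry:
    # Sketch: a frame is registered as a pending template image; the
    # registration becomes permanent when content is attached, and is
    # cancelled when the preset duration (ttl) elapses without content.

    def __init__(self, ttl, clock):
        self.ttl = ttl
        self.clock = clock
        self.pending = {}    # image_id -> registration time
        self.templates = {}  # image_id -> list of user-generated content

    def register(self, image_id):
        self.pending[image_id] = self.clock()

    def attach_content(self, image_id, content):
        if image_id in self.pending:
            del self.pending[image_id]
            self.templates.setdefault(image_id, []).append(content)
            return True
        if image_id in self.templates:
            self.templates[image_id].append(content)
            return True
        return False  # registration expired or never made

    def expire(self):
        now = self.clock()
        for image_id in [i for i, t in self.pending.items()
                         if now - t > self.ttl]:
            del self.pending[image_id]  # cancel the registration
```

A registration confirmed within the window survives; one left pending past `ttl` is cancelled, so late content is rejected.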
In the foregoing embodiment, for a real-world scene that does not yet have associated user-generated content, user-generated content associated with the scene can be created, and the image frame uploaded this time can serve as the template image for matching next time. User-generated content is thus continually enriched, giving users a more convenient way to interact based on the real world and the virtual world.
In an embodiment, the user-generated content processing method further includes: acquiring a stereo rotation parameter configured when the user-generated content was created. This step may be performed before S312. Step S312 then includes: rendering, according to the display position, the user-generated content rotated by the stereo rotation parameter in the played image frame.
A stereo rotation parameter is a parameter for rotating user-generated content in the three-dimensional coordinate system of the virtual world, such as a horizontal rotation angle and/or a vertical rotation angle. The horizontal rotation angle is the angle through which the user-generated content is rotated in the horizontal plane of the virtual world's three-dimensional coordinate system; the vertical rotation angle is the angle through which it is rotated in a vertical plane. The stereo rotation parameter may be configured when the user-generated content is created, and stored in correspondence with the user-generated content.
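Applying a stereo rotation parameter to one vertex of the content quad can be sketched with two elementary rotation matrices. The axis conventions are assumptions (horizontal angle about the vertical y axis, vertical angle about the horizontal x axis, applied in that order); the patent fixes neither axes nor order.

```python
import math

def rotate(point, horizontal_deg=0.0, vertical_deg=0.0):
    # Assumed convention: horizontal rotation about the y axis first,
    # then vertical rotation about the x axis.
    x, y, z = point
    h = math.radians(horizontal_deg)
    v = math.radians(vertical_deg)
    # rotation about the y axis (horizontal rotation angle)
    x, z = x * math.cos(h) + z * math.sin(h), -x * math.sin(h) + z * math.cos(h)
    # rotation about the x axis (vertical rotation angle)
    y, z = y * math.cos(v) - z * math.sin(v), y * math.sin(v) + z * math.cos(v)
    return (x, y, z)
```

With both angles zero the point is unchanged; a 90° horizontal rotation carries the x axis into the negative z axis under this convention.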
In the foregoing embodiment, the user may configure the stereo rotation parameter of the user-generated content at creation time, so that when image frames reflecting the real world are played, the user-generated content is displayed rotated by that parameter, providing a new way of interaction.
In an embodiment, step S312 includes: tracking the object region of the template image in the played image frames; determining a tracked rendering position according to the display position and the tracked object region; and rendering the user-generated content at the tracked rendering position in the played image frame.
Tracking refers to locating changes of the object region across continuously played image frames, such as changes of position and/or changes of shape. The tracked rendering position is the real-time rendering position of the user-generated content in the played image frame. Since the selected image frame matches the template image, the terminal may take the image region of the template image as the object region in the selected image frame, and then track that object region in the played image frames.
The display position may indicate the position of the user-generated content, when displayed, relative to the object region in the template image. From the display position and the change in position of the tracked object region, the tracked rendering position of the user-generated content can be determined.
Further, from the display position and the change in shape of the tracked object region, a tracked rendering shape of the user-generated content can be determined, so that the user-generated content can be rendered in the played image frame according to the tracked rendering position and the tracked rendering shape. The tracked rendering shape may be represented by real-time stereo rotation parameters.
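One simple way to derive the tracked rendering position, sketched under stated assumptions: the display position is stored as an offset from the object region's top-left corner in template-image pixels, and is re-applied, scaled by the tracked region's current size, to the region's bounding box in the current frame. The `(x, y, w, h)` box convention and all names are illustrative.

```python
def tracked_render_position(display_offset, template_region, tracked_region):
    # display_offset: content offset from the object region's top-left
    # corner, in template-image pixels; both regions are (x, y, w, h).
    tx, ty, tw, th = template_region
    cx, cy, cw, ch = tracked_region
    ox, oy = display_offset
    # Re-apply the offset, scaled to the tracked region's current size.
    return (cx + ox * cw / tw, cy + oy * ch / th)

# Content placed 20 px right of / 10 px below a 100x50 region in the
# template; the tracked region has since moved and doubled in size.
template_region = (0, 0, 100, 50)
```

When the tracked region moves to `(30, 40)` and doubles to `200x100`, the offset scales with it, so the content follows the object as the text describes.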
In the foregoing embodiment, the object region of the template image is tracked in the played image frames, and the user-generated content is rendered with tracking according to the tracked object region. This realizes a strong association between user-generated content in the virtual world and objects in the real world, and a new way of interacting between the virtual world and the real world based on user-generated content.
In an embodiment, the terminal may track the object region of the template image in the played image frames; detect a change in shape of the tracked object region relative to the object region in the template image; determine, according to the shape change, a parameter representing the observation direction; and render, according to the display position, the user-generated content deformed by the parameter representing the observation direction in the played image frame.
In this embodiment, when the direction from which a real-world object is observed changes, the parameter representing the observation direction can be determined by detecting the change in shape of the tracked object region relative to the object region in the template image. Deforming the user-generated content by this parameter lets the deformed content reflect the change of observation direction, again realizing a strong association between user-generated content in the virtual world and objects in the real world, and a new interaction between the two based on user-generated content.
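As an illustration of deriving a view parameter from the shape change (not the patent's method; real AR systems typically estimate a full homography from many feature correspondences), a 2-D similarity transform between the template region and the tracked region can be recovered from just two point correspondences, using the complex-number identity that a similarity maps p to a·p + b.

```python
import math
import cmath

def similarity_from_two_points(p1, p2, q1, q2):
    # Treat 2-D points as complex numbers: a similarity transform is
    # q = a*p + b, so a = (q2 - q1) / (p2 - p1) encodes the scale (|a|)
    # and the in-plane rotation (arg a) between template and frame.
    a = (complex(*q2) - complex(*q1)) / (complex(*p2) - complex(*p1))
    return math.degrees(cmath.phase(a)), abs(a)
```

If the tracked region appears rotated 90° and twice as large as in the template, the recovered parameters are (90°, 2), which can then drive the deformation of the user-generated content.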
In an embodiment, step S308 includes: acquiring multiple pieces of content creator information associated with the template image matching the selected image frame, together with the corresponding user-generated content. Step S312 includes: displaying the multiple pieces of content creator information; selecting one of them; and rendering the corresponding user-generated content in the played image frame according to the display position corresponding to the selected piece of content creator information.
Content creator information is identity information of the content creator of user-generated content, such as the creator's user avatar, user nickname, or user account. The same template image may be associated with more than one piece of user-generated content, each corresponding to one piece of content creator information, so one template image may be associated with multiple pieces of content creator information.
The number of pieces of content creator information depends on the number of content creators of the user-generated content associated with the same template image. Each piece of content creator information corresponds to one piece of user-generated content, and each piece of user-generated content corresponds to one display position; the terminal may render the corresponding user-generated content in the played image frame according to the display position corresponding to the selected creator's user-generated content.
In the foregoing embodiment, one template image may be associated with user-generated content created by multiple content creators, expanding the amount of user-generated content that a real-world object can be associated with; the user can switch among the content created by different creators, extending the dimensions of interaction based on the virtual world and the real world.
Referring to FIG. 4, the principle of the foregoing user-generated content processing method is described below with a specific application scenario. The user may enter a social application, which displays the main page shown in FIG. 5. The user may tap the tool menu trigger button 502 on the main page, so that the social application displays the tool menu 601 on the main page as shown in FIG. 6; the tool menu 601 includes a function entry 602. The user taps the function entry 602, so that the social application starts capturing image frames from the real world and plays the captured frames frame by frame in the order of capture; referring to the left of FIG. 7, the terminal forms a real-time dynamic picture reflecting the real world.
While playing image frames, if the picture remains essentially unchanged for a preset duration, the terminal selects the currently played image frame and determines whether it conforms to the preset template image feature. If it does not, the user is prompted that no object has been recognized, and capture and playback continue. If it does, the terminal further determines whether a template image matching the selected image frame is cached locally.
When a template image matching the selected image frame is cached locally, the terminal pulls the user-generated content created by multiple content creators associated with that template image, together with the corresponding creator avatars and display positions, and, as shown on the left of FIG. 8, displays the content creator avatar list 801 on the currently played video frame. The user selects a content creator avatar 801a in the list, and the social application displays the corresponding user-generated content 802 and 803 at the display positions corresponding to the selected avatar.
If a stereo rotation angle is configured for the template image, the terminal displays the user-generated content 802 and 803 deformed by that angle. When object regions (such as the wine glass and the cup) change in the played image frames, the user-generated content 802 and 803 changes with them. When the observation angle of an object region changes, the user-generated content 802 and 803 rotates accordingly.
The user may swipe up on the page shown on the left of FIG. 8 to enter a comment page for the currently displayed user-generated content; as shown in FIG. 9, the user may add comments or replies to comments on that page.
When no matching template image is cached locally, the terminal uploads the selected image frame to the server, and the server matches a template image for the uploaded frame. If the server finds a matching template image, the terminal may pull the user-generated content created by multiple content creators associated with it, together with the corresponding creator avatars and display positions, and display the content creator avatar list 801 on the currently played video frame as shown on the left of FIG. 8. The user selects a content creator avatar 801a in the list, and the social application displays the corresponding user-generated content 802 and 803 at the display positions corresponding to the selected avatar.
If the server finds no matching template image, the terminal may display the content creation entry 1001 shown in FIG. 10. After tapping the content creation entry 1001, the user may select a picture and/or input text, and may edit the picture on the picture-editing page shown in FIG. 11, for example applying a stereo rotation, and may set whether the content is visible only to friends. After confirmation, the user-generated content is created and uploaded to the server, which stores it in association with the template image registered from the uploaded image frame. If uploading the user-generated content fails, the social application prompts an error and enters an outbox for re-uploading the user-generated content.
另一方面,随着计算机技术的发展,图像处理技术也不断进步。用户可 以通过专业的图像处理软件对图像进行处理,使得经过处理的图像表现更好。用户还可以通过图像处理软件,在图像中附加由图像处理软件提供的素材,让经过处理的图像能够传递更多的信息。然而,目前的图像处理方式,需要用户展开图像处理软件的素材库,浏览素材库,从素材库中选择合适的素材,调整素材在图像中的位置,从而确认修改,完成图像处理。于是目前的图像处理方式需要大量的人工操作,耗时长,导致图像处理过程效率低。On the other hand, with the development of computer technology, image processing technology has also been continuously improved. Users can process images with professional image processing software to make processed images perform better. The user can also attach the material provided by the image processing software to the image through the image processing software, so that the processed image can transmit more information. However, the current image processing method requires the user to expand the material library of the image processing software, browse the material library, select the appropriate material from the material library, adjust the position of the material in the image, thereby confirming the modification and completing the image processing. Therefore, the current image processing method requires a large number of manual operations, which takes a long time, resulting in low efficiency of the image processing process.
基于此,前述实施例中的用户生成内容处理方法还可包括人脸图像处理的步骤,以通过执行该人脸图像处理的步骤来提高图像处理的效率。Based on this, the user-generated content processing method in the foregoing embodiment may further include a step of face image processing to improve the efficiency of image processing by performing the step of the face image processing.
In an embodiment, after selecting an image frame from the captured image frames, the terminal may detect whether the selected image frame includes a face image. When the selected image frame includes a face image region, the terminal may continue with the steps after S306 in the foregoing embodiment and may also perform the face image processing step.

As shown in FIG. 12, in an embodiment, when the selected image frame includes a face image region, the face image processing step of the user-generated content processing method may specifically include the following steps, which may be performed after S306.
S1202: Obtain a facial emotion feature recognition result produced by recognizing the face image included in the image frame.

Here, an emotion feature is a feature that reflects the emotion of a person or an animal and that a computer can recognize and process, for example happiness, melancholy, or anger. A facial emotion feature is an emotion feature conveyed by a facial expression.
In an embodiment, when capturing image frames from a real scene, the terminal may detect whether a captured image frame includes a face image. If the terminal determines that a captured image frame includes a face image, the terminal performs expression recognition on the face image in that frame and obtains the resulting facial emotion feature recognition result.

In an embodiment, after capturing an image frame of the real scene through the camera in the camera's current field of view, the terminal may extract the image data included in the captured frame and detect whether the image data contains facial feature data. If so, the terminal determines that the frame includes a face image. The terminal may further extract expression feature data from the facial feature data and, based on the extracted expression feature data, locally perform expression recognition on the face image in the captured frame to obtain a facial emotion feature recognition result. The expression feature data may be one or more kinds of feature information reflecting, for example, the contour of the face, the eyes, the nose, the mouth, and the distances between the facial organs.
For example, when people feel happy, the corners of the mouth rise; if the expression feature data that the terminal extracts from the facial feature data in an image frame indicates raised mouth corners, this may indicate that the emotion feature reflected by the face in that frame is happiness. When people feel surprised, the mouth opens wide; if the extracted expression feature data indicates a widely opened mouth, this may indicate that the emotion feature reflected by the face in that frame is surprise.
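The mapping just described, from extracted expression features to an emotion label, can be sketched as a simple rule-based classifier. This is only an illustration of the idea: the feature names and thresholds below are hypothetical assumptions, not part of the described method.

```python
# Illustrative sketch: rule-based mapping from hypothetical expression
# feature data to an emotion label. Feature names and thresholds are
# invented for this example.
def classify_emotion(features: dict) -> str:
    """Map simple expression features to an emotion label.

    `features` may contain (both hypothetical, normalized to 0..1):
      - "mouth_corner_lift": upward displacement of the mouth corners
      - "mouth_open_ratio": mouth opening relative to face height
    """
    if features.get("mouth_open_ratio", 0.0) > 0.5:
        return "surprised"  # a widely opened mouth suggests surprise
    if features.get("mouth_corner_lift", 0.0) > 0.3:
        return "happy"      # raised mouth corners suggest happiness
    return "neutral"        # no strong cue found

print(classify_emotion({"mouth_corner_lift": 0.6}))  # happy
print(classify_emotion({"mouth_open_ratio": 0.8}))   # surprised
```

In practice the features would come from a face landmark detector, and a trained model would replace the hand-written thresholds.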
In an embodiment, the terminal may instead send the image frame detected to include a face image to the server. After receiving the frame sent by the terminal, the server performs expression recognition on the face image in it to obtain a facial emotion feature recognition result and feeds the result back to the terminal, which thus obtains the facial emotion feature recognition result returned by the server.

In an embodiment, after receiving an image frame captured from a real scene and sent by another terminal, the terminal may detect whether the received frame includes a face image. If it does, the terminal may either locally perform expression recognition on the face image in the frame to obtain the corresponding facial emotion feature recognition result, or send the frame to the server so that the server recognizes the face image and returns the facial emotion feature recognition result.
S1204: Search for a corresponding emotion feature image according to the facial emotion feature recognition result.

Here, an emotion feature image is an image that can reflect an emotion feature. An image reflecting sadness may, for example, include tears or a rainy scene; an image reflecting anger may, for example, include flames. An emotion feature image may be an image crawled by the terminal from the Internet, or an image shot by a camera device included in the terminal, and it may be either a dynamic picture or a static picture.
In an embodiment, the terminal may select in advance the emotion features for which image processing is supported and configure a corresponding emotion feature image for each selected emotion feature. After obtaining the facial emotion feature recognition result, the terminal obtains the emotion feature image corresponding to the emotion feature represented by the result.

In an embodiment, the terminal may build an emotion feature image library in advance and map the images in the library that reflect the same emotion feature to that emotion feature. After obtaining the facial emotion feature recognition result, the terminal may search the library for an emotion feature image whose reflected emotion feature matches the result.

In an embodiment, the pre-built emotion feature image library may include multiple emotion feature image sets, each set reflecting one emotion feature. After obtaining the facial emotion feature recognition result, the terminal looks up the image set whose reflected emotion feature is consistent with the result and selects an emotion feature image from the found set.
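The library-and-set lookup described above can be sketched as a dictionary keyed by emotion feature, with one candidate list per emotion. The library contents and file names here are invented for illustration; only the lookup structure reflects the description.

```python
# Hypothetical emotion feature image library: each emotion feature maps to
# a set (here, a list) of candidate images, and the terminal selects one
# image from the set matching the recognition result.
import random

EMOTION_IMAGE_LIBRARY = {
    "sad":   ["tears.png", "rain_scene.png"],  # images reflecting sadness
    "angry": ["flames.png"],                   # images reflecting anger
    "happy": ["sunshine.png", "confetti.gif"],
}

def find_emotion_image(recognized_emotion, rng=random.Random(0)):
    """Return one emotion feature image for the recognized emotion,
    or None when no image set reflects that emotion. The RNG is seeded
    only to keep this sketch reproducible."""
    candidates = EMOTION_IMAGE_LIBRARY.get(recognized_emotion)
    if not candidates:
        return None
    return rng.choice(candidates)
```

A real implementation would load the library from local storage or a server rather than hard-coding it.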
S1206: Obtain the placement of the emotion feature image in the currently played image frame.

Here, the placement of the emotion feature image in the currently played image frame is the region that the emotion feature image occupies in that frame. The placement may be expressed as the coordinates of that region in the coordinate system of the currently played image frame.
In an embodiment, the terminal may obtain the placement of the emotion feature image at the same time as it searches for the image. Specifically, the terminal may locally obtain the drawing mode corresponding to the found emotion feature image and determine the image's placement according to the obtained drawing mode.

Further, the drawing mode of an emotion feature image may be dynamic following of a reference object. Specifically, the terminal may determine the display position, in the currently played image frame, of the reference object that the found emotion feature image needs to follow, and then determine the placement of the emotion feature image in that frame according to the reference object's display position.

The drawing mode of an emotion feature image may also be static display. Specifically, for a statically displayed emotion feature image, the terminal may directly set in advance the display region of that image in the currently played image frame, and obtain it directly when the image needs to be drawn.
S1208: Render the emotion feature image in the currently played image frame according to the placement.

Specifically, the terminal may render the emotion feature image at the obtained placement in the currently played image frame. The terminal may obtain the style data corresponding to the emotion feature image and render the image in the played frame according to the style data and the obtained placement. In an embodiment, the emotion feature image is a dynamic image including a sequence of image frames; the terminal may render the frames of the dynamic image one by one according to the dynamic image's frame rate and the placement.

In an embodiment, the placement may be the position of the emotion feature image relative to a specific region in the currently played image frame. The terminal may track that specific region across the played frames and, according to the placement and the tracked region, determine the position of the emotion feature image relative to the tracked region in the currently played frame, then render the emotion feature image at the determined position. The specific region is a region of the image that represents a particular area of the real scene, for example a face region.
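Converting a placement defined relative to a tracked region (such as a face rectangle) into absolute frame coordinates, as the tracking step above describes, can be sketched as follows. The coordinate conventions (top-left origin, fractional offsets) are assumptions of this example.

```python
# Sketch: resolve a placement expressed relative to a tracked region into
# absolute coordinates in the currently played frame. Conventions assumed:
# rectangles are (x, y, width, height) with a top-left origin.
def absolute_placement(region, rel_offset, size):
    """region: (x, y, w, h) of the tracked area (e.g. a face) in the frame.
    rel_offset: (ox, oy) offset of the emotion image's top-left corner,
    expressed as fractions of the region's width and height.
    size: (w, h) of the emotion feature image in pixels."""
    rx, ry, rw, rh = region
    ox, oy = rel_offset
    x = rx + int(ox * rw)   # shift horizontally by a fraction of the region width
    y = ry + int(oy * rh)   # shift vertically by a fraction of the region height
    return (x, y, size[0], size[1])

# E.g. place a 64x64 image slightly above-right of a tracked face at (100, 50):
print(absolute_placement((100, 50, 200, 200), (0.25, -0.1), (64, 64)))
```

As the tracked region moves between frames, re-evaluating this function each frame makes the rendered image follow the region.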
In the foregoing user-generated content processing method, image frames reflecting a real scene are played so that the played frames reflect that scene. By obtaining the facial emotion feature recognition result produced by recognizing the face image included in an image frame, the emotional state of a person in the real scene can be determined automatically. After the placement of the emotion feature image in the currently played frame is obtained, rendering the emotion feature image at that placement automatically combines the virtual emotion feature image with the person in the real scene and reflects the person's emotional state. Because the tedious manual operations are avoided, image processing efficiency is greatly improved.
In an embodiment, step S1202 specifically includes: adjusting the size of the image frame to a preset size; rotating the adjusted image frame to a direction that meets the emotion feature recognition condition; sending the rotated frame to the server; and receiving the facial emotion feature recognition result that the server returns for the sent frame.

Here, the preset size is a predefined image frame size, and a direction meeting the emotion feature recognition condition is a direction in which emotion feature recognition can be performed on the image frame.

In an embodiment, the terminal may pull from the server the preset image characteristics of an image frame including a face image, that is, the characteristics an image frame should have for expression recognition, such as the frame's size or direction.
Specifically, after obtaining the image frames captured from the real scene and picking out the frames that include a face image, the terminal may detect whether the size of each selected frame matches the preset size and, if not, resize the frame.

After confirming that a selected frame's size matches the preset size, or after resizing a non-matching frame, the terminal may detect the frame's current direction. If the current direction does not meet the emotion feature recognition condition, the terminal rotates the frame to a direction that does.

The terminal may send the image frame to the server once the frame's direction meets the emotion feature recognition condition, or after rotating a non-conforming frame. Upon receiving the frame, the server extracts the expression feature data included in it, performs expression recognition on the face image in the received frame according to the extracted data to obtain a facial emotion feature recognition result, and feeds the result back to the terminal.
In an embodiment, after obtaining the image frames captured from the real scene and picking out those that include a face image, the terminal may downscale each such frame and save the result in JPEG (Joint Photographic Experts Group) format. The terminal may then detect the direction of the face image included in the frame and, when that direction does not meet the emotion feature recognition condition, rotate the frame.

Here, the JPEG format is an image format compressed according to the international image compression standard. A direction meeting the emotion feature recognition condition may specifically be one in which the angle between the central axis of the face image in the frame and the vertical direction is no more than 45 degrees.
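The 45-degree condition above can be met by rotating the frame in 90-degree steps (the usual camera orientation corrections) until the face's central axis is close enough to vertical. The step-wise rotation policy is an assumption of this sketch; the description only requires that the final direction satisfy the condition.

```python
# Sketch: choose a clockwise rotation (0/90/180/270 degrees) that brings the
# face's central axis to within 45 degrees of vertical, per the condition
# described above. The 90-degree granularity is an assumed policy.
def required_rotation(face_axis_angle: float) -> int:
    """face_axis_angle: angle of the face's central axis measured in degrees
    from vertical (any real value). Returns the rotation to apply."""
    a = face_axis_angle % 360
    for rot in (0, 90, 180, 270):
        residual = (a + rot) % 360
        # distance of the rotated axis from vertical (0 degrees / 360 degrees)
        if min(residual, 360 - residual) <= 45:
            return rot
    return 0  # unreachable with 90-degree steps, kept for safety

print(required_rotation(30))   # already within 45 degrees -> 0
print(required_rotation(90))   # face lying sideways -> rotate 270
```

Resizing to the preset size would be applied before this step, e.g. with an image library's scaling routine.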
In the foregoing embodiment, before the server performs expression recognition on the face image in an image frame, the size and direction of the frame are adjusted so that the frame meets the conditions for expression recognition. This improves the speed and accuracy of expression recognition and also reduces hardware resource consumption.
In an embodiment, the image processing method further includes: extracting the voice data recorded when the image frame was captured, and obtaining a speech emotion feature recognition result produced by recognizing that voice data. This step may be performed before S1204. Step S1204 then specifically includes: searching for the corresponding emotion feature image according to both the facial emotion feature recognition result and the speech emotion feature recognition result.

Specifically, when capturing image frames from the real scene, the terminal may simultaneously record the voice data of the real scene and play the recorded voice data in synchronization when the captured frames are played. The terminal may invoke a sound collection apparatus to collect the voice data formed by ambient sound and store the voice data in a buffer indexed by collection time.

When performing expression recognition on the face image included in a captured frame, the terminal may obtain the capture time of that frame and cut, from the buffered voice data, a voice data segment of a preset duration whose collection interval covers that capture time. The cut segment is the voice data recorded when the frame was captured. The preset duration is a predefined length of the intercepted voice data segment, for example 5 seconds or 10 seconds.
In an embodiment, the terminal may cut from the buffered voice data a segment of the preset duration centered on the obtained capture time. For example, if the capture time of the frame currently undergoing expression recognition is 18:30:15 on October 1, 2016 and the preset duration is 5 seconds, the terminal may take 18:30:15 on October 1, 2016 as the midpoint and cut the voice data segment whose collection interval runs from 18:30:13 on October 1, 2016 to 18:30:17 on October 1, 2016.
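The midpoint interception above amounts to simple time arithmetic: the window extends half the preset duration on each side of the frame's capture time. A minimal sketch (the exact-half window here is the idealized form of the rounded example in the text):

```python
# Sketch: compute the time interval of voice data to cut from the buffer,
# centered on the capture time of the frame being recognized.
from datetime import datetime, timedelta

def speech_window(capture_time: datetime, length_seconds: float = 5.0):
    """Return (start, end) of a segment of the given length centered on
    capture_time. 5 seconds matches the example in the text."""
    half = timedelta(seconds=length_seconds / 2)
    return (capture_time - half, capture_time + half)

start, end = speech_window(datetime(2016, 10, 1, 18, 30, 15))
print(start, end)  # a 5-second interval centered on 18:30:15
```

The terminal would then extract, from the buffer, all audio whose collection timestamps fall inside this interval.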
In an embodiment, when receiving image frames captured from a real scene and sent by another terminal, the terminal may also receive the voice data that the other terminal recorded while capturing those frames. The terminal may store the received voice data in a buffer and, when playing the frames in capture order, retrieve the voice data and play it synchronously.

When performing expression recognition on the face image included in a received frame, the terminal may obtain the capture time of that frame and cut, from the buffered voice data, a segment of the preset duration whose collection interval covers that capture time. The cut segment is the voice data recorded when the frame was captured.

After obtaining the voice data recorded when the frame currently undergoing expression recognition was captured, the terminal recognizes the obtained voice data to obtain a speech emotion feature recognition result.
In an embodiment, the step of obtaining the speech emotion feature recognition result by recognizing the voice data specifically includes: recognizing the extracted voice data as text; searching the text for emotion feature keywords; and obtaining, according to the found emotion feature keywords, the speech emotion feature recognition result corresponding to the voice data.

Specifically, the terminal may perform feature extraction on the voice data to obtain the speech feature data to be recognized, perform framed speech processing on that data based on an acoustic model to obtain multiple phonemes, convert the phonemes into a character sequence according to the correspondence between candidate characters and phonemes in a candidate character library, and then adjust the converted character sequence with a language model to obtain text conforming to natural language patterns.

Here, the text is the character representation of the voice data. Acoustic models include, for example, a GMM (Gaussian Mixture Model) or a DNN (Deep Neural Network). The candidate character library includes candidate characters and the phonemes corresponding to them. The language model, for example an N-gram model such as a CLM (Chinese Language Model), adjusts the character sequence recognized by the acoustic model according to natural language patterns.
The terminal may set up an emotion feature keyword library in advance. The library includes a number of emotion feature keywords, and the keywords that reflect the same emotion feature are mapped to that emotion feature. The library may be stored in a file, a database, or a cache and retrieved from there when needed. After recognizing the extracted voice data as text, the terminal compares the characters in the recognized text with the emotion feature keywords in the library. When a character string in the text matches an emotion feature keyword, the terminal obtains the matched keyword and takes the emotion feature that the keyword maps to as the speech emotion feature recognition result.

For example, suppose the text obtained by recognizing the voice data is "我今天很开心" ("I am very happy today"), which includes the emotion feature keyword "开心" ("happy") mapped to the emotion feature "happy"; the speech emotion feature recognition result is then "happy". Suppose the recognized text is "我非常高兴" ("I am very glad"), which includes the keyword "高兴" ("glad") also mapped to "happy"; the speech emotion feature recognition result is likewise "happy".
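The keyword lookup just illustrated can be sketched as a dictionary from keyword to emotion feature, scanned against the recognized text. The library below contains only the two examples from the text; a real library would be far larger.

```python
# Sketch of the emotion feature keyword library: keywords reflecting the
# same emotion feature map to that feature. Contents taken from the
# examples in the text.
EMOTION_KEYWORDS = {
    "开心": "开心",  # "happy" maps to the emotion feature "happy"
    "高兴": "开心",  # "glad" also maps to "happy"
}

def speech_emotion_from_text(text: str):
    """Return the emotion feature of the first matching keyword found in
    the recognized text, or None when no keyword matches."""
    for keyword, emotion in EMOTION_KEYWORDS.items():
        if keyword in text:
            return emotion
    return None

print(speech_emotion_from_text("我今天很开心"))  # 开心
print(speech_emotion_from_text("我非常高兴"))    # 开心
```

When several keywords match, a real implementation would need a tie-breaking policy (e.g. longest match or majority emotion), which the description leaves open.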
In the foregoing embodiment, text recognition is performed on the recorded voice data, and the speech emotion feature recognition result is obtained from the characters in the text that express emotion features, which improves the accuracy of the speech emotion feature recognition result.
In an embodiment, the terminal may also obtain the speech emotion feature recognition result from the acoustic features of the voice data. Specifically, the terminal may perform acoustic feature extraction on the voice data and obtain the corresponding emotion feature, and thereby the speech emotion feature recognition result, according to a pre-established correspondence between acoustic features and emotion features.

In an embodiment, the acoustic features include timbre and prosodic features. Timbre is the characteristic quality of the sound produced by a sounding body; different sounding bodies produce different timbres because of their different materials and structures, and timbre is characterized physically by spectral parameters. Prosodic features are the basic pitch and rhythm of the sound produced by a sounding body and are characterized physically by fundamental frequency parameters, duration distribution, and signal strength.

For example, when people feel happy, their speech prosody tends to sound lively; if the prosodic features that the terminal extracts from the voice data show a higher basic pitch and a faster rhythm, this may indicate that the emotion feature reflected by the voice data is happiness.

In this embodiment, acoustic feature extraction is performed on the recorded voice data, and the speech emotion feature recognition result is obtained from the parameters of the acoustic features that express emotion features, which improves the accuracy of the speech emotion feature recognition result.
In an embodiment, the step of searching for the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result may specifically include: when the facial emotion feature recognition result matches the speech emotion feature recognition result, searching for the corresponding emotion feature image according to the facial emotion feature recognition result.

Specifically, after obtaining the facial emotion feature recognition result produced by expression recognition of the face image included in the frame, and the speech emotion feature recognition result produced from the voice data recorded when the frame was captured, the terminal compares the two results. When the facial emotion feature recognition result matches the speech emotion feature recognition result, the terminal searches for the corresponding emotion feature image according to the facial emotion feature recognition result.
In an embodiment, searching for the corresponding emotion feature image according to the facial emotion feature recognition result includes: extracting the emotion feature type and the recognition result confidence included in the result; looking up the emotion feature image set corresponding to the emotion feature type; and selecting, from that set, the emotion feature image corresponding to the recognition result confidence.

Here, the emotion feature type is the type of emotion feature reflected by the face, for example "happy", "sad", or "angry". The recognition result confidence indicates how credible it is that the facial emotion feature recognition result reflects the face's true emotion feature; the higher the confidence, the more likely the result reflects the true emotion.

Specifically, the pre-built emotion feature image library may include multiple emotion feature image sets, each reflecting one emotion feature type, and the terminal may map each facial emotion feature recognition result confidence level one-to-one to an emotion feature image. After obtaining the facial emotion feature recognition result, the terminal looks up the image set whose reflected emotion feature is consistent with the emotion feature type included in the result and selects, from the found set, the emotion feature image corresponding to the recognition result confidence included in the result.
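One simple way to realize the confidence-to-image mapping above is to order each emotion's image set from mild to strong and bucket the confidence over that ordering. The bucketing scheme and file names below are assumptions of this sketch; the description only requires some one-to-one mapping between confidence levels and images.

```python
# Sketch: select an emotion feature image from the set for the recognized
# emotion feature type, according to the recognition result confidence.
# Image sets (ordered mild -> strong) and the even bucketing are invented.
IMAGE_SETS = {
    "happy": ["slight_smile.png", "big_smile.png", "laughing.png"],
}

def image_for_result(emotion_type, confidence):
    """emotion_type: emotion feature type from the recognition result.
    confidence: recognition result confidence in (0, 1]."""
    images = IMAGE_SETS.get(emotion_type)
    if not images:
        return None  # no image set reflects this emotion feature type
    # bucket the confidence evenly over the ordered image list
    index = min(int(confidence * len(images)), len(images) - 1)
    return images[index]

print(image_for_result("happy", 0.95))  # strongest image for high confidence
print(image_for_result("happy", 0.2))   # mildest image for low confidence
```

A stronger recognition confidence thus yields a more emphatic image, visualizing the credibility of the result as the embodiment describes.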
In the foregoing embodiment, a corresponding emotion feature image is set for each recognition result confidence level included in the facial emotion feature recognition results, so that the credibility of the facial emotion feature recognition result is visualized through the emotion feature image and the image processing result is more accurate.

In an embodiment, when the facial emotion feature recognition result matches the speech emotion feature recognition result, the terminal may also randomly select one emotion feature image from the found image set whose reflected emotion feature is consistent with the emotion feature type included in the facial emotion feature recognition result.

In this embodiment, when the facial emotion feature recognition result matches the speech emotion feature recognition result, the corresponding emotion feature image is found according to the facial emotion feature recognition result. With the speech emotion feature recognition result as corroboration, performing image processing according to the facial emotion feature recognition result makes the image processing result more accurate.
在一个实施例中,图像处理方法中根据人脸情感特征识别结果和语音情 感特征识别结果,查找相应的情感特征图像的步骤可具体包括:当人脸情感特征识别结果与语音情感特征识别结果不匹配时,按照语音情感特征识别结果查找相应的情感特征图像。In an embodiment, the step of searching for the corresponding emotion feature image according to the face emotion feature recognition result and the voice emotion feature recognition result in the image processing method may specifically include: when the face emotion feature recognition result and the voice emotion feature recognition result are not When matching, the corresponding emotional feature image is searched according to the speech emotion feature recognition result.
具体地,终端在获取根据图像帧中包括的人脸图像的表情识别得到的人脸情感特征识别结果,以及根据采集图像帧时录制的语音数据识别得到的语音情感特征识别结果后,将人脸情感特征识别结果与语音情感特征识别结果进行对比,当人脸情感特征识别结果与语音情感特征识别结果不匹配时,按照语音情感特征识别结果查找相应的情感特征图像。Specifically, the terminal acquires the face emotion feature recognition result obtained according to the face expression of the face image included in the image frame, and the voice emotion feature recognition result obtained according to the voice data recorded when the image frame is collected, and then the face is The emotional feature recognition result is compared with the speech emotion feature recognition result. When the facial emotion feature recognition result does not match the speech emotion feature recognition result, the corresponding emotional feature image is searched according to the speech emotion feature recognition result.
在一个实施例中,终端还可以获取语音数据识别得到的文本中包括的程度副词。程度副词用于表示情感的强烈程度,比如:“很”、“非常”或者“及其”等。终端对语音数据识别得到的语音情感特征识别结果具体可包括情感特征类型和情感强烈程度。In one embodiment, the terminal may also acquire an adverb of degree included in the text identified by the voice data. Degree adverbs are used to indicate the intensity of emotions, such as "very", "very" or "and". The speech emotion feature recognition result obtained by the terminal for the voice data identification may specifically include the emotion feature type and the emotion intensity degree.
具体地,终端事先建立的情感特征图像库可以包括多个情感特征图像集合,每个情感特征图像集合反映一种情感特征类型。终端可对应于情感强烈程度一一映射一张情感特征图像。终端在获取到语音情感特征识别结果后,查找情感特征图像库中反映的情感特征与语音情感特征识别结果包括的情感特征类型一致的情感特征图像集合,从查找到的情感特征图像集合中选取与语音情感特征识别结果包括的情感强烈程度相对应的情感特征图像。Specifically, the emotional feature image library established in advance by the terminal may include a plurality of emotional feature image sets, and each of the emotional feature image sets reflects an emotional feature type. The terminal may map an emotional feature image one by one corresponding to the intensity of the emotion. After obtaining the speech emotion feature recognition result, the terminal searches for the emotional feature image set that is reflected in the emotional feature image database and the emotional feature type included in the speech emotional feature recognition result, and selects from the found emotional feature image set. The speech emotion feature recognition result includes an emotional feature image corresponding to the emotional intensity.
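As a sketch of the intensity mapping described above, assuming a small adverb table and per-type image lists; every name and table entry is an illustrative assumption:

```python
# Hypothetical mapping from degree adverbs in the recognized text
# to an emotion intensity level (higher means stronger emotion).
DEGREE_ADVERBS = {"很": 1, "非常": 2, "极其": 3}

# Hypothetical per-type image lists indexed by intensity level.
INTENSITY_IMAGES = {
    "sad": {0: "sad_mild.png", 1: "sad_strong.png", 2: "sad_very.png", 3: "sad_extreme.png"},
}

def speech_emotion_result(text, emotion_type):
    """Derive (type, intensity) from recognized text; intensity 0 if no adverb appears."""
    intensity = max((lvl for adv, lvl in DEGREE_ADVERBS.items() if adv in text), default=0)
    return emotion_type, intensity

def select_by_intensity(emotion_type, intensity):
    """Select the image mapped to this intensity within the type's image set."""
    return INTENSITY_IMAGES.get(emotion_type, {}).get(intensity)
```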
In this embodiment, when the facial emotion feature recognition result does not match the speech emotion feature recognition result, the corresponding emotion feature image is looked up according to the speech result; performing image processing according to an emotion feature recognition result expressed by real voice data makes the processing result more accurate.

In the above embodiments, the facial emotion feature recognition result and the speech emotion feature recognition result are considered together to find the emotion feature image reflecting the emotion feature expressed in the image frame, making the image processing result more accurate.

In one embodiment, step S1206 specifically includes: determining the display position of the face image in the currently played image frame; querying the relative position of the emotion feature image with respect to the face image; and determining the presentation position of the emotion feature image in the currently played image frame according to the display position and the relative position.

In this embodiment, the presentation position of the emotion feature image in the currently played image frame refers to the physical location at which the emotion feature image is displayed in that frame. When looking up the emotion feature image, the terminal may obtain the reference object against which the found emotion feature image is drawn. The reference object may specifically be the face image included in the image frame.

Specifically, the terminal may obtain the display position of the reference object in the currently played image frame and the relative position of the emotion feature image with respect to the reference object, and then determine the presentation position of the emotion feature image in the frame from these two positions. The presentation position may specifically be a pixel coordinate interval or a coordinate interval in another preset positioning scheme. A pixel is the smallest unit that can be displayed on a computer screen; in this embodiment, pixels may be logical pixels or physical pixels.
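The position computation can be sketched as follows, assuming rectangles in pixel coordinates and a configured top-left offset (both illustrative assumptions; the source only requires some display position plus a relative position):

```python
def presentation_rect(face_rect, relative_offset, image_size):
    """Compute the pixel rectangle where the emotion feature image is drawn.

    face_rect: (x, y, w, h) of the reference face in the current frame.
    relative_offset: (dx, dy) of the image's top-left from the face's
    top-left, as configured for the emotion image (an assumption here).
    image_size: (width, height) of the emotion feature image.
    """
    fx, fy, _, _ = face_rect
    dx, dy = relative_offset
    iw, ih = image_size
    return (fx + dx, fy + dy, iw, ih)
```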
In the above embodiment, by setting the relative position of the emotion feature image with respect to the face image, the emotion feature image is displayed relative to the position of the face image, making its display position more reasonable.

In one embodiment, the image processing method further includes: tracking the motion trajectory of the face image in the played image frames; and moving the emotion feature image to follow the face image in the played frames according to the tracked trajectory. These steps may specifically be performed after S1208.

Here, the motion trajectory of the face image refers to the trajectory formed by the face image across continuously played image frames. Specifically, the presentation position of the emotion feature image may be its position relative to the face image in the currently played frame; the terminal may track that face image across the played frames, determine the position of the emotion feature image relative to the tracked face according to the presentation position, and render the emotion feature image at the determined position.
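Tracking the face from frame to frame can be sketched with a simple nearest-centroid matcher; the source does not specify a tracking algorithm, so this matcher is purely an illustrative assumption:

```python
def nearest_face(prev_center, candidates):
    """Match the face tracked in the previous frame to the closest face
    detection in the current frame (a simple nearest-centroid tracker).

    prev_center: (x, y) centroid of the face in the previous frame.
    candidates: list of (x, y, w, h) face rectangles detected now.
    """
    def center(rect):
        x, y, w, h = rect
        return (x + w / 2.0, y + h / 2.0)

    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    return min(candidates, key=lambda r: dist2(center(r), prev_center))
```

Once the face is re-located, the emotion image is redrawn at the same configured offset from it, so it follows the face's trajectory.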
In the above embodiment, the emotion feature image is displayed following the face image, intelligently associating the emotion feature image with the face in the real scene and providing a new mode of interaction.

As shown in FIG. 13, in a specific embodiment, the user generated content processing method includes:

S1302: Acquire image frames from the real world.

S1303: Play the acquired image frames frame by frame in the order of acquisition.

S1304: Select an image frame from the acquired image frames.

S1305: Determine whether the selected image frame includes a face image; if so, proceed to step S1306; otherwise, proceed to step S1314.

S1306: Adjust the size of the image frame to a preset size; rotate the adjusted image frame to an orientation that meets the emotion feature recognition condition; send the rotated frame to the server; and receive the facial emotion feature recognition result returned by the server.
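The preprocessing in S1306 can be sketched on a plain row-major 2-D pixel grid; the preset size and the 90-degree orientation rule are assumptions, since the source fixes neither:

```python
def resize_nearest(grid, out_w, out_h):
    """Nearest-neighbor resize of a row-major 2-D pixel grid to out_w x out_h."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def rotate_90_cw(grid):
    """Rotate the grid 90 degrees clockwise, e.g. to bring a landscape frame
    into the upright orientation a recognizer expects (an assumed rule)."""
    return [list(row) for row in zip(*grid[::-1])]
```

A real terminal would use its imaging library's resize and rotate; this sketch only shows the two adjustments the step performs before the frame is sent to the server.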
S1307: Extract the voice data recorded while the image frame was acquired; obtain the speech emotion feature recognition result derived from recognizing the voice data.

S1308: Determine whether the facial emotion feature recognition result matches the speech emotion feature recognition result; if so, proceed to step S1309; otherwise, proceed to step S1310.

S1309: Extract the emotion feature type and the recognition confidence included in the facial emotion feature recognition result; look up the emotion feature image set corresponding to the emotion feature type; and select from that set the emotion feature image corresponding to the recognition confidence.

S1310: Look up the corresponding emotion feature image according to the speech emotion feature recognition result.

S1311: Determine the display position of the face image in the currently played image frame; query the relative position of the emotion feature image with respect to the face image; and determine the presentation position of the emotion feature image in the currently played frame according to the display position and the relative position.

S1312: Render the emotion feature image in the currently played image frame according to the presentation position.

S1313: Track the motion trajectory of the face image in the played image frames; move the emotion feature image to follow the face image in the played frames according to the tracked trajectory.

S1314: Obtain the user generated content associated with the template image that matches the selected image frame.

S1315: Obtain the presentation position of the user generated content in the matched template image.

S1316: Render the user generated content in the played image frame according to the presentation position.
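The branching of steps S1305 to S1316 can be sketched as one dispatch function; the helper names and result dictionaries are placeholders for the operations the steps describe, not APIs from the source:

```python
def process_frame(frame, has_face, face_result, speech_result):
    """Dispatch one selected frame per steps S1305-S1316 (helpers are stubs).

    face_result / speech_result: dicts like {"type": ..., "confidence": ...},
    a hypothetical shape for the two recognition results.
    """
    if not has_face:                                      # S1305 "no" branch
        return ("render_ugc", frame)                      # S1314-S1316
    if face_result["type"] == speech_result["type"]:      # S1308: results match
        image = ("by_face", face_result["type"], face_result["confidence"])  # S1309
    else:
        image = ("by_speech", speech_result["type"])      # S1310
    return ("render_emotion", image)                      # S1311-S1313
```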
In this embodiment, image frames are acquired from a real scene and played in the order of acquisition. When an image frame includes a face image, the emotion feature image reflecting the emotional state of the person in that face image can be determined and presented from the facial emotion feature recognition result of the face image included in the acquired frame. Presenting emotion feature images immediately and directly from frames acquired in the real scene avoids the workload of manually selecting and manually adjusting emotion feature images for presentation, improves image processing efficiency, and gives the processing strong real-time performance.

Moreover, when an image frame does not include a face image, the user generated content associated with the template image matched by that frame is determined and presented. User generated content can be located and presented through image frames captured in the real world, without relying on social relationships, which extends the ways in which user generated content can spread. Furthermore, by tracking and rendering the user generated content in the played image frames according to its presentation position in the matched template image, the user generated content of the virtual world is fused with the real world reflected by the played frames, providing a new mode of interaction with user generated content.

In one embodiment, after recognizing text from the voice data, the terminal may also display the recognized text in the currently played image frame. Specifically, the terminal may draw a component for displaying text content in the currently played frame and display the recognized text in that component. Displaying the recognized text in the currently played frame can overcome barriers to interaction for deaf and mute users, improving the practicality of the image processing.

It should be understood that the steps in the embodiments of the present application are not necessarily performed in the order indicated by the step numbers. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and the steps may be performed in other orders. Moreover, at least some of the steps in the embodiments may include multiple sub-steps or stages that are not necessarily completed at the same time but may be executed at different times; the execution order of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
FIG. 14 is a comparison of the interface before and after an emotion feature image is drawn, in one embodiment. The left side of FIG. 14 shows the interface before drawing, which includes a face image 1410; the right side shows the interface after drawing, which includes the face image 1410 and emotion feature images 1420, comprising an emotion feature image 1421 indicating the emotion feature "happy" and an emotion feature image 1422 indicating the emotion feature "sad".

After looking up the corresponding emotion feature image from the facial emotion feature recognition result obtained by expression recognition on the face image 1410 in the interface before drawing, and from the speech emotion feature recognition result obtained by recognizing the recorded voice data: if the terminal determines that the emotion feature reflected by the face image 1410 on the left of FIG. 14 is happy, it tracks the face image 1410 in the currently played frame and draws the emotion feature image 1421 indicating "happy" at the corresponding position; if it determines that the emotion feature is sad, it tracks the face image 1410 and draws the emotion feature image 1422 indicating "sad" at the corresponding position.

FIG. 15 is a comparison of the interface before and after displaying text recognized from voice data, in one embodiment. The left side of FIG. 15 shows the interface before the text is displayed, which includes a face image 1510; the right side shows the interface after, which includes the face image 1510, an emotion feature image 1520, and text 1530. The text 1530 is recognized by the terminal from the voice data recorded while the frame was acquired, for example "我今天好难过" ("I am so sad today"), and reflects the emotion feature "sad"; the terminal may track the face image 1510 in the currently played frame, display the recognized text 1530 at the corresponding position, and also draw the emotion feature image 1520 indicating "sad" at the corresponding position.
FIG. 16 is a structural block diagram of the terminal 1600 in one embodiment. The internal structure of the terminal 1600 may refer to the structure shown in FIG. 2. Each of the modules described below may be implemented in whole or in part by software, hardware, or a combination thereof. Referring to FIG. 16, the terminal 1600 includes: an acquisition module 1601, a play module 1602, a selection module 1603, a data acquisition module 1604, and a rendering module 1605.

The acquisition module 1601 is configured to acquire image frames from the real world.

The play module 1602 is configured to play the acquired image frames frame by frame in the order of acquisition.

The selection module 1603 is configured to select an image frame from the acquired image frames.

The data acquisition module 1604 is configured to obtain the user generated content associated with the template image that matches the selected image frame, and to obtain the presentation position of the user generated content in the matched template image.

The rendering module 1605 is configured to render the user generated content in the played image frame according to the presentation position.
In one embodiment, the selection module 1603 is further configured to determine whether the features of the selected image frame conform to preset template image features; when they do, notify the data acquisition module so that it operates; when they do not, continue selecting image frames from the acquired frames.

In one embodiment, the data acquisition module 1604 is further configured to upload the selected image frame to the server; receive a first notification fed back by the server indicating that a template image matching the uploaded frame has been found; and, according to the first notification, obtain the user generated content associated with the template image.

In one embodiment, the data acquisition module 1604 is further configured to upload the selected image frame to the server; receive a second notification fed back by the server indicating that no template image matching the uploaded frame has been found; present a content creation entry according to the second notification; create user generated content according to operations on the content creation entry; and upload the created user generated content to the server, so that the server stores the uploaded content in association with a template image registered from the uploaded frame.

In one embodiment, the data acquisition module 1604 is further configured to obtain the stereoscopic rotation parameters configured when the user generated content was created. The rendering module 1605 is further configured to render, at the presentation position in the played image frame, the user generated content rotated according to the stereoscopic rotation parameters.
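As an illustration of applying a configured rotation parameter before rendering, a planar sketch; the source's stereoscopic rotation parameters are not specified, so a single in-plane angle about the origin is assumed:

```python
import math

def rotate_points(points, angle_deg):
    """Rotate 2-D content vertices about the origin by the configured angle
    (a stand-in for the full stereoscopic rotation, which is not specified)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(round(x * c - y * s, 6), round(x * s + y * c, 6)) for x, y in points]
```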
In one embodiment, the rendering module 1605 is further configured to track the object region of the template image in the played image frames; detect morphological changes of the tracked object region relative to the object region in the template image; determine a parameter representing the viewing direction according to the morphological changes; and render, at the presentation position in the played image frame, the user generated content deformed according to the parameter representing the viewing direction.

In one embodiment, the data acquisition module 1604 is further configured to obtain information on a plurality of content creators, and the corresponding user generated content, associated with the template image that matches the selected image frame. The rendering module 1605 is further configured to present the information on the plurality of content creators; select one of them; and render, in the played image frame, the user generated content corresponding to the selected content creator at the corresponding presentation position.

The above terminal 1600 acquires image frames from the real world and plays them in the order of acquisition; through an image frame selected from the acquired frames, the user generated content associated with the template image matched by that frame can be determined and presented. User generated content can be located and presented through image frames captured in the real world, without relying on social relationships, which extends the ways in which user generated content can spread. Furthermore, by tracking and rendering the user generated content in the played frames according to its presentation position in the matched template image, the user generated content of the virtual world is fused with the real world reflected by the played frames, providing a new mode of interaction with user generated content.
As shown in FIG. 17, in one embodiment, the terminal 1600 further includes: a recognition result acquisition module 1703, a lookup module 1704, and a presentation position acquisition module 1705.

The recognition result acquisition module 1703 is configured to, when the selected image frame includes a face image, obtain the facial emotion feature recognition result derived from recognizing the face image included in the frame.

The lookup module 1704 is configured to look up the corresponding emotion feature image according to the facial emotion feature recognition result.

The presentation position acquisition module 1705 is configured to obtain the presentation position of the emotion feature image in the currently played image frame.

The rendering module 1605 is further configured to render the emotion feature image in the currently played image frame according to the presentation position.

The above terminal 1600 plays image frames that reflect the real scene. By obtaining the facial emotion feature recognition result derived from the face image included in a frame, the emotional state of the person in the real scene can be determined automatically. After obtaining the presentation position of the emotion feature image in the currently played frame and rendering the emotion feature image there, the virtual emotion feature image is automatically combined with the person in the real scene to reflect that person's emotional state. Because the cumbersome steps of manual operation are avoided, image processing efficiency is greatly improved.
In one embodiment, the recognition result acquisition module 1703 is further configured to adjust the size of the image frame to a preset size; rotate the adjusted frame to an orientation that meets the emotion feature recognition condition; send the rotated frame to the server; and receive the facial emotion feature recognition result returned by the server for the sent frame.

In this embodiment, before expression recognition is performed on the face image by the server, the size and orientation of the image frame are adjusted so that the frame meets the conditions for expression recognition, which can improve the speed and accuracy of expression recognition and reduce hardware resource consumption.

In one embodiment, the recognition result acquisition module 1703 is further configured to extract the voice data recorded while the image frame was acquired, and obtain the speech emotion feature recognition result derived from recognizing the voice data. The lookup module 1704 is further configured to look up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result.

In this embodiment, the facial emotion feature recognition result and the speech emotion feature recognition result are considered together to find the emotion feature image reflecting the emotion feature expressed in the image frame, making the image processing result more accurate.

In one embodiment, the recognition result acquisition module 1703 is further configured to recognize the extracted voice data as text; look up the emotion feature keywords included in the text; and obtain the speech emotion feature recognition result corresponding to the voice data according to the found keywords.

In this embodiment, by performing text recognition on the recorded voice data and deriving the speech emotion feature recognition result from the characters representing emotion features included in the text, the accuracy of the speech emotion feature recognition result is improved.
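A minimal sketch of the keyword lookup described above; the keyword-to-emotion table is an illustrative assumption:

```python
# Hypothetical table mapping emotion feature keywords in the recognized
# text to an emotion feature type.
EMOTION_KEYWORDS = {"难过": "sad", "伤心": "sad", "开心": "happy", "高兴": "happy"}

def speech_emotion_from_text(text):
    """Return the emotion type of the first emotion keyword found, else None."""
    for keyword, emotion_type in EMOTION_KEYWORDS.items():
        if keyword in text:
            return emotion_type
    return None
```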
在一个实施例中,查找模块1704还用于当人脸情感特征识别结果与语音情感特征识别结果匹配时,按照人脸情感特征识别结果查找相应的情感特征图像。In an embodiment, the searching module 1704 is further configured to search for a corresponding emotional feature image according to the facial emotion feature recognition result when the facial emotion feature recognition result matches the voice emotion feature recognition result.
在本实施例中,对不同的人脸情感特征识别结果包括的识别结果置信度分别设置相对应的情感特征图像,通过情感特征图像来可视化反映人脸情感特征识别结果的可信度,使得图像处理结果更准确。In this embodiment, the corresponding emotion feature images are respectively set for the confidence of the recognition result included in the different facial emotion feature recognition results, and the credibility of the face emotion feature recognition result is visually reflected by the emotion feature image, so that the image is made The processing results are more accurate.
在一个实施例中,查找模块1704还用于提取人脸情感特征识别结果包括的情感特征类型和识别结果置信度;查找与情感特征类型对应的情感特征图像集合;从情感特征图像集合中,挑选出与识别结果置信度相对应的情感特征图像。In an embodiment, the searching module 1704 is further configured to extract the sentiment feature type and the recognition result confidence included in the facial emotion feature recognition result; search for the emotional feature image set corresponding to the emotional feature type; and select from the emotional feature image set An emotional feature image corresponding to the confidence of the recognition result.
在本实施例中,对不同的人脸情感特征识别结果包括的识别结果置信度分别设置相对应的情感特征图像,通过情感特征图像来可视化反映人脸情感 特征识别结果的可信度,使得图像处理结果更准确。In this embodiment, the corresponding emotion feature images are respectively set for the confidence of the recognition result included in the different facial emotion feature recognition results, and the credibility of the face emotion feature recognition result is visually reflected by the emotion feature image, so that the image is made The processing results are more accurate.
在一个实施例中,查找模块1704还用于当人脸情感特征识别结果与语音情感特征识别结果不匹配时,按照语音情感特征识别结果查找相应的情感特征图像。In an embodiment, the searching module 1704 is further configured to: when the facial emotion feature recognition result does not match the voice emotion feature recognition result, search for the corresponding emotional feature image according to the voice sentiment feature recognition result.
在本实施例中,在人脸情感特征识别结果与语音情感特征识别结果不匹配时,按照语音情感特征识别结果查找相应的情感特征图像,这种以真实的语音数据表达的情感特征识别结果来进行图像处理,使得图像处理结果更准确。In this embodiment, when the facial emotion feature recognition result does not match the voice emotion feature recognition result, the corresponding emotion feature image is searched according to the voice emotion feature recognition result, and the emotion feature recognition result expressed by the real voice data is Image processing is performed to make the image processing result more accurate.
In one embodiment, the placement acquisition module 1705 is further configured to determine the display position of the face image in the currently played image frame, query the relative position between the emotion feature image and the face image, and determine, according to the display position and the relative position, the placement of the emotion feature image in the currently played image frame.
In this embodiment, by setting the relative position between the emotion feature image and the face image, the emotion feature image is displayed relative to the position of the face image, making the placement of the emotion feature image more reasonable.
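The placement computation described above can be sketched as adding a configured offset to the face's display position. This is a minimal illustrative sketch, not the patent's implementation; the function name and the (dx, dy) offset convention are assumptions.

```python
# Hypothetical sketch: the emotion feature image's placement is derived
# from the face's display position plus a configured relative position.

def compute_placement(face_position, relative_position):
    """face_position: (x, y) of the face image in the played frame.
    relative_position: configured (dx, dy) offset of the emotion
    feature image relative to the face image."""
    fx, fy = face_position
    dx, dy = relative_position
    return (fx + dx, fy + dy)

# Example: an emotion sticker configured to sit 20 px above the face.
placement = compute_placement((100, 80), (0, -20))
```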
As shown in FIG. 18, in one embodiment, the terminal 1600 further includes a render-following module 1707.
The render-following module 1707 is configured to track the motion trajectory of the face image in the played image frames, and to move the emotion feature image along with the face image in the played image frames according to the tracked motion trajectory.
In this embodiment, the emotion feature image is displayed following the face image, thereby intelligently associating the emotion feature image with a face in the real scene and providing a new mode of interaction.
A person of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be implemented by a computer program instructing related hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (31)

  1. A method for processing user-generated content, comprising:
    a terminal acquiring image frames from the real world;
    the terminal playing the acquired image frames frame by frame in the order of acquisition;
    the terminal selecting an image frame from the acquired image frames;
    the terminal obtaining user-generated content associated with a template image that matches the selected image frame;
    the terminal obtaining a placement of the user-generated content in the matched template image; and
    the terminal rendering the user-generated content in the played image frames according to the placement.
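The claimed steps can be sketched as a simple frame-processing loop. This is an illustrative sketch under stated assumptions only — the frame representation, the template lookup, and the helper names are invented for the example, not taken from the patent.

```python
# Minimal sketch of the claim-1 pipeline: play acquired frames in order,
# match each candidate frame against registered template images, and
# record the UGC to render at the template's placement.

def process_frames(frames, templates):
    """frames: frame ids in acquisition (playback) order.
    templates: dict mapping a matching frame id to (ugc, placement)."""
    rendered = []
    for frame in frames:              # played frame by frame, in order
        match = templates.get(frame)  # template image matched to frame
        if match is None:
            continue                  # no associated UGC for this frame
        ugc, placement = match
        rendered.append((frame, ugc, placement))  # render at placement
    return rendered
```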
  2. The method according to claim 1, further comprising:
    the terminal determining whether a feature of the selected image frame conforms to a preset template image feature;
    when the feature of the selected image frame conforms to the template image feature, the terminal performing the step of obtaining user-generated content associated with the template image that matches the selected image frame; and
    when the feature of the selected image frame does not conform to the template image feature, returning to the step of selecting an image frame from the acquired image frames.
  3. The method according to claim 2, wherein the terminal determining whether the feature of the selected image frame conforms to the preset template image feature comprises:
    the terminal extracting feature points of the selected image frame and determining whether the number of extracted feature points reaches a preset template image feature point threshold; and/or
    the terminal obtaining a resolution of the selected image frame and determining whether the resolution reaches a preset template image resolution threshold; and/or
    the terminal obtaining a sharpness of the selected image frame and determining whether the sharpness reaches a preset template image sharpness threshold; and/or
    the terminal obtaining a proportion of an object region in the selected image frame relative to the selected image frame and determining whether the proportion reaches a preset template image object proportion.
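The suitability checks in the claim above can be sketched as threshold tests on a candidate frame. A minimal sketch under assumptions: the field names and threshold values are invented for illustration, and all four checks are combined with "and" here, whereas the claim permits any and/or combination.

```python
# Hypothetical claim-3 quality gate: a selected frame qualifies as a
# template image only if it passes the configured thresholds.

def frame_is_template_quality(frame,
                              min_feature_points=50,
                              min_resolution=(640, 480),
                              min_sharpness=0.5,
                              min_object_ratio=0.2):
    """frame: dict with 'feature_points', 'resolution', 'sharpness',
    and 'object_ratio' fields (all assumed names)."""
    w, h = frame["resolution"]
    return (frame["feature_points"] >= min_feature_points
            and w >= min_resolution[0] and h >= min_resolution[1]
            and frame["sharpness"] >= min_sharpness
            and frame["object_ratio"] >= min_object_ratio)
```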
  4. The method according to claim 1, wherein the terminal obtaining the user-generated content associated with the template image that matches the selected image frame comprises:
    the terminal uploading the selected image frame to a server;
    the terminal receiving a first notification fed back by the server indicating that a template image matching the uploaded image frame has been found; and
    the terminal obtaining, according to the first notification, the user-generated content associated with the template image.
  5. The method according to claim 1, wherein the obtaining user-generated content associated with the template image that matches the selected image frame comprises:
    the terminal uploading the selected image frame to a server;
    the terminal receiving a second notification fed back by the server indicating that no template image matching the uploaded image frame has been found;
    the terminal displaying a content creation entry according to the second notification;
    the terminal creating user-generated content according to an operation on the content creation entry; and
    the terminal uploading the created user-generated content to the server, so that the server stores the uploaded user-generated content in association with a template image registered from the uploaded image frame.
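The fallback flow in the claim above (no match found, so the frame is registered as a new template and the freshly created UGC is stored against it) can be sketched with an in-memory store. This is an assumed stand-in for the server-side storage; the class and method names are invented for the sketch.

```python
# Hypothetical in-memory stand-in for the server's template/UGC store.

class UgcStore:
    def __init__(self):
        self.templates = {}  # template id -> list of associated UGC

    def lookup(self, frame_id):
        """First/second notification: does a matching template exist?"""
        return frame_id in self.templates

    def register(self, frame_id, ugc):
        """Register the uploaded frame as a template image and store
        the created UGC in association with it."""
        self.templates.setdefault(frame_id, []).append(ugc)

store = UgcStore()
if not store.lookup("frame-42"):         # second-notification case
    store.register("frame-42", "hello")  # create UGC and upload it
```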
  6. The method according to claim 1, further comprising:
    the terminal obtaining a stereoscopic rotation parameter configured when the user-generated content was created;
    wherein the terminal rendering the user-generated content in the played image frames according to the placement comprises:
    the terminal rendering, in the played image frames according to the placement, the user-generated content rotated according to the stereoscopic rotation parameter.
  7. The method according to claim 1, wherein the terminal rendering the user-generated content in the played image frames according to the placement comprises:
    the terminal tracking an object region of the template image in the played image frames;
    the terminal determining a tracking render position according to the placement and the tracked object region; and the terminal rendering the user-generated content in the played image frames according to the tracking render position.
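The mapping in the claim above — from a placement defined inside the template image to a render position that follows the tracked object region — can be sketched as a proportional coordinate transform. The rectangle representation and the linear mapping are assumptions made for the sketch, not the patent's method.

```python
# Hypothetical claim-7 tracking render position: express the placement
# relative to the template's object region, then map it into the object
# region tracked in the current played frame.

def tracked_render_position(placement, template_region, tracked_region):
    """placement: (x, y) of the UGC inside the template image.
    template_region / tracked_region: (x, y, w, h) of the object region
    in the template image and in the current played frame."""
    px, py = placement
    tx, ty, tw, th = template_region
    cx, cy, cw, ch = tracked_region
    rx = (px - tx) / tw   # normalized offset within the template region
    ry = (py - ty) / th
    return (cx + rx * cw, cy + ry * ch)
```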
  8. The method according to claim 1, wherein the terminal obtaining the user-generated content associated with the template image that matches the selected image frame comprises:
    the terminal obtaining information on a plurality of content creators associated with the template image that matches the selected image frame, together with the corresponding user-generated content;
    wherein the terminal rendering the user-generated content in the played image frames according to the placement comprises:
    the terminal displaying the information on the plurality of content creators;
    the terminal selecting one of the plurality of content creators; and
    the terminal rendering, in the played image frames, the corresponding user-generated content according to the placement corresponding to the selected content creator.
  9. The method according to claim 1, further comprising:
    when the selected image frame includes a face image, the terminal obtaining a facial emotion feature recognition result obtained by recognizing the face image included in the image frame;
    the terminal looking up a corresponding emotion feature image according to the facial emotion feature recognition result;
    the terminal obtaining a placement of the emotion feature image in the currently played image frame; and
    the terminal rendering the emotion feature image in the currently played image frame according to the placement.
  10. The method according to claim 9, wherein the terminal obtaining the facial emotion feature recognition result obtained by recognizing the face image included in the image frame comprises:
    the terminal adjusting the size of the image frame to a preset size;
    the terminal rotating the direction of the adjusted image frame to a direction that satisfies an emotion feature recognition condition;
    the terminal sending the rotated image frame to a server; and
    the terminal receiving, from the server, the facial emotion feature recognition result for the sent image frame.
  11. The method according to claim 9, further comprising:
    the terminal extracting speech data recorded when the image frame was acquired; and
    the terminal obtaining a speech emotion feature recognition result obtained by recognizing the speech data;
    wherein the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result comprises:
    the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result.
  12. The method according to claim 11, wherein the terminal obtaining the speech emotion feature recognition result obtained by recognizing the speech data comprises:
    the terminal recognizing the extracted speech data as text;
    the terminal looking up emotion feature keywords included in the text; and
    the terminal obtaining, according to the found emotion feature keywords, the speech emotion feature recognition result corresponding to the speech data.
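The keyword step in the claim above can be sketched as a table lookup over the transcript. The keyword table and the first-match policy are invented for the sketch; real speech recognition and keyword matching would of course be richer.

```python
# Hypothetical claim-12 sketch: transcript text -> emotion feature
# keyword -> speech emotion recognition result.

EMOTION_KEYWORDS = {  # assumed example table
    "happy": "joy",
    "great": "joy",
    "sad": "sadness",
    "angry": "anger",
}

def speech_emotion_from_text(transcript):
    """Return the emotion for the first keyword found in the
    transcript, or None when no emotion keyword is present."""
    for word in transcript.lower().split():
        if word in EMOTION_KEYWORDS:
            return EMOTION_KEYWORDS[word]
    return None
```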
  13. The method according to claim 11, wherein the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result comprises:
    when the facial emotion feature recognition result matches the speech emotion feature recognition result, the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result.
  14. The method according to claim 13, wherein the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result comprises:
    the terminal extracting the emotion feature type and the recognition result confidence included in the facial emotion feature recognition result;
    the terminal looking up an emotion feature image set corresponding to the emotion feature type; and
    the terminal selecting, from the emotion feature image set, the emotion feature image corresponding to the recognition result confidence.
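The confidence-based selection in the claim above can be sketched with per-type image sets whose entries each cover a confidence band. The bands and image names are assumptions invented for this sketch.

```python
# Hypothetical claim-14 sketch: each emotion feature type maps to a set
# of candidate images; the one whose [low, high) confidence band
# contains the recognition result confidence is selected.

EMOTION_IMAGE_SETS = {  # assumed example data
    "joy": [
        (0.0, 0.5, "slight_smile.png"),
        (0.5, 0.8, "smile.png"),
        (0.8, 1.01, "laugh.png"),
    ],
}

def pick_emotion_image(emotion_type, confidence):
    """Return the image in the type's set whose confidence band
    contains the recognition result confidence, or None."""
    for low, high, image in EMOTION_IMAGE_SETS[emotion_type]:
        if low <= confidence < high:
            return image
    return None
```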
  15. The method according to claim 11, wherein the terminal looking up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result comprises:
    when the facial emotion feature recognition result does not match the speech emotion feature recognition result, the terminal looking up the corresponding emotion feature image according to the speech emotion feature recognition result.
  16. The method according to claim 9, wherein the terminal obtaining the placement of the emotion feature image in the currently played image frame comprises:
    the terminal determining a display position of the face image in the currently played image frame;
    the terminal querying a relative position between the emotion feature image and the face image; and
    the terminal determining, according to the display position and the relative position, the placement of the emotion feature image in the currently played image frame.
  17. The method according to claim 16, further comprising:
    the terminal tracking a motion trajectory of the face image in the played image frames; and
    the terminal moving the emotion feature image along with the face image in the played image frames according to the tracked motion trajectory.
  18. One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring image frames from the real world;
    playing the acquired image frames frame by frame in the order of acquisition;
    selecting an image frame from the acquired image frames;
    obtaining user-generated content associated with a template image that matches the selected image frame;
    obtaining a placement of the user-generated content in the matched template image; and
    rendering the user-generated content in the played image frames according to the placement.
  19. The storage medium according to claim 18, wherein the computer-readable instructions further cause the one or more processors to perform the following steps:
    determining whether a feature of the selected image frame conforms to a preset template image feature;
    when the feature of the selected image frame conforms to the template image feature, performing the step of obtaining user-generated content associated with the template image that matches the selected image frame; and
    when the feature of the selected image frame does not conform to the template image feature, returning to the step of selecting an image frame from the acquired image frames.
  20. The storage medium according to claim 18, wherein the obtaining user-generated content associated with the template image that matches the selected image frame comprises:
    obtaining a stereoscopic rotation parameter configured when the user-generated content was created;
    and the rendering the user-generated content in the played image frames according to the placement comprises:
    rendering, in the played image frames according to the placement, the user-generated content rotated according to the stereoscopic rotation parameter.
  21. The storage medium according to claim 18, wherein the rendering the user-generated content in the played image frames according to the placement comprises:
    tracking an object region of the template image in the played image frames;
    determining a tracking render position according to the placement and the tracked object region; and
    rendering the user-generated content in the played image frames according to the tracking render position.
  22. The storage medium according to claim 18, wherein the computer-readable instructions further cause the one or more processors to perform the following steps:
    when the selected image frame includes a face image, obtaining a facial emotion feature recognition result obtained by recognizing the face image included in the image frame;
    looking up a corresponding emotion feature image according to the facial emotion feature recognition result;
    obtaining a placement of the emotion feature image in the currently played image frame; and
    rendering the emotion feature image in the currently played image frame according to the placement.
  23. The storage medium according to claim 22, wherein the computer-readable instructions further cause the one or more processors to perform the following steps:
    extracting speech data recorded when the image frame was acquired; and
    obtaining a speech emotion feature recognition result obtained by recognizing the speech data;
    wherein the looking up the corresponding emotion feature image according to the facial emotion feature recognition result comprises:
    looking up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result.
  24. The storage medium according to claim 22, wherein the obtaining the placement of the emotion feature image in the currently played image frame comprises:
    determining a display position of the face image in the currently played image frame;
    querying a relative position between the emotion feature image and the face image; and
    determining, according to the display position and the relative position, the placement of the emotion feature image in the currently played image frame.
  25. A terminal comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
    acquiring image frames from the real world;
    playing the acquired image frames frame by frame in the order of acquisition;
    selecting an image frame from the acquired image frames;
    obtaining user-generated content associated with a template image that matches the selected image frame;
    obtaining a placement of the user-generated content in the matched template image; and
    rendering the user-generated content in the played image frames according to the placement.
  26. The terminal according to claim 25, wherein the computer-readable instructions further cause the processor to perform the following steps:
    determining whether a feature of the selected image frame conforms to a preset template image feature;
    when the feature of the selected image frame conforms to the template image feature, performing the step of obtaining user-generated content associated with the template image that matches the selected image frame; and
    when the feature of the selected image frame does not conform to the template image feature, returning to the step of selecting an image frame from the acquired image frames.
  27. The terminal according to claim 25, wherein the obtaining user-generated content associated with the template image that matches the selected image frame comprises:
    obtaining a stereoscopic rotation parameter configured when the user-generated content was created;
    and the rendering the user-generated content in the played image frames according to the placement comprises:
    rendering, in the played image frames according to the placement, the user-generated content rotated according to the stereoscopic rotation parameter.
  28. The terminal according to claim 25, wherein the rendering the user-generated content in the played image frames according to the placement comprises:
    tracking an object region of the template image in the played image frames;
    determining a tracking render position according to the placement and the tracked object region; and
    rendering the user-generated content in the played image frames according to the tracking render position.
  29. The terminal according to claim 25, wherein the computer-readable instructions further cause the processor to perform the following steps:
    when the selected image frame includes a face image, obtaining a facial emotion feature recognition result obtained by recognizing the face image included in the image frame;
    looking up a corresponding emotion feature image according to the facial emotion feature recognition result;
    obtaining a placement of the emotion feature image in the currently played image frame; and
    rendering the emotion feature image in the currently played image frame according to the placement.
  30. The terminal according to claim 29, wherein the computer-readable instructions further cause the processor to perform the following steps:
    extracting speech data recorded when the image frame was acquired; and
    obtaining a speech emotion feature recognition result obtained by recognizing the speech data;
    wherein the looking up the corresponding emotion feature image according to the facial emotion feature recognition result comprises:
    looking up the corresponding emotion feature image according to the facial emotion feature recognition result and the speech emotion feature recognition result.
  31. The terminal according to claim 29, wherein the obtaining the placement of the emotion feature image in the currently played image frame comprises:
    determining a display position of the face image in the currently played image frame;
    querying a relative position between the emotion feature image and the face image; and
    determining, according to the display position and the relative position, the placement of the emotion feature image in the currently played image frame.
PCT/CN2018/079228 2017-03-29 2018-03-16 Method for processing user-generated content, storage medium and terminal WO2018177134A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201710199078.4A CN107168619B (en) 2017-03-29 2017-03-29 User generated content processing method and device
CN201710199078.4 2017-03-29
CN201710282661.1A CN108334806B (en) 2017-04-26 2017-04-26 Image processing method and device and electronic equipment
CN201710282661.1 2017-04-26

Publications (1)

Publication Number Publication Date
WO2018177134A1 true WO2018177134A1 (en) 2018-10-04

Family

ID=63674198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079228 WO2018177134A1 (en) 2017-03-29 2018-03-16 Method for processing user-generated content, storage medium and terminal

Country Status (1)

Country Link
WO (1) WO2018177134A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110321082A1 (en) * 2010-06-29 2011-12-29 At&T Intellectual Property I, L.P. User-Defined Modification of Video Content
CN103426003A (en) * 2012-05-22 2013-12-04 腾讯科技(深圳)有限公司 Implementation method and system for enhancing real interaction
CN104219559A (en) * 2013-05-31 2014-12-17 奥多比公司 Placing unobtrusive overlays in video content
CN107168619A (en) * 2017-03-29 2017-09-15 腾讯科技(深圳)有限公司 User-generated content treating method and apparatus

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522799A (en) * 2018-10-16 2019-03-26 深圳壹账通智能科技有限公司 Information cuing method, device, computer equipment and storage medium
CN109670285A (en) * 2018-11-13 2019-04-23 平安科技(深圳)有限公司 Face recognition login method, device, computer equipment and storage medium
CN109840491A (en) * 2019-01-25 2019-06-04 平安科技(深圳)有限公司 Video stream playing method, system, computer installation and readable storage medium storing program for executing
US11379683B2 (en) 2019-02-28 2022-07-05 Stats Llc System and method for generating trackable video frames from broadcast video
US11586840B2 (en) 2019-02-28 2023-02-21 Stats Llc System and method for player reidentification in broadcast video
US11830202B2 (en) 2019-02-28 2023-11-28 Stats Llc System and method for generating player tracking data from broadcast video
US11861848B2 (en) 2019-02-28 2024-01-02 Stats Llc System and method for generating trackable video frames from broadcast video
US11861850B2 (en) 2019-02-28 2024-01-02 Stats Llc System and method for player reidentification in broadcast video
US11935247B2 (en) 2019-02-28 2024-03-19 Stats Llc System and method for calibrating moving cameras capturing broadcast video

Similar Documents

Publication Publication Date Title
EP4128672B1 (en) Combining first user interface content into second user interface
US20240361881A1 (en) Updating avatar clothing for a user of a messaging system
US11094131B2 (en) Augmented reality apparatus and method
WO2017157272A1 (en) Information processing method and terminal
WO2021109678A1 (en) Video generation method and apparatus, electronic device, and storage medium
CN107168619B (en) User generated content processing method and device
US12020383B2 (en) Facial synthesis in augmented reality content for third party applications
US11680814B2 (en) Augmented reality-based translations associated with travel
US11769500B2 (en) Augmented reality-based translation of speech in association with travel
WO2018177134A1 (en) Method for processing user-generated content, storage medium and terminal
US11816926B2 (en) Interactive augmented reality content including facial synthesis
US20210304451A1 (en) Speech-based selection of augmented reality content for detected objects
WO2021195404A1 (en) Speech-based selection of augmented reality content for detected objects
US20240267485A1 (en) Facial synthesis in overlaid augmented reality content
CN113709545A (en) Video processing method and device, computer equipment and storage medium
CN115579023A (en) Video processing method, video processing device and electronic equipment
CN108334806B (en) Image processing method and device and electronic equipment
US12148064B2 (en) Facial synthesis in augmented reality content for advertisements
US12148244B2 (en) Interactive augmented reality content including facial synthesis
US20220319060A1 (en) Facial synthesis in augmented reality content for advertisements
US20230326094A1 (en) Integrating overlaid content into displayed data via graphics processing circuitry and processing circuitry using a computing memory and an operating system memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18775573

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18775573

Country of ref document: EP

Kind code of ref document: A1