CN111062207A - Expression image processing method and device, computer storage medium and electronic equipment

Expression image processing method and device, computer storage medium and electronic equipment

Info

Publication number: CN111062207A (application CN201911222705.7A; granted publication CN111062207B)
Authority: CN (China)
Prior art keywords: image, expression, expression image, information, emotion
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN111062207B (en)
Inventors: 苏汉, 张金超, 牛成
Original and Current Assignee: Tencent Technology (Shenzhen) Co., Ltd.
Events: Application CN201911222705.7A filed by Tencent Technology (Shenzhen) Co., Ltd.; publication of CN111062207A; application granted; publication of CN111062207B.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/02: Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F 3/023: Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0487: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F 3/0488: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F 3/04886: Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser, by partitioning the display area of the touch-screen or the surface of the digitising tablet into independently controllable areas, e.g. virtual keyboards or menus

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides an expression image processing method and device, a computer storage medium, and an electronic device, and relates to the field of artificial intelligence. The method comprises the following steps: in response to a first trigger operation performed by a user on a first expression image, displaying, in a chat interface, a second expression image having the same emotion label as the first expression image; and in response to a second trigger operation performed by the user on the second expression image, displaying the second expression image in a message sending area of the chat interface. According to the method and the device, the second expression image with the same emotion label is obtained by operating the first expression image, which improves the efficiency and accuracy of obtaining expression images and thereby the user experience; in addition, the information in an expression image can be extracted and converted into text information, which helps users with special needs understand the expression image and improves communication quality and efficiency.

Description

Expression image processing method and device, computer storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to an expression image processing method, an expression image processing apparatus, a computer storage medium, and an electronic device.
Background
With the rapid development of computer technology, communication modes have become increasingly diverse. Besides making phone calls and sending short messages, people can communicate through various chat tools; for example, voice and text communication can be realized through chat tools such as WeChat and QQ. Meanwhile, to make chatting more engaging, more and more users have begun to use emoji and custom expression images, where a custom expression image is an image made from material such as photos of popular stars, popular quotations, cartoons, and movie screenshots, sometimes paired with matching text to express a specific emotion.
Generally, to send a custom expression image, a user needs to download an expression package to the local device in advance and then select the expression image to be sent from the downloaded package, or search for the custom expression image online before sending it; these operations are relatively time-consuming and labor-intensive. In addition, for visually impaired users, if the screen reading program cannot identify an expression image, they cannot learn its specific meaning, which hinders communication between visually impaired users and others and degrades the user experience of the chat product.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The embodiments of the disclosure provide an expression image processing method, an expression image processing apparatus, a computer storage medium, and an electronic device, so that, at least to a certain extent, an expression image required by a user can be acquired quickly and accurately, improving communication quality and efficiency.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of an embodiment of the present disclosure, there is provided an expression image processing method including: in response to a first trigger operation performed by a user on a first expression image, displaying, in a chat interface, a second expression image having the same emotion label as the first expression image; and in response to a second trigger operation performed by the user on the second expression image, displaying the second expression image in a message sending area of the chat interface.
According to an aspect of an embodiment of the present disclosure, there is provided an expression image processing apparatus including: an expression image acquisition module configured to display, in a chat interface, in response to a first trigger operation performed by a user on a first expression image, a second expression image having the same emotion label as the first expression image; and an expression image sending module configured to display, in response to a second trigger operation performed by the user on the second expression image, the second expression image in a message sending area of the chat interface.
In some embodiments of the present disclosure, based on the foregoing scheme, the expression image acquisition module is configured to: respond to a first pressing operation performed by the user on the first expression image; compare the duration of the first pressing operation with a preset duration; and, when the duration of the first pressing operation is greater than or equal to the preset duration, display a second expression image having the same emotion label as the first expression image in the chat interface.
In some embodiments of the present disclosure, based on the foregoing solution, the expression image processing apparatus further includes: an expression paraphrase information acquisition module configured to display, in the chat interface, the expression paraphrase information corresponding to the second expression image when the second expression image having the same emotion label as the first expression image is displayed in the chat interface.
In some embodiments of the present disclosure, based on the foregoing scheme, the expression image sending module is configured to: in response to a second pressing operation performed by the user on the second expression image, display the second expression image and the expression paraphrase information corresponding to the second expression image in a message sending area of the chat interface.
In some embodiments of the present disclosure, there are a plurality of second expression images; based on the foregoing scheme, the expression image sending module is configured to: in response to a second trigger operation performed by the user on a target expression image among the second expression images, display the target expression image in a message sending area of the chat interface.
In some embodiments of the present disclosure, based on the foregoing solution, the expression image processing apparatus further includes: a first expression image determining module configured to determine the first expression image according to text information or emotion information input by the user.
In some embodiments of the present disclosure, based on the foregoing solution, the expression image processing apparatus further includes: an object information acquisition module configured to acquire an expression image and identify the target object in the expression image to acquire object information; an emotion information acquisition module configured to extract features of the target object to acquire feature information, acquire the context information corresponding to the expression image in the chat interface, and determine the emotion information corresponding to the expression image according to the feature information and the context information; and an expression paraphrase information acquisition module configured to construct the expression paraphrase information corresponding to the expression image according to the object information and the emotion information.
In some embodiments of the present disclosure, based on the foregoing solution, the object information acquisition module is configured to: when the expression image is a text-only expression image, extract the text information in the expression image through a character recognition model; or, when the expression image is a non-text expression image, perform feature extraction on the image subject in the expression image through a first image recognition model to obtain subject information; or, when the expression image is an expression image containing both text and an image subject, perform feature extraction on the text and the image subject through a second image recognition model to obtain text information and subject information.
In some embodiments of the present disclosure, the expression image is a text-only expression image; based on the above scheme, the emotion information acquisition module is configured to: segment the text information corresponding to the target object to obtain the keywords in the text information, and simultaneously obtain the context information corresponding to the expression image in the chat interface; and determine the emotion information corresponding to the expression image according to the emotion information corresponding to the keywords and the context information.
In some embodiments of the present disclosure, the expression image is a non-text expression image; based on the above scheme, the emotion information acquisition module includes: a feature extraction unit configured to perform feature extraction on the image subject in the expression image through a third image recognition model to obtain image subject features; and a matching unit configured to match the image subject features with the expression features corresponding to the system expressions and determine the emotion information corresponding to the expression image according to the matching results and the context information.
In some embodiments of the present disclosure, based on the foregoing, the matching unit is configured to: match the image subject features with the expression features of the system expressions to obtain a plurality of feature similarities; compare each feature similarity with a similarity threshold to determine whether there is a candidate system expression whose feature similarity with the image subject features is greater than or equal to the similarity threshold; when such a candidate exists, use, based on the context information, the emotion label corresponding to the candidate system expression as the emotion information of the expression image; and when none exists, set the emotion information of the expression image to null.
In some embodiments of the present disclosure, the expression image is an expression image containing text and an image subject; based on the above scheme, the emotion information acquisition module is configured to: acquire the text information in the expression image and segment it to obtain the keywords in the text information; perform feature recognition on the image subject in the expression image through a fourth image recognition model to obtain image subject features, and match the image subject features with the expression features corresponding to the system expressions to obtain the emotion label of the successfully matched system expression; and determine the emotion information corresponding to the expression image according to the keywords, the context information, and the emotion label of the successfully matched system expression.
In some embodiments of the present disclosure, the expression images include a first type of expression image and a second type of expression image; based on the foregoing, the feature extraction unit is configured to: when the expression image is of the first type, identify the image subject in the expression image through the third image recognition model and perform feature extraction to obtain the image subject features; and when the expression image is of the second type, identify and perform feature extraction on the image subject in each frame of the expression image through the third image recognition model to obtain the image subject features.
In some embodiments of the present disclosure, based on the foregoing solution, the expression paraphrase information acquisition module is configured to: fill the object information and the emotion information into an expression paraphrase template according to a preset rule to form the expression paraphrase information corresponding to the expression image.
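As a concrete illustration of the template-filling step, the following is a minimal sketch; the template wording mirrors the paraphrase examples given later in the description (e.g. "The puppy is happy: So happy!"), and all names here are assumptions rather than the patent's actual preset rule:

```python
from dataclasses import dataclass

@dataclass
class ObjectInfo:
    subject: str = ""   # classification of the image subject, e.g. "puppy"
    text: str = ""      # text recognized in the expression image, if any

# Hypothetical template; not the patent's actual preset rule.
PARAPHRASE_TEMPLATE = "The {subject} is {emotion}: {text}"

def build_paraphrase(info: ObjectInfo, emotion: str) -> str:
    # fill object information and emotion information into the template
    return PARAPHRASE_TEMPLATE.format(
        subject=info.subject, emotion=emotion, text=info.text).strip()

# build_paraphrase(ObjectInfo("puppy", "So happy!"), "happy")
# -> "The puppy is happy: So happy!"
```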
According to an aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the expression image processing method according to the embodiments described above.
According to an aspect of an embodiment of the present disclosure, there is provided an electronic device including one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to execute the expression image processing method according to the above embodiment.
In the technical solutions provided by the embodiments of the present disclosure, a second expression image having the same emotion label as a first expression image is acquired according to a first trigger operation performed by a user on the first expression image and is displayed in a chat interface; then, in response to a second trigger operation performed by the user on the second expression image, the second expression image is displayed in a message sending area of the chat interface. With these technical solutions, the second expression image with the same emotion label can be acquired by operating the first expression image, which improves the efficiency and accuracy of acquiring expression images and thereby the user experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of embodiments of the present disclosure may be applied;
FIG. 2 schematically shows a flowchart of an expression image processing method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a flowchart for acquiring a second expression image according to an embodiment of the present disclosure;
FIGS. 4A-4B schematically illustrate interface diagrams for acquiring a second expression image according to one embodiment of the present disclosure;
FIGS. 5A-5B schematically illustrate interface diagrams for acquiring a second expression image according to one embodiment of the present disclosure;
FIG. 6 schematically illustrates an interface diagram for displaying expression images according to one embodiment of the present disclosure;
FIG. 7 schematically illustrates a flowchart for constructing expression paraphrase information according to one embodiment of the present disclosure;
FIGS. 8A-8C schematically illustrate three expression images according to one embodiment of the present disclosure;
FIGS. 9A-9D schematically illustrate training samples and expression images to be detected according to one embodiment of the present disclosure;
FIG. 10 schematically shows a flowchart for acquiring the emotion information of a text-only expression image according to an embodiment of the present disclosure;
FIG. 11 schematically shows a flowchart for determining the emotion information corresponding to an expression image according to the result of matching the image subject features with the expression features corresponding to the system expressions, according to an embodiment of the present disclosure;
FIG. 12 schematically shows a flowchart for acquiring the emotion information of an expression image containing text and an image subject according to an embodiment of the present disclosure;
FIG. 13 schematically illustrates an interface diagram for sending a second expression image according to an embodiment of the present disclosure;
FIGS. 14A-14D schematically illustrate interface diagrams for sending expression images according to one embodiment of the present disclosure;
FIG. 15 schematically illustrates an interface diagram of a message prompt containing expression paraphrase information according to one embodiment of the present disclosure;
FIG. 16 schematically shows a block diagram of an expression image processing apparatus according to an embodiment of the present disclosure;
FIG. 17 shows a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present disclosure may be applied.
As shown in FIG. 1, system architecture 100 may include terminal device 101, network 102, and server 103. Network 102 is the medium used to provide a communication link between terminal device 101 and server 103. Network 102 may include various connection types, such as wired communication links, wireless communication links, and the like.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired. For example, the server 103 may be a server cluster composed of a plurality of servers. The terminal device 101 may be a terminal device with a display screen such as a notebook, a laptop, a smartphone, or the like.
In one embodiment of the disclosure, a user can connect to the network 102 through the terminal device 101 and download various chat tools, and when using the chat tools, the user can receive and send text and/or emoticons through the terminal device 101 to communicate with others.
When an expression image is sent through the terminal device 101, a user may perform a first trigger operation on a first expression image; the terminal device 101 may, in response to the first trigger operation, acquire a second expression image having the same emotion label as the first expression image and display it in the chat interface, and then, in response to a second trigger operation performed by the user on the second expression image, display the second expression image in a message sending area of the chat interface. Alternatively, after the user performs the first trigger operation on the first expression image, the terminal device 101 may form an expression image acquisition request according to the first trigger operation and send it to the server 103 via the network 102; the server 103 acquires a second expression image having the same emotion label as the first expression image in response to the request and transmits it to the terminal device 101 for display in the chat interface. After the user performs the second trigger operation on the second expression image, the terminal device 101 may form an expression image sending request according to the second trigger operation and send it to the server 103 through the network 102; the server 103 sends the second expression image to the terminal device 101 in response to the request, and the second expression image is displayed in the message sending area of the chat interface. The first expression image can be a system expression carried by the chat tool or a user-defined expression image; the second expression image is a custom expression image that has the same emotion label as the first expression image but differs from it.
When a user needs to acquire the expression paraphrase information of a received expression image, the terminal device 101 can send the expression image to the server 103 through the network 102. After receiving the expression image, the server 103 can identify the target object in the expression image to acquire object information; for example, if a cartoon character and a line of text exist in the expression image, the cartoon character and the text are the target object. The server 103 can then perform feature extraction on the target object, acquire the context information corresponding to the expression image, determine the emotion information corresponding to the expression image according to the extracted feature information and the context, and finally construct the expression paraphrase information according to the object information and the emotion information and send it to the terminal device 101 for display. Alternatively, the terminal device 101 may itself identify the target object in the expression image to obtain the object information, perform feature extraction on the target object, acquire the context information corresponding to the expression image, determine the emotion information according to the extracted feature information and the context, and finally construct the expression paraphrase information according to the object information and the emotion information and display it in the chat interface. According to the technical solutions of the embodiments of the present disclosure, the second expression image with the same emotion label can be determined according to the emotion label of the first expression image, which improves the accuracy and efficiency of obtaining expression images and thereby the user experience; in addition, the information in an expression image can be extracted and converted into text information, which helps users with special needs understand the expression image and improves communication quality and efficiency.
It should be noted that the expression image processing method provided by the embodiment of the present disclosure is generally executed by a server, and accordingly, the expression image processing apparatus is generally disposed in the server. However, in other embodiments of the present disclosure, the expression image processing method provided by the embodiments of the present disclosure may also be executed by a terminal device.
In the related art in this field, when a user wants to send an expression image, an expression package needs to be downloaded locally and the required expression images selected one by one from the package, or the required expression images must be searched for online; these operations are time-consuming and labor-intensive, and for a visually impaired user it is even more difficult to obtain the required expression images. Generally, a visually impaired user can install a screen reading program in the terminal device to perform basic operations while online. A screen reading program is screen reading software specially designed for blind or visually impaired people; through the user's switching operations on the numeric keypad and several function keys on the main keyboard, files can be searched and processed, web pages can be navigated and browsed, documents can be edited, and e-mails can be sent and received.
However, a screen reading program can only read aloud the text information on the display screen to help the user understand its content. When the user uses a chat tool, a received custom expression image cannot be converted into text, so the screen reading program cannot determine the content of the expression image, and the user in turn cannot understand it.
In view of the problems in the related art, the embodiments of the present disclosure provide an expression image processing method based on Artificial Intelligence (AI). Artificial intelligence is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is a science that studies how to make a machine "see"; more specifically, it uses cameras and computers instead of human eyes to identify, track, and measure targets, and performs further graphic processing so that the computer produces an image more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or implements human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching-based learning.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The scheme provided by the embodiment of the disclosure relates to an artificial intelligence image processing technology, and is specifically explained by the following embodiment:
FIG. 2 schematically shows a flowchart of an expression image processing method according to an embodiment of the present disclosure; the method may be performed by a server, such as the server 103 shown in FIG. 1, or by a terminal device, such as the terminal device 101 shown in FIG. 1. Referring to FIG. 2, the expression image processing method includes at least steps S210 and S220, which are described in detail below:
In step S210, in response to a first trigger operation performed by a user on a first expression image, a second expression image having the same emotion label as the first expression image is displayed in a chat interface.
In one embodiment of the disclosure, the first expression image may be a system expression carried by the chat tool, or a custom expression image with an emotion label. After logging in to the chat program, the user opens the chat interface and clicks the system expression identification icon, whereupon a plurality of system expressions are displayed in a message waiting area of the chat interface; the user can select from these a first expression image whose emotion label matches the emotion information of the second expression image the user wants to send. Alternatively, the user can click the custom expression image identification icon on the chat interface, whereupon a plurality of custom expression images are displayed in the message waiting area, and the user can likewise select from these a first expression image whose emotion label matches the emotion information of the desired second expression image. After the first expression image is determined, the user can perform a first trigger operation on it to acquire a second expression image having the same emotion label as the first expression image; the second expression image is displayed in the chat interface, and the user can copy, forward, or send it. The second expression image can be displayed in the message waiting area of the chat interface or float above it. The message waiting area is the area of the chat interface below the text edit box and may also include the text edit box area; the message sending area is the area of the chat interface above the text edit box and is used for displaying the messages sent by the user and the user's friends. The message sending area and the message waiting area may be arranged vertically, horizontally, or on different pages, which is not specifically limited in the present disclosure.
In an embodiment of the present disclosure, the first trigger operation may specifically be a pressing operation, a double-click operation, or another trigger operation different from directly sending an expression. FIG. 3 shows a flowchart for acquiring the second expression image. As shown in FIG. 3, in step S301, a response is made to a first pressing operation performed by the user on the first expression image; the first pressing operation is the user's first trigger operation on the first expression image. In step S302, the duration of the first pressing operation is compared with a preset duration. In step S303, when the duration of the first pressing operation is greater than or equal to the preset duration, a second expression image having the same emotion label as the first expression image is displayed in the chat interface. In other words, whether to trigger the display of a target expression image corresponding to the first expression image is judged according to the relation between the duration of the first pressing operation and the preset duration: when the duration is greater than or equal to the preset duration, the display of a second expression image with the same emotion label is triggered; when the duration is less than the preset duration, no second expression image is displayed, and the first expression image is instead sent to the message sending area of the chat interface. The preset duration may be set according to actual needs, for example, 3 s or 5 s, which is not specifically limited in the present disclosure.
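As an illustration only, the following minimal sketch shows the duration check of steps S301-S303; the class, the store interface, and the 3-second value are assumptions, not the patent's implementation:

```python
# A minimal sketch of steps S301-S303 (assumed names; not the patent's code).
import time

PRESET_DURATION = 3.0  # seconds; the text gives 3 s and 5 s as examples

class ExpressionPressHandler:
    def __init__(self, store):
        # store: any object exposing find_by_label(emotion_label, exclude_id=...)
        self.store = store
        self._pressed_at = None

    def on_press_down(self):
        self._pressed_at = time.monotonic()             # step S301: press begins

    def on_press_up(self, first_image):
        if self._pressed_at is None:
            return [first_image]
        duration = time.monotonic() - self._pressed_at  # step S302: compare
        if duration >= PRESET_DURATION:                 # step S303: long press
            # show second expression images sharing the emotion label
            return self.store.find_by_label(first_image["emotion_label"],
                                            exclude_id=first_image["image_id"])
        return [first_image]   # short press: send the first image directly
```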
FIGS. 4A-4B show interface diagrams for acquiring a second expression image. If a user wants to acquire an expression image containing Doraemon with the emotion label "surprised", a system expression with the emotion label "surprised" can be selected from the system expression set, as shown in FIG. 4A. After determining the system expression whose emotion label is "surprised", a first trigger operation may be performed on it; for example, if the system expression is pressed for 5 s, an expression image containing Doraemon appears above the system expression, and its emotion label is also "surprised", as shown in FIG. 4B.
FIGS. 5A-5B show interface diagrams for acquiring a second expression image. If a user wants to acquire an expression image containing Doraemon with the emotion label "surprised", a custom expression image with the emotion label "surprised" can be selected from the custom expression image set; as shown in FIG. 5A, the custom expression image is a kitten with the emotion label "surprised". After determining the custom expression image whose emotion label is "surprised", a first trigger operation can be performed on it; if the custom expression image is long-pressed, an expression image containing Doraemon appears above it, and its emotion label is also "surprised", as shown in FIG. 5B.
Further, in order to help the visually impaired user acquire a desired expression image, when a second expression image having the same emotion label as the first expression image is acquired, the expression paraphrase information corresponding to the second expression image is acquired as well, and the second expression image and the corresponding expression paraphrase information are displayed in the message waiting area. After they are displayed in the message waiting area of the chat interface, the screen reading program can acquire the expression paraphrase information in time and play it to the user, helping the user understand the second expression image and decide whether to send it.
In an embodiment of the disclosure, for the same first expression image there may be a plurality of second expression images having the same emotion label. After the user performs a first trigger operation on the first expression image, the one or more second expression images with the highest usage rates, obtained through statistics, may be displayed in the chat interface, or all the second expression images may be displayed. When the user's finger slides onto any of the second expression images, the screen reading program can obtain the paraphrase information corresponding to that second expression image and play it to the user. After the user performs the second trigger operation on a target expression image among the plurality of second expression images, the target expression image may be displayed in the message sending area of the chat interface.
FIG. 6 is a schematic diagram of an interface for displaying expression images. As shown in FIG. 6, when a user wants to send an expression image expressing a happy mood and performs a first trigger operation on a first expression image with a "happy" emotion label, the second expression images related to it are displayed in the chat interface. As shown in FIG. 6, there are two related expression images: one whose image subject is a kitten, with the corresponding expression paraphrase information "The kitten is happy: Great!", and one whose image subject is a puppy, with the corresponding expression paraphrase information "The puppy is happy: So happy!". The user can move a finger upward: when the finger is on the expression image whose image subject is the kitten, the screen reading program plays the corresponding expression paraphrase information to the user, and when the finger is on the expression image whose image subject is the puppy, the screen reading program likewise plays the corresponding paraphrase information. The user can then select the second expression image to be sent according to the expression paraphrase information played by the screen reading program.
In an embodiment of the disclosure, expression images having the same emotion labels as the system expressions, or custom expression images carrying emotion labels, may be stored in advance in a local database of the terminal device or in a database associated with the server. After a first trigger operation performed by the user on the first expression image is received, the target expression images having the same emotion label as the first expression image can be acquired from the database according to the first expression image and displayed in the chat interface. Further, when storing an expression image that has the same emotion label as a system expression, or a custom expression image with an emotion label, the expression paraphrase information corresponding to the expression image may be stored at the same time; the expression paraphrase information is constructed from the object information in the expression image and the emotion information corresponding to the expression image.
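As an illustration, a minimal sketch of such a label-keyed store follows; an in-memory dictionary stands in for the local or server-side database, and all names are assumptions:

```python
from collections import defaultdict

class EmotionImageStore:
    """A sketch of the label-keyed store described above; an in-memory dict
    rather than a real database, with assumed field names."""

    def __init__(self):
        self._by_label = defaultdict(list)   # emotion label -> list of entries

    def add(self, image_id, emotion_label, paraphrase):
        # the paraphrase is stored together with the image, per the text above
        self._by_label[emotion_label].append(
            {"image_id": image_id, "emotion_label": emotion_label,
             "paraphrase": paraphrase})

    def find_by_label(self, emotion_label, exclude_id=None):
        # return all stored images sharing the label, minus the one pressed
        return [entry for entry in self._by_label[emotion_label]
                if entry["image_id"] != exclude_id]
```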
FIG. 7 is a schematic diagram of the process for constructing expression paraphrase information. As shown in FIG. 7, the process includes at least steps S701 to S703, specifically:
in step S701, an expression image is acquired, and a target object in the expression image is identified to acquire object information.
In an embodiment of the present disclosure, when chatting with a chat tool, a user receives system expressions or expression images sent by other users. A system expression is usually annotated with its emotion information or expression content when it is released, so this information can be obtained whenever the system expression is sent or received. An expression image, however, carries no such annotation; after an expression image is received, its emotion information or expression content must be determined from the text and/or image subject information in the expression image, often in combination with the context information of the chat, which makes it particularly difficult for a visually impaired user to understand the chat content. Therefore, in an embodiment of the present disclosure, the emotion information or expression content in the expression image can be extracted and converted into text information, so that the screen reading program can convey it to the visually impaired user and the visually impaired user can communicate smoothly.
In one embodiment of the present disclosure, after the expression image is acquired, the target object in the expression image may be identified to acquire object information. FIGS. 8A-8C show three expression images. As shown in FIG. 8A, the expression image is a text-only expression image with no image subject and only the text "I love work"; the text content is the object information. As shown in FIG. 8B, the expression image is a non-text expression image whose only image subject is a kitten; the image subject is the object information. As shown in FIG. 8C, the expression image contains both text and an image subject, where the text is "haha, haha" and the image subject is Nezha; the text content and the image subject together are the object information.
In an embodiment of the present disclosure, when the expression image is a text-only expression image, the text information in the expression image may be extracted through a character recognition model based on an Optical Character Recognition (OCR) algorithm: the text-only expression image is first converted into a grayscale image, the edges of the text are then detected with the Canny algorithm to obtain the text region, and finally the characters in the text region are recognized to obtain the text information.
When the expression image is a non-text expression image, feature extraction may be performed on the image subject through a first image recognition model to obtain subject information. The first image recognition model may be a machine learning model for image recognition, such as a convolutional neural network or a residual neural network; it may include a plurality of convolution units that perform convolution processing on the expression image to obtain the image subject features, after which classification prediction is performed on those features to determine the subject information, that is, the classification of the image subject (for example, the image subject in FIG. 8B is a cat and the image subject in FIG. 8C is Nezha).
When the expression image contains both text and an image subject, the text may be recognized through the character recognition model and the image subject features extracted through a second image recognition model to obtain the text information and subject information. Alternatively, the second image recognition model can extract the text and the image subject at the same time; in that case it comprises two sub-models for extracting the text information and the subject information respectively, where the subject sub-model may work in the same way as the first image recognition model and the text sub-model in the same way as the character recognition model, which is not repeated here.
The expression images include a first type and a second type: the first type are static expression images and the second type are dynamic expression images. When the expression image is static, the image subject in it can be identified and its features extracted through a third image recognition model to obtain the image subject features; when the expression image is dynamic, the image subject in each frame contained in the dynamic expression image can be identified and its features extracted through the third image recognition model to obtain the image subject features.
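For the text-only path, the following is a rough sketch under the assumption that OpenCV and Tesseract stand in for the unspecified character recognition model; the threshold values and the crude single-box text region are illustrative only:

```python
# A minimal sketch of the grayscale -> Canny -> recognize pipeline above.
import cv2
import pytesseract

def extract_text(image_path: str) -> str:
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # step 1: grayscale
    edges = cv2.Canny(gray, 50, 150)                 # step 2: Canny edge detection
    points = cv2.findNonZero(edges)
    if points is None:                               # no text edges found
        return ""
    x, y, w, h = cv2.boundingRect(points)            # crude text-region box
    return pytesseract.image_to_string(gray[y:y + h, x:x + w],
                                       lang="chi_sim")  # step 3: recognize text
```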
The first image recognition model, the second image recognition model and the third image recognition model may be image recognition models having the same structure, or image recognition models having different structures.
In an embodiment of the disclosure, to improve the efficiency and accuracy of model processing, the first, second, and third image recognition models need to be trained. During training, a plurality of named expression images can be collected as training samples; each expression image is input into the image recognition model to obtain the predicted image subject, the predicted subject is compared with the name of the expression image, and when the difference between them is large, the parameters of the image recognition model are adjusted until the predicted subject and the name are the same or similar. FIGS. 9A-9D show training samples and expression images to be detected: FIGS. 9A-9B are training samples for training the image recognition models, Doraemon and a panda respectively. After the image recognition models are trained on these samples, the trained models can be used to recognize the image subjects of the expression images to be detected (FIGS. 9C-9D), yielding the image subjects Doraemon and panda.
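A generic training loop of this kind might look as follows; this is a sketch in PyTorch style under assumed names, since the patent does not specify a framework, a model architecture, or a loss:

```python
# A generic supervised training loop matching the procedure above (assumed).
import torch.nn as nn
import torch.optim as optim

def train_subject_recognizer(model, loader, epochs=10, lr=1e-3):
    criterion = nn.CrossEntropyLoss()   # gap between predicted and true name
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, label_ids in loader:   # label_ids index subject names,
            optimizer.zero_grad()          # e.g. "Doraemon", "panda"
            logits = model(images)                 # predicted image subject
            loss = criterion(logits, label_ids)    # compare with sample's name
            loss.backward()                        # large gap -> adjust params
            optimizer.step()
```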
In step S702, feature extraction is performed on the target object to obtain feature information, the context information corresponding to the expression image in the chat interface is obtained at the same time, and the emotion information corresponding to the expression image is determined according to the feature information and the context information.
In one embodiment of the disclosure, after the object information is acquired, feature extraction may be performed on the target object to acquire feature information, and the emotion information corresponding to the target expression image is determined according to the feature information. Different types of target objects correspond to different feature information. Specifically, when the target object is text, the feature information is the keywords in the text; when the target object is an image subject, the feature information is the attribute features of the image subject (taking Nezha in FIG. 8C as an example, the attribute features are the facial-feature information and the limb state information); and when the target object is text plus an image subject, the feature information is the keywords in the text together with the attribute features of the image subject.
In one embodiment of the present disclosure, the emotion information may be subdivided into a plurality of types, such as happy, panicked, crying, shy, laughing, proud, sweating, and the like. Some subdivided emotion types are similar, such as happy and laughing, or crying and tearing up, so the same text information and/or image subject features may correspond to several kinds of emotion information. For Nezha in FIG. 8C, for instance, the emotion information can be determined to be laughing and happy from the extracted text "haha" and the image subject feature of raised mouth corners. To ensure that the emotion information corresponding to the expression image is acquired accurately, the judgment can also draw on the context information corresponding to the expression image: the context information determines the chat environment of the expression image and thus pins down the emotion the user intends to convey. The context information may be the entire content of the chat or keywords extracted from the chat content.
In an embodiment of the present disclosure, FIG. 10 is a schematic diagram illustrating the process of acquiring the emotion information of a text-only expression image. As shown in FIG. 10, the process includes at least steps S1001 and S1002, specifically:
In step S1001, the text information corresponding to the target object is segmented to obtain the keywords in the text information, and the context information corresponding to the expression image in the chat interface is obtained at the same time.
In an embodiment of the present disclosure, the text information may be segmented with algorithms such as the forward maximum matching algorithm, the reverse maximum matching algorithm, the N-gram bidirectional maximum matching algorithm, or the HMM word segmentation algorithm, and part-of-speech tagging may be performed on the segmentation result. The keywords in the text information can then be determined from the part-of-speech tags; a keyword is usually a noun, an adjective, a verb, or a compound containing several parts of speech. For example, for the text "Don't go, wait for me", the segmentation result is "don't go / wait for / me" and the keyword is "don't go". Analyzed from the keyword alone, the emotion information corresponding to the text could be the negative emotion of impatience or the positive emotion of expectation; to determine the emotion information of the expression image accurately, the context information of the text-only expression image needs to be obtained, and the emotion information corresponding to the text-only expression image is determined according to that context information.
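As a sketch of this segmentation-plus-tagging step, the snippet below uses the jieba library (one possible choice; the patent names algorithm families rather than a library), keeping content words as keywords:

```python
# A sketch of keyword extraction via segmentation and POS tagging (jieba is
# an assumed stand-in for the patent's unspecified segmentation algorithm).
import jieba.posseg as pseg

KEY_POS_PREFIXES = ("n", "v", "a")   # nouns, verbs, adjectives

def extract_keywords(text: str):
    # segment, POS-tag, then keep words whose tag marks a content word
    return [pair.word for pair in pseg.cut(text)
            if pair.flag.startswith(KEY_POS_PREFIXES)]

# e.g. extract_keywords("别走，等等我")   # "Don't go, wait for me"
```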
In step S1002, the emotion information corresponding to the expression image is determined from the emotion information corresponding to the keywords and the context information.
In an embodiment of the present disclosure, continuing the example in step S1001, the keywords obtained by segmentation may correspond either to impatience or to expectation; which applies can be further determined from the session messages sent by other users in the session interface. For example, if the context information corresponding to the pure text expression image is "Let's go on a picnic today", "Great", and "What time do we set off?", it can be determined that the conversation is about planning an outing, and therefore the emotion information of the pure text expression image "not leaving yet, what are we waiting for" is the positive emotion of expectation rather than the negative emotion of impatience.
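A hedged sketch of how steps S1001 to S1002 could disambiguate a keyword's emotion using the context messages; the lexicon and the function are hypothetical stand-ins for whatever model an implementation actually uses:

```python
def resolve_emotion(candidates: set, context_messages: list,
                    context_lexicon: dict) -> str:
    """Pick the candidate emotion that is consistent with the chat context.

    context_lexicon maps context keywords (e.g. "picnic") to the emotions
    they suggest (e.g. {"expectant"}).
    """
    suggested = set()
    for message in context_messages:
        for keyword, emotions in context_lexicon.items():
            if keyword in message:
                suggested |= emotions
    matched = candidates & suggested
    # Fall back to an arbitrary candidate when the context is uninformative.
    return next(iter(matched or candidates))

print(resolve_emotion(
    {"impatient", "expectant"},
    ["Let's go on a picnic today", "Great", "What time do we set off?"],
    {"picnic": {"expectant"}, "set off": {"expectant"}}))  # -> "expectant"
```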
In an embodiment of the disclosure, when the target object is a non-text expression image, feature extraction may be performed on the image subject in the non-text expression image through a third image recognition model, and the emotion information corresponding to the non-text expression image is determined according to the extracted image subject features, the expression features corresponding to the system expressions, and the context information. When extracting features through the third image recognition model, mainly the feature-point coordinates of each part of the image subject are acquired. For example, when the image subject is a person or an animal having facial features and limbs, the coordinates of the facial and limb feature points can be determined, and the state of the image subject is then determined from those coordinates. The facial feature points may be, for example, the coordinates of the left end point, right end point, and peak of each eyebrow; the coordinates of the inner and outer corners of the left and right eyes, and of the highest point of the upper eyelid and the lowest point of the lower eyelid; the coordinates of the tip of the nose; and the coordinates of preset points on the mouth corners and the upper and lower lips. The limb feature points may be, for example, hand feature points and waist feature points, and may more generally be set according to actual needs. From the facial feature-point coordinates, facial states can be determined, such as raised eyebrows, frowning, wide-open eyes, tightly closed or slightly open eyes, and an open or tightly closed mouth; from the limb feature-point coordinates, limb states can be determined, such as waving hands, twisting the waist, or lifting a leg; and the image subject features are obtained by combining the facial features and the limb features. The image subject features can then be compared with the expression features of the system expressions: when the similarity between the two reaches a preset value and the match conforms to the chat environment given by the context information, the emotion tag corresponding to that system expression is taken as the emotion information corresponding to the image subject, i.e., the emotion information corresponding to the expression image. For example, if the subject features are raised eyebrows, wide-open eyes, and an open mouth, and a system expression matching these features exists whose emotion tag is surprised, the emotion information of the expression image can be determined to be surprised.
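To illustrate how feature-point coordinates might be turned into a facial state such as "mouth open", here is a minimal sketch; the landmark layout and the two thresholds are assumptions, not values taken from the patent:

```python
import numpy as np

def mouth_state(left_corner, right_corner, upper_lip, lower_lip) -> str:
    """Classify the mouth state from four (x, y) feature-point coordinates."""
    width = np.linalg.norm(np.subtract(right_corner, left_corner))
    gap = np.linalg.norm(np.subtract(lower_lip, upper_lip))
    ratio = gap / width  # normalizing by mouth width makes the rule scale-invariant
    if ratio > 0.35:
        return "open"
    if ratio > 0.10:
        return "slightly open"
    return "tightly closed"

print(mouth_state((0, 0), (4, 0), (2, 0.2), (2, 1.8)))  # -> "open"
```

Analogous ratio rules over eyebrow, eyelid, and limb landmarks would yield the other states listed above; the combined set of states forms the image subject features.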
Fig. 11 is a schematic flow chart of determining the emotion information corresponding to an expression image from the result of matching the image subject features against the expression features of the system expressions. As shown in fig. 11, in step S1101, the image subject features are matched with the expression features of each system expression to obtain a plurality of feature similarities; in step S1102, each feature similarity is compared with a similarity threshold to judge whether there is a candidate system expression whose feature similarity with the image subject features is greater than or equal to the similarity threshold; in step S1103, when such a candidate exists, the emotion tag corresponding to the candidate system expression is taken, based on the context information, as the emotion information of the expression image; in step S1104, when no candidate exists, the emotion information of the expression image is set to null.
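Steps S1101 to S1104 amount to a nearest-neighbour search with a cutoff. A minimal sketch follows; the use of cosine similarity and the data layout of a system expression are assumptions:

```python
import numpy as np

def match_system_expression(subject_features, system_expressions, threshold=0.8):
    """Return the emotion tag of the best-matching system expression (S1101-S1103),
    or None when no feature similarity reaches the threshold (S1104)."""
    best_tag, best_sim = None, threshold
    v = np.asarray(subject_features, dtype=float)
    for expr in system_expressions:  # each expr: {"features": vector, "emotion": str}
        u = np.asarray(expr["features"], dtype=float)
        sim = float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))  # cosine similarity
        if sim >= best_sim:
            best_tag, best_sim = expr["emotion"], sim
    return best_tag  # the caller still checks the tag against the context information
```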
In an embodiment of the present disclosure, when the expression image contains both text and an image subject, fig. 12 shows a schematic flow chart of acquiring its emotion information. As shown in fig. 12, in step S1201, the text information in the expression image is acquired and segmented to obtain the keywords in the text information; in step S1202, feature recognition is performed on the image subject in the expression image through a fourth image recognition model to obtain the image subject features, and the subject features are matched with the expression features corresponding to all system expressions to obtain the emotion tags of the successfully matched system expressions; in step S1203, the emotion information corresponding to the expression image is determined from the keywords, the context information, and the emotion tags of the successfully matched system expressions. Specifically, in step S1203, the session context may first be determined from the context information, and then the emotion that is the same as or closest to the chat context is selected from the emotions corresponding to the keywords and the emotion tags of the matched system expressions.
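The flow of steps S1201 to S1203 can be sketched by combining the helpers from the earlier sketches (extract_keywords, match_system_expression, resolve_emotion); the keyword_emotions lexicon is again a hypothetical stand-in:

```python
def emotion_for_text_and_subject(text, subject_features, system_expressions,
                                 context_messages, context_lexicon,
                                 keyword_emotions):
    """Merge keyword-based and subject-based candidates (S1201-S1202),
    then resolve them against the chat context (S1203)."""
    candidates = set()
    for keyword in extract_keywords(text):             # sketched after step S1001
        candidates |= keyword_emotions.get(keyword, set())
    tag = match_system_expression(subject_features, system_expressions)
    if tag is not None:
        candidates.add(tag)                            # emotion tag of matched expression
    if not candidates:
        return None                                    # emotion information may be null
    return resolve_emotion(candidates, context_messages, context_lexicon)
```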
In step S703, expression paraphrase information corresponding to the expression image is constructed from the object information and the emotion information.
In an embodiment of the disclosure, after the object information of the target object in the expression image and the emotion information corresponding to the expression image are acquired, the expression paraphrase information may be formed from them; specifically, the object information and the emotion information may be filled into an expression paraphrase template according to a preset rule. The structure of the expression paraphrase template can be set according to actual needs. For example, when the expression image is a pure text expression image, the template may be: "emotion information: text information"; when the expression image is a non-text expression image, the template may be: "image subject + emotion information"; when the expression image contains both text and an image subject, the template may be: "image subject + emotion information: text information". Since the emotion information may be null, the emotion slot in the template may be left empty. Of course, the expression paraphrase template may also take other structures, which this embodiment of the disclosure does not specifically limit.
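The template filling described here is straightforward string assembly; a sketch covering the three template structures above follows (the function name and argument layout are assumptions, and at least one of text and subject is assumed to be present):

```python
from typing import Optional

def build_paraphrase(text: Optional[str], subject: Optional[str],
                     emotion: Optional[str]) -> str:
    """Fill the object and emotion information into the paraphrase templates."""
    if text and not subject:            # pure text image: "emotion: text"
        return f"{emotion}: {text}"
    if subject and not text:            # non-text image: "subject + emotion"
        return f"{subject} {emotion}" if emotion else subject
    head = f"{subject} {emotion}" if emotion else subject  # emotion may be null
    return f"{head}: {text}"            # text + subject: "subject emotion: text"

print(build_paraphrase("haha", "Nezha", "laughing"))  # -> "Nezha laughing: haha"
```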
Taking the three expression images shown in figs. 8A-8C as examples: fig. 8A is a pure text expression image whose emotion information is "like" and whose text information is "I love work", so the resulting expression paraphrase information is "like: I love work"; fig. 8B is a non-text expression image whose emotion information is "lovely" and whose image subject is a kitten, so the resulting expression paraphrase information is "the kitten is lovely"; fig. 8C is an expression image containing text and an image subject, whose emotion information is "laughing", text information is "haha", and image subject is Nezha, so the resulting expression paraphrase information is "Nezha laughing: haha".
Taking the expression images shown in figs. 9C-9D as examples: the image subject in fig. 9C is Doraemon, the emotion information is "surprised", and the text information is "Are you kidding me?", so the resulting expression paraphrase information is "Doraemon is surprised: Are you kidding me?"; the image subject in fig. 9D is a panda, the emotion information is null, and the text information is "What can you gain by deceiving me?", so the resulting expression paraphrase information is "Panda: What can you gain by deceiving me?".
In an embodiment of the disclosure, when a system expression, or a user-defined expression image carrying an emotion tag, has the same emotion information as an acquired expression image, a mapping relationship may be established between them, and the expression image together with its expression paraphrase information may be stored locally. A user can then quickly find related expression images through the system expression or the tagged user-defined expression image and send or share them. When the user performs a first trigger operation on a first expression image, the second expression image corresponding to it and the expression paraphrase information corresponding to the second expression image can be acquired at the same time and displayed in the chat interface; after a screen-reading program obtains the paraphrase information of the second expression image, it can play the information aloud, helping the user decide whether to send the second expression image. In addition, for a new expression image received by the user, its expression paraphrase information can be obtained through the method described above, and a visually impaired user can determine the meaning of the new expression image from the screen-reading program's playback, improving the quality and efficiency of communication with others.
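The emotion-tag mapping and local storage described above reduce to an index from emotion tag to stored images; a minimal illustrative sketch, with all names hypothetical:

```python
from collections import defaultdict

class ExpressionStore:
    """Hypothetical local store keyed by emotion tag."""

    def __init__(self):
        self._by_emotion = defaultdict(list)

    def add(self, emotion_tag: str, image_path: str, paraphrase: str) -> None:
        """Store an expression image and its paraphrase under its emotion tag."""
        self._by_emotion[emotion_tag].append(
            {"image": image_path, "paraphrase": paraphrase})

    def find(self, emotion_tag: str) -> list:
        """All stored images sharing the emotion tag of a first expression image."""
        return self._by_emotion.get(emotion_tag, [])

store = ExpressionStore()
store.add("surprised", "emoji/doraemon.gif",
          "Doraemon is surprised: Are you kidding me?")
print(store.find("surprised"))
```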
In step S220, in response to a second trigger operation of the user on the second expression image, the second expression image is displayed in the message sending area of the chat interface.
In an embodiment of the disclosure, after the second expression image is acquired, the user may choose whether to send it according to actual needs. If the user decides to send the second expression image, a second trigger operation may be performed on it, so that the second expression image is displayed in the message sending area of the chat interface. Further, the second expression image and the expression paraphrase information corresponding to it can be displayed in the message sending area at the same time. The second trigger operation may specifically be a single-click operation, a double-click operation, or the like, which is not specifically limited in this disclosure.
Fig. 13 shows a schematic interface diagram of sending the second expression image. As shown in fig. 13, after deciding to send an expression image whose emotion information is surprised, the user may perform a second trigger operation on it, for example clicking it, and the expression image, or the expression image together with its expression paraphrase information, is displayed in the message sending area.
In an embodiment of the disclosure, the user may also input text information or emotion information in the text edit box. After the text or emotion information is acquired, it may be matched against all system expressions or user-defined expression images carrying emotion tags to obtain a first expression image; the user then triggers the first expression image accordingly to send the expression image, or the expression image together with its expression paraphrase information.
Figs. 14A to 14D are schematic diagrams of an interface for sending an expression image. As shown in fig. 14A, the user inputs in the message edit box text or emotion information related to the expression image to be sent, for example the emotion information "surprised"; after this input is received, it is matched against all system expressions, and a first expression image whose emotion tag is surprised is acquired, as shown in fig. 14B. The user long-presses the first expression image, and the second expression image corresponding to it, together with the expression paraphrase information corresponding to the second expression image, is displayed in the chat interface, as shown in fig. 14C. The user then clicks the second expression image, and the second expression image with its paraphrase information is sent to the message sending area to communicate with other users, as shown in fig. 14D. After the user sends the second expression image, it and its paraphrase information are displayed in the chat interfaces of the other users; even if another user has not opened the session interface, a message prompt containing the paraphrase information can be given, as shown in fig. 15.
In an embodiment of the disclosure, after the user inputs in the message edit box text information or emotion information related to the expression image to be sent, the expression images corresponding to that input may also be acquired. When the user performs a trigger operation on an acquired expression image, the corresponding expression paraphrase information can be obtained by the method of the above embodiments and played to the user through a screen-reading program, helping the user understand the content of each expression image and decide which one to send.
In the embodiment of the disclosure, a first trigger operation by the user on a first expression image displays, in the chat interface, a second expression image having the same emotion tag as the first expression image, optionally together with the expression paraphrase information corresponding to the second expression image; a second trigger operation on the second expression image then displays the second expression image, with or without its paraphrase information, in the message sending area of the chat interface. In addition, the expression paraphrase information corresponding to an expression image can be constructed from the object information of the target object in the expression image and the emotion information corresponding to the expression image. The embodiment thus allows a second expression image with the same emotion tag as the first to be determined accurately and quickly, improving the accuracy and efficiency of acquiring and sending expression images and thereby the user experience. Moreover, the information in expression images can be extracted and converted into text, which helps special groups of users understand expression images and improves the quality and efficiency of their communication.
The following describes apparatus embodiments of the present disclosure, which may be used to perform the expression image paraphrasing method and the expression image processing method in the above embodiments. For details not disclosed in the apparatus embodiments, please refer to the embodiments of the expression image paraphrasing method and the expression image processing method of the present disclosure.
Fig. 16 schematically shows a block diagram of an expression image processing apparatus according to an embodiment of the present disclosure.
Referring to fig. 16, an expression image processing apparatus 1600 according to the present disclosure includes: an expression image acquisition module 1601 and an expression image sending module 1602.
The expression image acquisition module 1601 is configured to respond to a first trigger operation of the user on a first expression image and display, in a chat interface, a second expression image having the same emotion label as the first expression image; the expression image sending module 1602 is configured to respond to a second trigger operation of the user on the second expression image and display the second expression image in a message sending area of the chat interface.
In one embodiment of the present disclosure, the expression image acquiring module 1601 is configured to: responding to a first pressing operation of the user on the first expression image; comparing the duration of the first pressing operation with a preset duration; and when the duration of the first pressing operation is greater than or equal to the preset duration, displaying a second expression image with the same emotion label as the first expression image in the chat interface.
In one embodiment of the present disclosure, the expression image processing apparatus 1600 further includes: an expression paraphrase information acquisition module, configured to display, when a second expression image having the same emotion label as the first expression image is displayed in the chat interface, the expression paraphrase information corresponding to the second expression image in the chat interface.
In one embodiment of the present disclosure, the expression image sending module 1602 is configured to: respond to a second pressing operation of the user on the second expression image, and display the second expression image and the expression paraphrase information corresponding to the second expression image in the message sending area of the chat interface.
In one embodiment of the present disclosure, there are a plurality of second expression images; the expression image sending module 1602 is configured to: respond to a second trigger operation of the user on a target expression image among the second expression images, and display the target expression image in the message sending area of the chat interface.
In one embodiment of the present disclosure, the expression image processing apparatus 1600 further includes: a first expression image determining module, configured to determine the first expression image according to the text information or emotion information input by the user.
In one embodiment of the present disclosure, the expression image processing apparatus 1600 further includes: the object information acquisition module is used for acquiring an expression image and identifying a target object in the expression image to acquire object information; the emotion information acquisition module is used for extracting features of the target object to acquire feature information, acquiring context information corresponding to the expression image in a chat interface, and determining emotion information corresponding to the expression image according to the feature information and the context information; and the expression paraphrase information acquisition module is used for constructing expression paraphrase information corresponding to the expression image according to the object information and the emotion information.
In one embodiment of the present disclosure, the object information acquiring module is configured to: when the expression image is a pure character expression image, extracting character information in the expression image through a character recognition model; or when the expression image is a non-character expression image, performing feature extraction on an image main body in the expression image through a first image recognition model to obtain main body information; or when the expression image is an expression image containing characters and an image main body, performing feature extraction on the characters and the image main body through a second image recognition model to acquire character information and main body information.
In one embodiment of the present disclosure, the expression image is a pure text expression image; the emotion information acquisition module is configured to: segment the text information corresponding to the target object to obtain the keywords in the text information, and simultaneously acquire the context information corresponding to the expression image in the chat interface; and determine the emotion information corresponding to the expression image according to the emotion information corresponding to the keywords and the context information.
In one embodiment of the present disclosure, the expression image is a non-text expression image; the emotion information acquisition module comprises: the feature extraction unit is used for extracting features of the image main body in the expression image through a third image recognition model so as to obtain image main body features; and the matching unit is used for matching the image main body characteristics with expression characteristics corresponding to all system expressions and determining emotion information corresponding to the expression images according to matching results and the context information.
In one embodiment of the present disclosure, the matching unit is configured to: match the image main body features with the expression features of the system expressions to obtain a plurality of feature similarities; compare each feature similarity with a similarity threshold, and judge whether there is a candidate system expression whose feature similarity with the image main body features is greater than or equal to the similarity threshold; when such a candidate exists, take the emotion label corresponding to the candidate system expression as the emotion information of the expression image based on the context information; and when no candidate exists, set the emotion information of the expression image to null.
In one embodiment of the present disclosure, the expression image is an expression image including text and an image main body; the emotion information acquisition module is configured to: acquiring character information in the expression image, and segmenting the character information to acquire a keyword in the character information; performing feature recognition on an image main body in the expression image through a fourth image recognition model to obtain image main body features, and matching the image main body features with expression features corresponding to all system expressions to obtain emotion labels of the system expressions which are successfully matched; and determining the emotion information corresponding to the expression image according to the keyword, the context information and the emotion label of the successfully matched system expression.
In one embodiment of the present disclosure, the expression images include a first type expression image and a second type expression image; the feature extraction unit is configured to: when the expression image is the first type expression image, identify the image main body in the first type expression image through the third image recognition model and perform feature extraction to obtain the image main body features; and when the expression image is the second type expression image, identify and perform feature extraction on the image main body in each frame of image contained in the second type expression image through the third image recognition model to obtain the image main body features.
In one embodiment of the present disclosure, the expression paraphrase information acquisition module is configured to: and filling the object information and the emotion information into an expression paraphrase template according to a preset rule to form expression paraphrase information corresponding to the expression image.
FIG. 17 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
It should be noted that the computer system 1700 of the electronic device shown in fig. 17 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.
As shown in fig. 17, the computer system 1700 includes a Central Processing Unit (CPU) 1701, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1702 or a program loaded from a storage portion 1708 into a Random Access Memory (RAM) 1703, thereby implementing the expression image processing method described in the above embodiments. In the RAM 1703, various programs and data necessary for system operation are also stored. The CPU 1701, the ROM 1702, and the RAM 1703 are connected to each other through a bus 1704. An Input/Output (I/O) interface 1705 is also connected to the bus 1704.
The following components are connected to the I/O interface 1705: an input portion 1706 including a keyboard, a mouse, and the like; an output portion 1707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 1708 including a hard disk and the like; and a communication portion 1709 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication portion 1709 performs communication processing via a network such as the Internet. A drive 1710 is also connected to the I/O interface 1705 as necessary. A removable medium 1711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1710 as necessary, so that a computer program read out therefrom is installed into the storage portion 1708 as needed.
In particular, the processes described above with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the methods illustrated in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1709, and/or installed from the removable medium 1711. When the computer program is executed by the Central Processing Unit (CPU) 1701, the various functions defined in the system of the present disclosure are executed.
It should be noted that the computer readable medium shown in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present disclosure also provides a computer-readable medium that may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the methods described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

1. An expression image processing method, comprising:
responding to a first trigger operation of a user on a first expression image, and displaying a second expression image with the same emotion label as the first expression image in a chat interface;
and responding to a second trigger operation of the user on the second expression image, and displaying the second expression image in a message sending area of the chat interface.
2. The expression image processing method according to claim 1, wherein the displaying, in a chat interface in response to a first trigger operation of a user on a first expression image, a second expression image having the same emotion label as the first expression image comprises:
responding to a first pressing operation of the user on the first expression image;
comparing the duration of the first pressing operation with a preset duration;
and when the duration of the first pressing operation is greater than or equal to the preset duration, displaying a second expression image with the same emotion label as the first expression image in the chat interface.
3. The expression image processing method according to claim 1 or 2, further comprising:
and when a second facial expression image with the same emotion label as the first facial expression image is displayed in a chat interface, displaying facial expression paraphrase information corresponding to the second facial expression image in the chat interface.
4. The expression image processing method according to claim 3, wherein the displaying the second expression image in a message sending area of the chat interface in response to a second trigger operation of the user on the second expression image comprises:
and responding to a second pressing operation of the user on the second expression image, and displaying the second expression image and expression paraphrase information corresponding to the second expression image in a message sending area of the chat interface.
5. The method according to claim 1, further comprising:
and determining the first expression image according to the character information or the emotional information input by the user.
6. The method according to claim 3, further comprising:
acquiring an expression image, and identifying a target object in the expression image to acquire object information;
extracting features of the target object to acquire feature information, acquiring context information corresponding to the expression image in a chat interface, and determining emotion information corresponding to the expression image according to the feature information and the context information;
and constructing expression paraphrase information corresponding to the expression image according to the object information and the emotion information.
7. The expression image processing method according to claim 6, wherein the recognizing the target object in the expression image to obtain the object information includes:
when the expression image is a pure character expression image, extracting character information in the expression image through a character recognition model; or,
when the expression image is a non-character expression image, performing feature extraction on an image main body in the expression image through a first image recognition model to obtain main body information; or,
and when the expression image is an expression image containing characters and an image main body, performing feature extraction on the characters and the image main body through a second image recognition model to obtain character information and main body information.
8. The expression image processing method according to claim 6, wherein the expression image is a pure character expression image;
the extracting features of the target object to obtain feature information, acquiring context information corresponding to the expression image in a chat interface, and determining emotion information corresponding to the expression image according to the feature information and the context information comprises:
segmenting text information corresponding to the target object to obtain keywords in the text information, and simultaneously obtaining context information corresponding to the expression image in the chat interface;
and determining the emotion information corresponding to the expression image according to the emotion information corresponding to the keyword and the context information.
9. The expression image processing method according to claim 6, wherein the expression image is a non-text expression image;
the feature extraction of the target object to obtain feature information, the context information corresponding to the expression image in a conversation interface, and the determination of the emotion information corresponding to the expression image according to the feature information and the context information include:
performing feature extraction on the image main body in the expression image through a third image recognition model to obtain image main body features;
and matching the image main body features with expression features corresponding to all system expressions, and determining emotion information corresponding to the expression image according to a matching result and the context information.
10. The expression image processing method according to claim 9, wherein the matching of the image subject features with expression features corresponding to all system expressions and the determining of emotion information corresponding to the expression image according to matching results and the context information includes:
matching the image main body features with expression features of the system expressions to obtain a plurality of feature similarities;
comparing each feature similarity with a similarity threshold respectively, and judging whether there is a candidate system expression whose feature similarity with the image main body features is greater than or equal to the similarity threshold;
when such a candidate system expression exists, taking the emotion label corresponding to the candidate system expression as the emotion information of the expression image based on the context information;
and when no such candidate system expression exists, setting the emotion information of the expression image to null.
11. The method according to claim 6, wherein the expression image is an expression image containing text and an image body;
the extracting features of the target object to obtain feature information, acquiring context information corresponding to the expression image in a chat interface, and determining emotion information corresponding to the expression image according to the feature information and the context information comprises:
acquiring character information in the expression image, and segmenting the character information to acquire a keyword in the character information;
performing feature recognition on an image main body in the expression image through a fourth image recognition model to obtain image main body features, and matching the image main body features with expression features corresponding to all system expressions to obtain emotion labels of the system expressions which are successfully matched;
and determining the emotion information corresponding to the expression image according to the keyword, the context information and the emotion label of the successfully matched system expression.
12. The method for processing the expression image according to claim 6, wherein the constructing of the expression paraphrase information corresponding to the expression image according to the object information and the emotion information includes:
and filling the object information and the emotion information into an expression paraphrase template according to a preset rule to form expression paraphrase information corresponding to the expression image.
13. An expression image processing apparatus characterized by comprising:
the emotion image acquisition module is used for responding to a first trigger operation of a user on a first emotion image and displaying a second emotion image which has the same emotion label as the first emotion image in a chat interface;
and the expression image sending module is used for responding to a second trigger operation of the user on the second expression image and displaying the second expression image in a message sending area of the chat interface.
14. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the expression image processing method according to any one of claims 1 to 12.
15. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to perform the expression image processing method according to any one of claims 1 to 12.
CN201911222705.7A 2019-12-03 2019-12-03 Expression image processing method and device, computer storage medium and electronic equipment Active CN111062207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911222705.7A CN111062207B (en) 2019-12-03 2019-12-03 Expression image processing method and device, computer storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911222705.7A CN111062207B (en) 2019-12-03 2019-12-03 Expression image processing method and device, computer storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111062207A true CN111062207A (en) 2020-04-24
CN111062207B CN111062207B (en) 2023-01-24

Family

ID=70299549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911222705.7A Active CN111062207B (en) 2019-12-03 2019-12-03 Expression image processing method and device, computer storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111062207B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115134319A (en) * 2022-06-29 2022-09-30 维沃移动通信有限公司 Information display method and device, electronic equipment and readable storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130080928A1 (en) * 2011-09-26 2013-03-28 Sparxo, Inc. Embeddable context sensitive chat system
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
CN105989165A (en) * 2015-03-04 2016-10-05 深圳市腾讯计算机系统有限公司 Method, apparatus and system for playing facial expression information in instant chat tool
CN106412309A (en) * 2016-10-12 2017-02-15 珠海市魅族科技有限公司 Data input method and mobile terminal
CN107229707A (en) * 2017-05-26 2017-10-03 北京小米移动软件有限公司 Search for the method and device of image
CN107707452A (en) * 2017-09-12 2018-02-16 阿里巴巴集团控股有限公司 For the information displaying method, device and electronic equipment of expression
CN108959585A (en) * 2018-07-10 2018-12-07 维沃移动通信有限公司 A kind of expression picture acquisition methods and terminal device
CN109446907A (en) * 2018-09-26 2019-03-08 百度在线网络技术(北京)有限公司 A kind of method, apparatus of Video chat, equipment and computer storage medium
CN110085229A (en) * 2019-04-29 2019-08-02 珠海景秀光电科技有限公司 Intelligent virtual foreign teacher information interacting method and device
CN110099159A (en) * 2018-01-29 2019-08-06 优酷网络技术(北京)有限公司 A kind of methods of exhibiting and client of chat interface
CN110414404A (en) * 2019-07-22 2019-11-05 腾讯科技(深圳)有限公司 Image processing method, device and storage medium based on instant messaging
CN110489674A (en) * 2019-07-02 2019-11-22 百度在线网络技术(北京)有限公司 Page processing method, device and equipment

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130080928A1 (en) * 2011-09-26 2013-03-28 Sparxo, Inc. Embeddable context sensitive chat system
CN104063427A (en) * 2014-06-06 2014-09-24 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
CN104933113A (en) * 2014-06-06 2015-09-23 北京搜狗科技发展有限公司 Expression input method and device based on semantic understanding
CN105989165A (en) * 2015-03-04 2016-10-05 深圳市腾讯计算机系统有限公司 Method, apparatus and system for playing facial expression information in instant chat tool
CN106412309A (en) * 2016-10-12 2017-02-15 珠海市魅族科技有限公司 Data input method and mobile terminal
CN107229707A (en) * 2017-05-26 2017-10-03 北京小米移动软件有限公司 Search for the method and device of image
CN107707452A (en) * 2017-09-12 2018-02-16 阿里巴巴集团控股有限公司 For the information displaying method, device and electronic equipment of expression
CN110099159A (en) * 2018-01-29 2019-08-06 优酷网络技术(北京)有限公司 A kind of methods of exhibiting and client of chat interface
CN108959585A (en) * 2018-07-10 2018-12-07 维沃移动通信有限公司 A kind of expression picture acquisition methods and terminal device
CN109446907A (en) * 2018-09-26 2019-03-08 百度在线网络技术(北京)有限公司 A kind of method, apparatus of Video chat, equipment and computer storage medium
CN110085229A (en) * 2019-04-29 2019-08-02 珠海景秀光电科技有限公司 Intelligent virtual foreign teacher information interacting method and device
CN110489674A (en) * 2019-07-02 2019-11-22 百度在线网络技术(北京)有限公司 Page processing method, device and equipment
CN110414404A (en) * 2019-07-22 2019-11-05 腾讯科技(深圳)有限公司 Image processing method, device and storage medium based on instant messaging

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115134319A (en) * 2022-06-29 2022-09-30 维沃移动通信有限公司 Information display method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN111062207B (en) 2023-01-24

Similar Documents

Publication Publication Date Title
Bragg et al. Sign language recognition, generation, and translation: An interdisciplinary perspective
CN110717017B (en) Method for processing corpus
CN110750959B (en) Text information processing method, model training method and related device
US11704501B2 (en) Providing a response in a session
CN112131350B (en) Text label determining method, device, terminal and readable storage medium
US20200137001A1 (en) Generating responses in automated chatting
CN110705206B (en) Text information processing method and related device
CN111026861B (en) Text abstract generation method, training device, training equipment and medium
CN111488931A (en) Article quality evaluation method, article recommendation method and corresponding devices
CN113392179A (en) Text labeling method and device, electronic equipment and storage medium
CN113392641A (en) Text processing method, device, storage medium and equipment
CN115269828A (en) Method, apparatus, and medium for generating comment reply
CN111062207B (en) Expression image processing method and device, computer storage medium and electronic equipment
CN111931503B (en) Information extraction method and device, equipment and computer readable storage medium
CN117011875A (en) Method, device, equipment, medium and program product for generating multimedia page
Trujillo-Romero et al. Mexican Sign Language corpus: Towards an automatic translator
CN109710751A (en) Intelligent recommendation method, apparatus, equipment and the storage medium of legal document
CN115171673A (en) Role portrait based communication auxiliary method and device and storage medium
CN113761147A (en) Logic editor-based questionnaire question display method and device and electronic equipment
CN113515935A (en) Title generation method, device, terminal and medium
CN114666307B (en) Conference interaction method, conference interaction device, equipment and storage medium
CN114048319B (en) Humor text classification method, device, equipment and medium based on attention mechanism
Gamage et al. Sinhala Sign Language Translation through Immersive 3D Avatars and Adaptive Learning
Plhák Dialogue-based Exploration of Graphics for Users with a Visual Disability
Segers The efficacy of the eigenvector approach to south african sign language identification

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40022332)
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant