WO2022083383A1 - Image processing method and apparatus, electronic device and computer-readable storage medium

Info

Publication number: WO2022083383A1
Application number: PCT/CN2021/119567
Authority: WO
Inventors: 王岩
Original assignee: 北京字节跳动网络技术有限公司
Priority date: 2020-10-19
Filing date: 2021-09-22
Publication date: 2022-04-28
Also published as: CN112261424B; CN112261424A

Abstract

Provided are an image processing method and apparatus, an electronic device and a computer-readable storage medium, relating to the technical field of image processing. Said method comprises: acquiring live broadcast audio and video information collected in real time; identifying input information of a live broadcast user in the live broadcast audio and video information, and determining, according to the input information, a target object to be processed in the live broadcast audio and video information; performing deformation processing on a target image corresponding to the target object in the live broadcast audio and video information, so as to obtain a deformed target image; and performing synthesis processing on the deformed target image and an image in the live broadcast audio and video information, so as to obtain synthesized audio and video information for playing the synthesized audio and video information. In the embodiments of the present disclosure, when a live streamer mentions a target object, the target object may be highlighted by means of deformation processing, thereby improving the display and live broadcast effects.

Description

Image processing method, apparatus, electronic device, and computer-readable storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to the Chinese patent application No. 202011119916.0, filed on October 19, 2020 and titled "image processing method, device, electronic device and computer-readable storage medium", the entire contents of which are incorporated herein by reference.

Technical field

The present disclosure relates to the technical field of image processing, and in particular, to an image processing method, apparatus, electronic device, and computer-readable storage medium.

Background

With the development of the mobile Internet and the popularization of mobile terminals, various application software continue to emerge, allowing users to experience more different functions when using mobile terminals. For example, current live broadcast applications allow users to see other users' live broadcast content in real time, and also to interact with the host in real time. However, at present, the display mode of the live broadcast interface of the live broadcast application is relatively simple, and the user's viewing experience is not good.

SUMMARY OF THE INVENTION

This Summary is provided to introduce concepts in a simplified form that are described in detail in the Detailed Description section that follows. This summary section is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to be used to limit the scope of the claimed technical solution.

In a first aspect, an embodiment of the present disclosure provides an image processing method, the method including: acquiring live audio and video information collected in real time; identifying input information of a live broadcast user in the live audio and video information, and determining, according to the input information, a target object to be processed in the live audio and video information; performing deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and synthesizing the deformed target image with an image in the live audio and video information to obtain synthesized audio and video information for playback.

In a second aspect, an embodiment of the present disclosure provides an image processing apparatus, the apparatus including: an information acquisition module, configured to acquire live audio and video information collected in real time; a target determination module, configured to identify input information of a live broadcast user in the live audio and video information, and to determine, according to the input information, a target object to be processed in the live audio and video information; a target deformation module, configured to perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and an image synthesis module, configured to synthesize the deformed target image with an image in the live audio and video information to obtain synthesized audio and video information for playback.

In a third aspect, embodiments of the present disclosure provide an electronic device, the electronic device comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs being configured to perform the method of the first aspect above.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the computer program is invoked and executed by a processor, the method described in the first aspect above is implemented.

The image processing method, apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the present disclosure acquire live audio and video information collected in real time, identify the input information of the live broadcast user in the live audio and video information, and determine, according to the input information, the target object to be processed in the live audio and video information. Deformation processing is then performed on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image, and the deformed target image is synthesized with the image in the live audio and video information to obtain synthesized audio and video information for playback. The embodiments of the present disclosure can therefore obtain input information during the live broadcast to determine the target object to be processed, and effectively highlight the target object by deforming it, thereby improving the display effect of the live broadcast interface. Users watching the live broadcast can notice the target object in time as the live broadcast proceeds, which improves the interest and effect of the live broadcast and in turn helps to improve the user retention rate in the live broadcast room.

Description of drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent when taken in conjunction with the accompanying drawings and with reference to the following detailed description. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

FIG. 1 shows a schematic diagram of an implementation environment suitable for an embodiment of the present disclosure.

FIG. 2 shows a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.

FIG. 3 shows a schematic flowchart of an image processing method provided by another embodiment of the present disclosure.

FIG. 4 shows a schematic diagram of a live broadcast interface provided by an exemplary embodiment of the present disclosure.

FIG. 5 shows a schematic flowchart of an image processing method provided by yet another embodiment of the present disclosure.

FIG. 6 shows a detailed flowchart of step S370 in FIG. 5 provided by an exemplary embodiment of the present disclosure.

FIG. 7 shows a schematic flowchart of determining a target object according to live broadcast content in an image processing method provided by an exemplary embodiment of the present disclosure.

FIG. 8 shows a block diagram of modules of an image processing apparatus provided by an embodiment of the present disclosure.

FIG. 9 shows a structural block diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are only for exemplary purposes, and are not intended to limit the protection scope of the present disclosure.

It should be understood that the various steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this regard.

As used herein, the term "comprising" and variations thereof are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.

It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules or units, and are not intended to limit the order or interdependence of the functions performed by these devices, modules or units.

It should be noted that the modifiers "a" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".

The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are only for illustrative purposes, and are not intended to limit the scope of these messages or information.

The technical solutions of the present disclosure and how the technical solutions of the present disclosure solve the above-mentioned technical problems will be described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present disclosure will be described below with reference to the accompanying drawings.

Please refer to FIG. 1, which shows a schematic diagram of an implementation environment applicable to an embodiment of the present disclosure. The implementation environment includes a first terminal 120 and a second terminal 140, where:

The first terminal 120 and the second terminal 140 may each be a mobile phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a wearable device, an in-vehicle device, an Augmented Reality (AR)/Virtual Reality (VR) device, a laptop, an Ultra-Mobile Personal Computer (UMPC), a netbook, a Personal Digital Assistant (PDA), a dedicated camera (such as a single-lens reflex camera or a compact camera), or the like. The embodiment of the present disclosure does not limit the specific type of the terminal.

In addition, the first terminal 120 and the second terminal 140 may be two terminals of the same type, or may be two terminals of different types, which are not limited in this embodiment of the present disclosure.

The first terminal 120 and the second terminal 140 run a first client and a second client, respectively. In one embodiment, the first client and the second client may both be live broadcast applications (Application, APP). The first client may represent the host client used by the host user, and the first terminal 120 may represent the host terminal used by the host user; the second client may represent the viewer client used by a viewer user in the live broadcast room, and the second terminal 140 may represent the viewer terminal used by the viewer user.

The first terminal 120 and the second terminal 140 may be directly connected through a wired network or a wireless network. Alternatively, the implementation environment may further include a server 200, in which case the first terminal 120 may be connected to the second terminal 140 through the server 200. The server 200 may be connected to the first terminal 120 and the second terminal 140 respectively through a wired network or a wireless network, so that data interaction can be performed between the server 200 and the first terminal 120 and the second terminal 140.

The server 200 may be a traditional server, a cloud server, a single server, a server cluster composed of several servers, or a cloud computing service center.

The image processing method, apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the present disclosure will be described in detail below through specific embodiments.

Please refer to FIG. 2. FIG. 2 shows a schematic flowchart of an image processing method provided by an embodiment of the present disclosure, which can be applied to an electronic device, and the electronic device can be the above-mentioned first terminal or server. Taking the application to the first terminal (that is, the host terminal running the host client) as an example, the flow shown in FIG. 2 is described in detail below. The image processing method may include the following steps:

S110: Acquire live audio and video information collected in real time.

When a live broadcast user wants to start a live broadcast, a live broadcast request can be triggered through the host client running on the host terminal. After the host client obtains the live broadcast request, it can start the image acquisition device and the audio acquisition device and collect live audio and video information with them. If the image acquisition device shoots the live broadcast user, the collected live audio and video information may include a user image of the live broadcast user. In an example, the display interface of the host client may display a control corresponding to the live broadcast portal, and by detecting a trigger operation acting on the control, the live broadcast request triggered by the live broadcast user may be obtained.

The host client is an application that can be used for live broadcast. The image acquisition device can be any device that can collect image information, such as a camera, and the audio acquisition device can be any device that can collect audio information, such as a microphone. Either device may also be an external device connected to the host terminal, which is not limited in this embodiment.

The host terminal can collect live audio and video information based on the image acquisition device and the audio acquisition device, so as to obtain live audio and video information collected in real time. If the method is applied to the server, the host terminal can transmit the live audio and video information collected in real time to the server, so that the server can obtain the live audio and video information collected in real time.

S120: Identify the input information of the live broadcast user in the live broadcast audio and video information, and determine the target object to be processed in the live broadcast audio and video information according to the input information.

In some embodiments, the input information may include at least one of voice information, text information, touch information, and visual information. That is, the embodiment of the present disclosure does not limit the input form of the input information, which may be input by means of voice, touch operation, air gesture, or the like.

Depending on the input information, the corresponding target objects can be different. The target objects can be objects, or the whole or part of people, animals, plants, etc.

It should be noted that the visual information refers to input information, within the image information collected by the host terminal, that can be used to determine the target object, for example an image frame in which the live broadcast user performs a preset action. The image frame can be the corresponding video frame image in the live audio and video information, or an image that contains only part of the image content of each video frame image. The preset actions may include actions in a narrow sense, and may also include actions in a broad sense such as gestures, expressions, and postures, which are not limited herein. As another example, the visual information can also be a picture that indicates or characterizes the target object. For example, if the target object is a lipstick, the visual information can be a picture of the lipstick or the description information of that picture.

In some embodiments, when the input information includes visual information, determining the target object to be processed in the live audio and video information according to the input information may be implemented as follows: if it is detected, based on the image information of the live audio and video information, that the live broadcast user performs a preset action, the object indicated by the preset action is determined as the target object to be processed. In this case, the video frame image corresponding to the preset action may serve as the visual information.

In addition, if the input information is touch information, the host terminal of the live broadcast user can display the live broadcast interface. The live broadcast interface can display the live audio and video information together with other live broadcast content, such as object information superimposed on the live audio and video information, for example any one or more of names, models, pictures, and links. The live broadcast user can click on the object information, so that after the host terminal detects the click event, the corresponding object information is obtained as the content corresponding to the input information, or as the recognition result, to determine the corresponding target object.
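The click handling described above can be sketched as a simple hit test. This is only an illustrative sketch: the `ObjectInfoOverlay` type, its rectangle fields, and the rule that later overlays are drawn on top are hypothetical assumptions, since the disclosure does not specify how object information is laid out on the live broadcast interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ObjectInfoOverlay:
    """Hypothetical piece of object information drawn over the live video."""
    name: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        # Half-open rectangle test in interface coordinates.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def object_info_at(overlays: Sequence[ObjectInfoOverlay],
                   px: int, py: int) -> Optional[str]:
    """Return the object information hit by a click event, if any."""
    # Assumption: overlays later in the sequence are drawn on top.
    for overlay in reversed(overlays):
        if overlay.contains(px, py):
            return overlay.name
    return None
```

The returned name would then play the role of the recognition result used to look up the target object.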

In practical applications, the live broadcast user can input various forms of input information during the live broadcast, such as at least one of voice information, text information, touch information, and visual information. For example, the live broadcast user can input voice information by speaking, input text information by typing, and input visual information by performing preset actions. In some examples, the content of the identified input information can be any one or more of the item name, style, model, item picture, and the like, and the target object to be processed can be determined in the live audio and video information according to the input information.

In some embodiments, the electronic device can detect, in the live audio and video information, the target object corresponding to the recognition result of the input information. As an embodiment, the input information of the live broadcast user in the live audio and video information is identified and the recognition result is an item name. The feature vector description corresponding to the item name can then be obtained, and the corresponding target object determined in the live audio and video information according to the feature vector description. For example, the image area corresponding to the feature vector description can be marked in the live audio and video information, and the corresponding image used as the target image of the target object. Since the input information of the live broadcast user may include things that the live broadcast user needs to introduce or describe to other users (such as audience users in the live broadcast room), that is, the target object in the embodiment of the present disclosure, the target object can be identified in the live audio and video information by recognizing the input information, for subsequent processing.

It should be noted that when the corresponding target object is determined in the live audio and video information according to the feature vector description, an incomplete match may be accepted. For example, if the matching degree reaches a specified ratio, it can be considered a match: it is determined that an object corresponding to the feature vector description exists in the live audio and video information, and the image area where the object is located is marked, as described above.
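The incomplete-match rule can be sketched as follows. This is a minimal illustration, assuming that the feature vector description and each candidate image area are already available as numeric vectors and that cosine similarity serves as the matching degree. The function names, the `Box` format, and the 0.8 threshold are hypothetical; the disclosure fixes neither a metric nor a ratio.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a candidate image area


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Matching degree between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_target_region(
    description: Sequence[float],
    candidates: Dict[Box, Sequence[float]],
    threshold: float = 0.8,  # hypothetical "specified ratio"
) -> Optional[Box]:
    """Return the best candidate area whose matching degree reaches the threshold."""
    best_box, best_score = None, threshold
    for box, features in candidates.items():
        score = cosine_similarity(description, features)
        if score >= best_score:
            best_box, best_score = box, score
    return best_box
```

Returning `None` when no candidate reaches the threshold lets the caller leave the frame untouched when the mentioned object is not visible.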

In some embodiments, the electronic device may be pre-built with a picture feature vector set, the picture feature vector set includes feature vector descriptions corresponding to various objects, and may be a complete set of a series of commodity data obtained through machine learning in the background. Specifically, the product pictures related to an object on the network can be integrated, and the feature vector description corresponding to the object can be obtained through machine learning and feature extraction, so as to quickly lock the object in the live audio and video information. The feature vector description corresponding to an object may include at least one of a shape feature vector, a texture feature vector, and a color feature vector.
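As a rough illustration of one component of such a description, the sketch below computes a color feature vector as a normalized per-channel histogram. This is a deliberately simple stand-in, not the machine-learned feature extraction the text refers to; the image representation (rows of RGB tuples), the function name, and the bin count are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_feature_vector(image: Sequence[Sequence[Pixel]], bins: int = 4) -> List[float]:
    """Normalized per-channel histogram: `bins` values for R, then G, then B."""
    hist = [0.0] * (3 * bins)
    count = 0
    for row in image:
        for pixel in row:
            for channel, value in enumerate(pixel):
                # Map 0..255 onto a bin index 0..bins-1.
                index = min(value * bins // 256, bins - 1)
                hist[channel * bins + index] += 1
            count += 1
    # Divide by the pixel count so images of different sizes are comparable.
    return [h / count for h in hist] if count else hist
```

Shape and texture feature vectors would be computed separately and could be stored alongside this vector in the picture feature vector set.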

The picture feature vector set can be stored locally on the host terminal or on the server. When it is stored on the server and the method is executed by the host terminal, the server can find the corresponding feature vector description according to the input information and deliver it to the host terminal, so that the host terminal can use the feature vector description to determine the corresponding object, that is, the target object, in the live audio and video information.

In addition, the identification of the input information can be performed locally on the electronic device or implemented through a network. For example, it can be sent to a server based on the network, and the server identifies the input information. This embodiment does not limit the identification method.

S130: Perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image.

The deformation processing may include at least one of enlargement processing, distortion processing, stretching processing, and fisheye special effect processing. The specific implementation of the deformation processing is not limited in this embodiment, and can be determined according to actual needs.

After determining the target object to be processed, the electronic device can perform deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image, so that the display effect of the target object in the live audio and video information changes. Compared with other content that changes little, or a background that barely changes at all, the transition of the target object from its undeformed to its deformed appearance on the live broadcast interface has a stronger visual impact, so that users notice the target object more easily. The target object can thus be highlighted during the live broadcast, which increases the attention that users in the live broadcast room pay to it. Moreover, because the target object is determined from the input information of the live broadcast user, highlighting it ties the input information of the live broadcast user more closely to the live broadcast content, which helps to improve the efficiency and effect of the live broadcast.

Of course, different deformation processing can achieve different effects. For example, if the deformation processing is fisheye special effect processing, the effect of a fisheye lens can be simulated and the target image transformed into the image that would be seen through a fisheye lens, which can both increase the interest of the live broadcast and enrich the live broadcast effect.
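One possible way to simulate a fisheye-style bulge is a radial remap, sketched below. This is an assumed, simplified model, not the disclosure's algorithm: each destination pixel at normalized radius r samples the source at radius r raised to `strength`, which magnifies the center. The image is assumed to be a non-empty list of pixel rows, and the function name and `strength` parameter are hypothetical.

```python
import math
from typing import List, Sequence, TypeVar

P = TypeVar("P")  # any pixel representation


def fisheye(image: Sequence[Sequence[P]], strength: float = 2.0) -> List[List[P]]:
    """Bulge the image centre; strength 1.0 leaves the image unchanged."""
    height, width = len(image), len(image[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    out = [list(row) for row in image]
    for y in range(height):
        for x in range(width):
            # Normalized coordinates in [-1, 1] relative to the centre.
            nx = (x - cx) / cx if cx else 0.0
            ny = (y - cy) / cy if cy else 0.0
            r = math.hypot(nx, ny)
            if r == 0.0 or r >= 1.0:
                continue  # centre pixel and corners are copied as they are
            scale = r ** (strength - 1.0)  # source radius divided by r
            sx = int(round(cx + nx * scale * cx))
            sy = int(round(cy + ny * scale * cy))
            out[y][x] = image[sy][sx]
    return out
```

Nearest-neighbour sampling keeps the sketch short; a production effect would normally interpolate between source pixels.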

As another example, if the deformation processing is enlargement processing, step S130 may be implemented as follows: enlargement processing is performed on the target image corresponding to the target object in the live audio and video information, and the enlarged target image is used as the deformed target image. Therefore, when the live broadcast user mentions the target object, the image size of the target image corresponding to the target object can be enlarged, so that the target object is presented in a more viewer-friendly way and the audience user can observe and understand it more clearly through the live broadcast interface, which helps to improve the live broadcast effect. For example, in an e-commerce live broadcast scenario, if the live broadcast user needs to introduce a product to the audience user, the image corresponding to the product can be enlarged through the embodiment of the present disclosure, so that the product features are shown more clearly. The audience user can then observe the product more clearly and understand it better in combination with the live broadcast user's explanation, which can greatly improve the efficiency and effect of product recommendation.
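Enlargement processing can be illustrated with nearest-neighbour scaling. This is a minimal sketch under assumptions: the target image is a non-empty list of pixel rows, and the `enlarge` function and its `factor` parameter are hypothetical names; the disclosure does not prescribe an interpolation method.

```python
from typing import List, Sequence, TypeVar

P = TypeVar("P")  # any pixel representation


def enlarge(image: Sequence[Sequence[P]], factor: float) -> List[List[P]]:
    """Scale the target image by `factor` using nearest-neighbour sampling."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [
        # Each destination pixel copies the source pixel it maps back onto.
        [image[min(int(y / factor), src_h - 1)][min(int(x / factor), src_w - 1)]
         for x in range(dst_w)]
        for y in range(dst_h)
    ]
```

For example, `enlarge([[1, 2], [3, 4]], 2)` doubles each pixel in both directions.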

The target image corresponding to the target object may contain only the target object, or may also contain information other than the target object, which is not limited herein. As an embodiment, the electronic device can determine the image area where the target object is located in the live audio and video information, crop the image of that area to obtain a cropped image, perform foreground/background separation on the cropped image, and extract the image of the target object, as the foreground, to serve as the target image. Matting is thereby achieved, so that the target image to be deformed contains only the target object, which helps to obtain a more natural effect during subsequent synthesis.
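The cropping and matting steps can be sketched as below. The separation rule used here, treating pixels close to a known background color as background, is only a stand-in assumption chosen to keep the example short; the disclosure does not name a segmentation technique. The `crop` and `extract_foreground` names, the `Box` format, and the `tolerance` value are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]     # (R, G, B)
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def crop(image: Sequence[Sequence[Pixel]], box: Box) -> List[List[Pixel]]:
    """Cut the image area where the target object is located out of the frame."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in image[y:y + h]]


def extract_foreground(
    patch: Sequence[Sequence[Pixel]],
    background: Pixel,
    tolerance: int = 30,  # hypothetical per-channel tolerance
) -> List[List[Optional[Pixel]]]:
    """Replace background-like pixels with None (treated as transparent)."""
    def is_background(pixel: Pixel) -> bool:
        return all(abs(c - b) <= tolerance for c, b in zip(pixel, background))

    return [[None if is_background(p) else p for p in row] for row in patch]
```

The `None` entries mark positions that a later synthesis step can skip, so only the object itself is drawn over the live frame.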

Alternatively, the entire image enclosed by the image area where the target object is located may be subjected to deformation processing. The shape of the image area can be a circle, a rectangle, a fan shape, and so on, and may be determined by the shape of the device, which is not limited here.

For example, the live broadcast user can input the name of the target object "lipstick", then the host terminal can obtain the feature vector description corresponding to the "lipstick", and mark the item area where the "lipstick" is located in the live audio and video information based on the feature vector description, and then The image corresponding to the item area is taken as the target image corresponding to the target object, and the target image is deformed to obtain the deformed target image.

S140: Perform synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.

After the deformed target image is obtained, it can be synthesized with the image in the currently collected live audio and video information to obtain synthesized audio and video information for playback. If the method is applied to the host terminal, the host terminal, on obtaining the synthesized audio and video information, can play it and/or send it to the terminals of users in the live broadcast room, for example to the audience terminals of the live broadcast room through the server. If the method is applied to the server, the server can send the synthesized audio and video information to the terminals of users in the live broadcast room, including at least one of the host terminal and the audience terminal, so that at least one of the host terminal and the audience terminal plays the synthesized audio and video information.

In some embodiments, the image position of the target object can be determined in the currently collected live audio and video information, and the deformed target image can be superimposed at that image position, so that the deformed target image is displayed at the image position and, for example, covers the target object in the live audio and video information. In other embodiments, the synthesis processing may also be performed at any other position, which is not limited herein.
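Superimposing the deformed target image at the image position can be sketched as follows. The sketch makes several assumptions that the disclosure leaves open: frames are lists of pixel rows, the deformed image is centered on the original image area, `None` pixels are treated as transparent, and anything falling outside the frame is clipped. The `composite` name and `Box` format are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

P = TypeVar("P")                 # any pixel representation
Box = Tuple[int, int, int, int]  # (x, y, width, height) of the original target area


def composite(
    frame: Sequence[Sequence[P]],
    target: Sequence[Sequence[Optional[P]]],
    box: Box,
) -> List[List[P]]:
    """Overlay the deformed target image on a copy of the live frame."""
    x, y, w, h = box
    th, tw = len(target), len(target[0])
    # Centre the (possibly larger) deformed image on the original image position.
    left = x + (w - tw) // 2
    top = y + (h - th) // 2
    out = [list(row) for row in frame]
    for ty, row in enumerate(target):
        for tx, pixel in enumerate(row):
            fy, fx = top + ty, left + tx
            inside = 0 <= fy < len(out) and 0 <= fx < len(out[0])
            if pixel is not None and inside:
                out[fy][fx] = pixel
    return out
```

Because the function writes into a copy, the originally collected frame stays available, for instance for locating the target object in the next frame.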

The image processing method provided by this embodiment acquires the live audio and video information collected in real time, identifies the input information of the live broadcast user in the live audio and video information, and determines the target object to be processed in the live audio and video information according to the input information. It then performs deformation processing on the target image corresponding to the target object in the live audio and video information to obtain the deformed target image, and synthesizes the deformed target image with the image in the live audio and video information to obtain synthesized audio and video information for playback. As a result, the embodiments of the present disclosure can obtain input information during the live broadcast to determine the target object to be processed, and effectively highlight the target object by deforming it, thereby improving the display effect of the live broadcast interface. Users who watch the live broadcast can notice the target object in a timely manner as the live broadcast proceeds, which improves the interest and effect of the live broadcast and further helps to improve the user retention rate in the live broadcast room.
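The flow of steps S120 to S140 for a single frame can be summarized in a small skeleton. This is a hedged sketch rather than the disclosed implementation: the `LiveFramePipeline` class and its three callables are hypothetical placeholders for whatever recognition, deformation, and synthesis routines an implementation supplies, and passing the frame through unchanged when no target is found is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

Frame = Any                      # frame representation is left to the implementation
Box = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class LiveFramePipeline:
    locate_target: Callable[[Frame, str], Optional[Box]]  # S120
    deform: Callable[[Frame, Box], Frame]                 # S130
    synthesize: Callable[[Frame, Frame, Box], Frame]      # S140

    def process(self, frame: Frame, input_info: str) -> Frame:
        """Run S120 to S140 on one frame acquired in S110."""
        box = self.locate_target(frame, input_info)
        if box is None:
            # Assumption: with no target object, the frame is played as collected.
            return frame
        deformed = self.deform(frame, box)
        return self.synthesize(frame, deformed, box)
```

Injecting the three steps as callables mirrors the point made in the text that each step may run on the host terminal or on the server.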

In some embodiments, the input information may include voice information, and the electronic device may find the target object to be deformed in the live audio and video information according to the voice information input by the live broadcast user. This simplifies the operation of the live broadcast user: the target object can be automatically locked and deformed without the live broadcast user giving any special instruction, which greatly improves the live broadcast efficiency and live broadcast effect. Specifically, please refer to FIG. 3, which shows a schematic flowchart of an image processing method provided by another embodiment of the present disclosure. The image processing method may include:

S210: Acquire live audio and video information collected in real time.

S220: Perform voice recognition on the voice information in the live audio and video information to obtain a voice recognition result.

In some implementations, the speech recognition model can be run on the host terminal or the server, which is not limited here. Based on the pre-trained speech recognition model, the electronic device can perform speech recognition on the speech information in the live audio and video information, and obtain a speech recognition result.

S230: Determine the object information to be processed based on the speech recognition result.

The object information to be processed may be information that can describe the target object, such as the name or identifier of the object. The identifier may also include a link corresponding to the object (clicking the link shows at least one of object-related information and a purchase portal). For example, if the text indicated by the speech recognition result is "lipstick", the object information to be processed is the name of the object. Of course, the speech recognition result may also be more specific information, including the style, model, etc., that can identify a unique object, for example "Armani Lipstick 301", which includes the object type, brand, and color number. The more specific the object information to be processed, the more accurate the feature vector description that can be found according to it, which is more conducive to accurately determining the target object in the live audio and video information.
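As an illustrative sketch of step S230, the following C++ function selects the object information to be processed from recognized text by scanning for known object names and preferring the longest, most specific one. The catalog of names, the function name findObjectInfo, and the longest-match rule are assumptions made for illustration; the disclosure does not specify how the recognized text is mapped to object information.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Lower-case an ASCII string so that matching ignores letter case.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: return the longest catalog name that occurs in the
// recognized text, or std::nullopt if no known object is mentioned.
std::optional<std::string> findObjectInfo(const std::string& recognizedText,
                                          const std::vector<std::string>& catalog) {
    const std::string text = toLower(recognizedText);
    std::optional<std::string> best;
    for (const std::string& name : catalog) {
        if (name.empty()) continue;  // an empty name would match any text
        if (text.find(toLower(name)) != std::string::npos &&
            (!best || name.size() > best->size())) {
            best = name;
        }
    }
    return best;
}
```

With a catalog containing both "lipstick" and "Armani Lipstick 301", an utterance that mentions the full product name yields the more specific entry, which corresponds to the more accurate feature vector description discussed above.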

S240: In the live audio and video information, determine the object corresponding to the object information to be processed as the target object.

In some embodiments, the feature vector description corresponding to the object information to be processed can be obtained. For example, the electronic device can pre-build a mapping relationship between object information and feature vector descriptions, obtain the feature vector description corresponding to the object information to be processed according to that mapping, determine the corresponding object in the live audio and video information according to the feature vector description, and determine that object as the target object. The specific methods can be found in the corresponding parts of the foregoing embodiments and are not repeated here.

In some embodiments, step S240 may include: if it is detected that the first object indicated by the object information to be processed exists in the live audio and video information, determining the first object as the target object. In one approach, after determining the object information to be processed, the electronic device can detect whether a first object indicated by the object information to be processed exists in the live audio and video information and, if so, determine the first object as the target object. Whether the first object exists can be detected by obtaining the feature vector description corresponding to the object information to be processed and then matching the live audio and video information against that feature vector description. If the match succeeds, it is determined that the first object indicated by the object information to be processed exists in the live audio and video information, and the first object is determined as the target object.
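The lookup and matching described above can be sketched as follows. This is a minimal illustration under stated assumptions: feature vector descriptions are modeled as plain float vectors, candidate regions of the current frame are assumed to have been converted to such vectors already, cosine similarity stands in for the unspecified matching measure, and the 0.8 threshold is a hypothetical value for the "specified ratio". The names Feature, cosineSimilarity, and detectFirstObject are not from the disclosure.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Feature = std::vector<float>;

// Cosine similarity of two feature vectors; returns 0 for mismatched sizes,
// empty vectors, or vectors of zero length.
double cosineSimilarity(const Feature& a, const Feature& b) {
    if (a.size() != b.size() || a.empty()) return 0.0;
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    if (na == 0.0 || nb == 0.0) return 0.0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

// Hypothetical detection step: look up the stored description for the object
// information, then return the index of the best-matching candidate region,
// or std::nullopt if the object is unknown or no candidate reaches the threshold.
std::optional<std::size_t> detectFirstObject(
        const std::string& objectInfo,
        const std::map<std::string, Feature>& descriptions,
        const std::vector<Feature>& candidates,
        double threshold = 0.8) {
    auto it = descriptions.find(objectInfo);
    if (it == descriptions.end()) return std::nullopt;
    std::optional<std::size_t> best;
    double bestScore = threshold;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const double score = cosineSimilarity(it->second, candidates[i]);
        if (score >= bestScore) {
            bestScore = score;
            best = i;
        }
    }
    return best;
}
```

An empty result corresponds to the case in which the first object is not detected in the live audio and video information.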

S250: Perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image.

S260: Perform synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.

In some embodiments, the deformation processing may be enlargement processing, and the specific implementation of step S250 may be: performing enlargement processing on the target image corresponding to the target object in the live audio and video information, and obtaining the enlarged target image as the deformed target image.

In an exemplary embodiment, the enlargement processing can be implemented with OpenCV. During a live video broadcast, image frames are arranged in sequence along the time dimension. By parsing the binary data of each frame in combination with the feature vector description corresponding to the object information to be processed, that is, a two-dimensional vector feature, a specific image area can be anchored on the image. The anchored area can be a rectangle (xstart, ystart, xend, yend) or another, irregular shape with multiple vertices, which is not limited here. Taking a rectangular anchor area as an example, the following prototype function is then used to enlarge the anchor area graphic and output a binary data stream. The code example is as follows:

CV_EXPORTS_W void resize(InputArray src, OutputArray dst,
                         Size dsize, double fx=0, double fy=0,
                         int interpolation=INTER_LINEAR);

The obtained binary data stream is then superimposed at the specified position in the live audio and video information of the current frame. At this point the anchor area rectangle (xstart, ystart, xend, yend) has been enlarged to n times its original size. For the synthesis, the ROI (region of interest) method is used to overlay the binary data stream (the item graphic) on the anchor position in the live audio and video information of the current frame (which can be the position of the original anchor area graphic). This completes one enlargement operation.
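The enlarge-then-overlay sequence can be illustrated with a self-contained C++ sketch that uses a plain grayscale array instead of OpenCV types, so that it compiles without external libraries. Nearest-neighbor sampling stands in for the interpolation performed by resize, and the inner loop plays the role of copying into a region of interest. The types Image and Rect, the function enlargeAndOverlay, the exclusive end coordinates, and the choice to center the enlarged patch on the original area are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal grayscale frame, row-major; a stand-in for a decoded video frame.
struct Image {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;
    Image(int w, int h, std::uint8_t fill = 0)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, fill) {}
    std::uint8_t& at(int x, int y) { return pixels[static_cast<std::size_t>(y) * width + x]; }
    std::uint8_t at(int x, int y) const { return pixels[static_cast<std::size_t>(y) * width + x]; }
};

// Anchor area; xend and yend are exclusive in this sketch.
struct Rect { int xstart, ystart, xend, yend; };

// Enlarge the anchored area n times (nearest neighbor) and overlay it on a
// copy of the frame, centered on the original area and clipped to the frame.
Image enlargeAndOverlay(const Image& frame, Rect r, int n) {
    assert(n >= 1);
    assert(0 <= r.xstart && r.xstart < r.xend && r.xend <= frame.width);
    assert(0 <= r.ystart && r.ystart < r.yend && r.yend <= frame.height);
    const int w = r.xend - r.xstart, h = r.yend - r.ystart;
    const int outW = w * n, outH = h * n;
    // Top-left corner that keeps the enlarged patch centered on the anchor.
    const int left = r.xstart + w / 2 - outW / 2;
    const int top = r.ystart + h / 2 - outH / 2;
    Image out = frame;
    for (int dy = 0; dy < outH; ++dy) {
        for (int dx = 0; dx < outW; ++dx) {
            const int x = left + dx, y = top + dy;
            if (x < 0 || y < 0 || x >= frame.width || y >= frame.height) continue;
            // Always sample the untouched source frame, not the output.
            out.at(x, y) = frame.at(r.xstart + dx / n, r.ystart + dy / n);
        }
    }
    return out;
}
```

Sampling from the unmodified input frame rather than from the output avoids reading pixels that the overlay has already overwritten, which matters because the enlarged patch overlaps the original anchor area.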

Of course, the above is only an example, and the embodiment of the present disclosure is not limited to the above-mentioned one implementation manner.

In an exemplary scenario, please refer to FIG. 4, which shows a schematic diagram of a live broadcast interface provided by an exemplary embodiment of the present disclosure. Suppose the live broadcast user, for example the anchor Zhang San, is introducing a doughnut. The live broadcast interface displays the live audio and video information 410, and the object information to be processed is obtained by recognizing the voice information of anchor Zhang San; here the item name is "doughnut". At time t, the electronic device can obtain the feature vector description corresponding to the doughnut, and then find, in the live audio and video information 410, the image area 411 where the "doughnut" matching the feature vector description is located. After the image of the "doughnut" is enlarged, the enlarged "doughnut" image is superimposed at the position corresponding to the image area 411, covering the original image area 411. At time t+1, the live audio and video information 420 of the current frame is displayed on the live broadcast interface of anchor Zhang San, in which the enlarged "doughnut" image is displayed in the image area 421. As a result, when the live broadcast user introduces the doughnut, the electronic device can automatically identify the object being introduced and lock onto it in the live audio and video information to enlarge it. Users in the live broadcast room can thus see the enlarged doughnut while listening to the live broadcast user introduce it, observe the doughnut more carefully, and get a better e-commerce live broadcast experience, so that they can fully understand the objects introduced by the live broadcast user.

It should be noted that, for the parts that are not described in detail in this embodiment, please refer to the foregoing embodiment, and details are not repeated here.

In addition, in some embodiments, the electronic device may fail to detect, in the live audio and video information, the object indicated by the object information to be processed, and thus cannot determine the target object in the live audio and video information. In this case, the target object can be further determined in combination with other information, thereby reducing the missed detection rate and improving system stability. Specifically, please refer to FIG. 5, which shows a schematic flowchart of an image processing method provided by another embodiment of the present disclosure. The method may include:

S310: Acquire live audio and video information collected in real time.

S320: Perform voice recognition on the voice information in the live audio and video information to obtain a voice recognition result.

S330: Determine the object information to be processed based on the speech recognition result.

S340: Determine whether it is detected that the first object indicated by the object information to be processed exists in the live audio and video information.

In this embodiment, after judging whether the first object indicated by the object information to be processed is detected in the live audio and video information, the method may include:

If it is detected that the first object indicated by the object information to be processed exists in the live audio and video information, step S350 can be executed;

If it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, step S360 may be executed.

S350: Determine the first object as the target object.

If it is detected that the first object indicated by the object information to be processed exists in the live audio and video information, the first object is determined as the target object.

S360: Perform image recognition processing on the live audio and video information.

In some embodiments, in order to reduce the amount of data stored for feature vector descriptions, the pre-stored feature vector descriptions may be generic. Suppose, for example, that the live broadcast user says "lipstick" but the lipstick in the live audio and video information does not look like a regular lipstick, that is, it does not match the feature vector description (the matching degree is lower than a specified ratio), and the live broadcast user names only the object and gives no other information, such as the specific brand and model, that could be used to search the Internet for a corresponding picture for matching. In that case it may not be possible to determine the corresponding target object in the live audio and video information according to the feature vector description, which may lead to a missed detection. Other information can then be combined to further determine the target object, thereby reducing the missed detection rate. Accordingly, if it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, image recognition processing may be performed on the live audio and video information.

S370: If it is recognized that there is a preset gesture in the live audio and video information, the second object indicated by the preset gesture is used as the target object.

In some embodiments, if it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, image recognition processing can be performed on the live audio and video information to identify whether a preset gesture is present and, if so, the second object indicated by the preset gesture is used as the target object. The preset gesture can be one or more pre-stored gestures, which is not limited here and can be set according to actual needs. For example, the preset gesture can be drawing a circle with a finger, in which case the circled object is the second object indicated by the preset gesture. As another example, the preset gesture can be four fingers held together with only one finger extended, in which case the object pointed to by the extended finger can be used as the second object indicated by the preset gesture. Therefore, when it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, the gesture of the live broadcast user can be used to determine the second object indicated by the preset gesture as the target object. Because the object that the user points to or circles is usually what the user is describing, or even wants to highlight, this embodiment can determine the target object in the live audio and video information more accurately and can reduce the missed detections that may occur when the determination relies on voice information alone.

In addition, in some embodiments, the second object may not match the input information of the live broadcast user, and so may not be the target object that the live broadcast user currently wants to highlight. To further improve the accuracy of re-determining the target object, after the second object indicated by the preset gesture is determined, the second object can be matched against the object information to be processed indicated by the voice information, and the second object is used as the target object only when the matching succeeds. Specifically, please refer to FIG. 6, which shows a detailed flowchart of step S370 in FIG. 5 provided by an exemplary embodiment of the present disclosure. In this embodiment, step S370 may include:

S371: If it is recognized that there is a preset gesture in the live audio and video information, determine the object indicated by the preset gesture as the second object.

S372: If the second object matches the to-be-processed object information, use the second object as the target object.

In some embodiments, the image area indicated by the preset gesture can be determined, and the image of that image area can be cropped to obtain the second image corresponding to the second object. The second object information corresponding to the second image can then be searched for over the network. If the second object information matches the object information to be processed, it can be determined that the second object matches the object information to be processed, and the second object can be used as the target object. For example, the image of the image area that the live broadcast user points to or circles by hand is cropped as the second image corresponding to the second object; if the match succeeds, the second object is determined as the target object to be deformed in a subsequent step.
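One simple way to derive the image area indicated by a circling gesture is to take the bounding box of the traced fingertip positions, as sketched below. The disclosure does not state how the area is computed, so the bounding-box rule, the exclusive end coordinates, and the names Point, Region, and regionFromGesture are assumptions; fingertip tracking itself is outside the scope of the sketch.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

struct Point { int x, y; };
struct Region { int xstart, ystart, xend, yend; };  // xend and yend are exclusive

// Hypothetical helper: axis-aligned bounding box of the fingertip positions
// traced by a circling gesture, clipped to the frame. Returns std::nullopt
// when there is no trace or the box does not overlap the frame.
std::optional<Region> regionFromGesture(const std::vector<Point>& trace,
                                        int frameWidth, int frameHeight) {
    if (trace.empty() || frameWidth <= 0 || frameHeight <= 0) return std::nullopt;
    int minX = trace[0].x, maxX = trace[0].x;
    int minY = trace[0].y, maxY = trace[0].y;
    for (const Point& p : trace) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);
        maxY = std::max(maxY, p.y);
    }
    Region r{std::max(minX, 0), std::max(minY, 0),
             std::min(maxX + 1, frameWidth), std::min(maxY + 1, frameHeight)};
    if (r.xstart >= r.xend || r.ystart >= r.yend) return std::nullopt;
    return r;
}
```

The returned region can then be cropped from the frame to obtain the second image that is searched for over the network.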

S380: Perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image.

S390: Perform synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
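The decision flow of steps S340 to S372 can be summarized in a short C++ sketch. The three callbacks are placeholders for the recognition steps described above and are assumptions of this illustration, as are the integer object identifiers and the names Target, Detectors, and selectTarget.

```cpp
#include <functional>
#include <optional>
#include <string>

struct Target {
    std::string source;  // which step produced the target: "voice" or "gesture"
    int objectId;        // hypothetical identifier of the object in the frame
};

// Detector callbacks are assumptions standing in for the recognition steps.
struct Detectors {
    // S340: object matching the object information, if one is detected.
    std::function<std::optional<int>(const std::string&)> detectByInfo;
    // S360/S371: object indicated by a preset gesture, if one is recognized.
    std::function<std::optional<int>()> detectByGesture;
    // S372: whether the gesture-indicated object matches the object information.
    std::function<bool(int, const std::string&)> matchesInfo;
};

std::optional<Target> selectTarget(const std::string& objectInfo, const Detectors& d) {
    if (auto first = d.detectByInfo(objectInfo)) {
        return Target{"voice", *first};         // S350
    }
    if (auto second = d.detectByGesture()) {    // S360, S371
        if (d.matchesInfo(*second, objectInfo)) {
            return Target{"gesture", *second};  // S372
        }
    }
    return std::nullopt;  // no target object could be determined
}
```

Gesture recognition runs only when detection based on the object information fails, which mirrors the order of the branches at S340.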

In addition, in some embodiments, if it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, the target object can be further determined according to the live content displayed on the live broadcast interface: when target information that can be used to indicate the target object is displayed on the live broadcast interface, the target object can be determined based on that target information. Specifically, please refer to FIG. 7, which shows a schematic flowchart of determining a target object according to live content in an image processing method provided by an exemplary embodiment of the present disclosure, and which may specifically include:

S410: If it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, obtain the currently displayed live content.

S420: If there is target information corresponding to the object information to be processed in the live broadcast content, determine the target object according to the target information.

The target information may include at least one of an item identifier and an item image corresponding to the target object. When the first object indicated by the object information to be processed cannot be detected in the live audio and video information, the currently displayed live content can be recognized and the target object can be determined according to the recognized target information, which improves the recognition rate of the target object.

In some embodiments, the live broadcast interface may display at least one kind of target information corresponding to the object information to be processed, such as an item identifier, an item image, or a purchase portal, and the corresponding target object can then be found in the live audio and video information according to the target information. The purchase portal may be displayed in the form of an item image, and the item image may have an embedded URL (Uniform Resource Locator); the user can click the item image to jump to the purchase page corresponding to the URL.

As an embodiment, a product image and a product name can be displayed in the live broadcast interface. If the product name matches the object information to be processed, for example the object information to be processed is the object name "lipstick" and the product name also includes "lipstick", it can be considered a match. The corresponding image area is then located in the live audio and video information based on the product image, and the object in that image area is the target object.
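The name-matching rule in this example (a product name that includes the object name counts as a match) can be sketched in C++ as follows. The DisplayedProduct structure, its imageId field, and the case-insensitive comparison are assumptions for illustration; locating the image area from the product image is not shown.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

struct DisplayedProduct {
    std::string name;     // product name shown on the live broadcast interface
    std::string imageId;  // hypothetical handle to the displayed product image
};

// Case-insensitive (ASCII) substring test; an empty needle never matches.
bool containsIgnoreCase(const std::string& haystack, const std::string& needle) {
    auto it = std::search(haystack.begin(), haystack.end(),
                          needle.begin(), needle.end(),
                          [](unsigned char a, unsigned char b) {
                              return std::tolower(a) == std::tolower(b);
                          });
    return !needle.empty() && it != haystack.end();
}

// Return the first displayed product whose name includes the object information.
std::optional<DisplayedProduct> findTargetInfo(const std::string& objectInfo,
                                               const std::vector<DisplayedProduct>& shown) {
    for (const DisplayedProduct& p : shown) {
        if (containsIgnoreCase(p.name, objectInfo)) return p;
    }
    return std::nullopt;
}
```

The image of the matched product would then serve as the target information used to locate the corresponding image area in the live audio and video information.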

It should be noted that, based on the foregoing embodiment, steps S410-S420 may be used to replace steps S360-S370 in FIG. 5, so that when it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, the target object is further identified according to the live broadcast content.

Please refer to FIG. 8, which is a block diagram of an image processing apparatus provided by an embodiment of the present disclosure. The image processing apparatus 800 in the embodiment of the present disclosure may include: an information acquisition module 810, a target determination module 820, a target deformation module 830, and an image synthesis module 840, where:

An information acquisition module 810, configured to acquire live audio and video information collected in real time;

The target determination module 820 is used to identify the input information of the live user in the live audio and video information, and determine the target object to be processed in the live audio and video information according to the input information;

The target deformation module 830 is configured to perform deformation processing on the target image corresponding to the target object in the live audio and video information to obtain the deformed target image;

The image synthesis module 840 is used for synthesizing the deformed target image and the image in the live audio and video information to obtain the synthesized audio and video information, which is used for playing the synthesized audio and video information.

In one embodiment, the input information includes voice information, and the target determination module 820 may include: a voice recognition submodule, an object information determination submodule, and a target object determination submodule, wherein:

The speech recognition sub-module is used to perform speech recognition on the speech information in the live audio and video information to obtain the speech recognition result;

an object information determination submodule, used for determining the object information to be processed based on the speech recognition result;

The target object determination submodule is used for determining the object corresponding to the object information to be processed as the target object in the live audio and video information.

In one embodiment, the target object determination sub-module may include: a first object determination unit, configured to determine the first object as the target object if it is detected that the first object indicated by the object information to be processed exists in the live audio and video information.

In one embodiment, the target object determination submodule may include: an image recognition unit and a gesture determination unit, wherein:

an image recognition unit, configured to perform image recognition processing on the live audio and video information if it is not detected that the first object indicated by the to-be-processed object information exists in the live audio and video information;

The gesture determination unit is configured to use the second object indicated by the preset gesture as the target object if it is recognized that there is a preset gesture in the live audio and video information.

In one embodiment, the gesture determination unit may include: a second object determination subunit and a target object determination subunit, wherein:

The second object determination subunit is configured to determine the object indicated by the preset gesture as the second object if it is recognized that there is a preset gesture in the live audio and video information;

The target object determination subunit is configured to use the second object as the target object if the second object matches the information of the object to be processed.

In one embodiment, the target object determination submodule may include: a live content acquisition unit and a target information determination unit, wherein:

A live broadcast content acquisition unit, configured to acquire the currently displayed live broadcast content if it is not detected that the first object indicated by the object information to be processed exists in the live broadcast audio and video information;

The target information determining unit is configured to determine the target object according to the target information if target information corresponding to the object information to be processed exists in the live broadcast content, and the target information includes at least one of an item identifier and an image corresponding to the target object.

In one embodiment, the target deformation module 830 may include: an enlargement processing sub-module, configured to perform enlargement processing on the target image corresponding to the target object in the live audio and video information, and obtain the enlarged target image as the deformed target image.

In one embodiment, the input information includes at least one of voice information, text information, touch information, and visual information.

The image processing apparatus in the embodiments of the present disclosure can execute the image processing method provided by the embodiments of the present disclosure, and its implementation principle is similar. The actions performed by each module of the image processing apparatus correspond to the steps of the image processing methods in the embodiments of the present disclosure. For a detailed functional description of each module of the image processing apparatus, please refer to the descriptions of the corresponding image processing methods above, which are not repeated here.

Referring next to FIG. 9 , it shows a structural block diagram of an electronic device 900 suitable for implementing embodiments of the present disclosure. The electronic device in the embodiment of the present disclosure may include, but is not limited to, a device such as a computer. The electronic device shown in FIG. 9 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.

The electronic device 900 includes a memory and a processor. The processor may be referred to hereinafter as the processing device 901, and the memory may include at least one of the read-only memory (ROM) 902, the random access memory (RAM) 903, and the storage device 908 described hereinafter, as follows:

As shown in FIG. 9, the electronic device 900 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 901, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the electronic device 900. The processing device 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

Typically, the following devices can be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 907 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; storage devices 908 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 909. The communication device 909 may allow the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 9 shows the electronic device 900 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable storage medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication device 909, or from the storage device 908, or from the ROM 902. When the computer program is executed by the processing apparatus 901, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.

It should be noted that the above-mentioned computer-readable storage medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium can also be any computer-readable medium, other than a computer-readable storage medium, that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable storage medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers can communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.

The above-mentioned computer-readable storage medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.

The above-mentioned computer-readable storage medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to perform the following steps: acquiring live audio and video information collected in real time; identifying the input information of the live broadcast user in the live audio and video information, and determining, according to the input information, the target object to be processed in the live audio and video information; performing deformation processing on the target image corresponding to the target object in the live audio and video information to obtain the deformed target image; and performing synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.

Computer program code for performing operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or can be implemented in a combination of dedicated hardware and computer instructions.

The modules or units involved in the embodiments of the present disclosure may be implemented in software or hardware. In some cases, the name of a module or unit does not constitute a limitation on the module or unit itself; for example, the display module may also be described as "a module for displaying a resource uploading interface".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and so on.

In the context of this disclosure, a computer-readable storage medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may be a machine-readable signal medium or a machine-readable storage medium. Computer-readable storage media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, an image processing method is provided. The method includes: acquiring live audio and video information collected in real time; identifying input information of a live broadcast user in the live audio and video information, and determining, according to the input information, a target object to be processed in the live audio and video information; performing deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and synthesizing the deformed target image with an image in the live audio and video information to obtain synthesized audio and video information, so that the synthesized audio and video information can be played.
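The four steps of the method can be sketched as a single per-frame function. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: frames are modeled as plain lists of pixel rows, the audio is assumed to have already been transcribed, and the names `LiveChunk`, `process_chunk`, `locate`, and `deform` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]          # grayscale frame as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width)


@dataclass
class LiveChunk:
    """One unit of live audio and video information (hypothetical model)."""
    frame: Image
    transcript: str  # assumption: speech has already been converted to text


def crop(img: Image, box: Box) -> Image:
    """Cut the target image out of the frame."""
    top, left, height, width = box
    return [row[left:left + width] for row in img[top:top + height]]


def paste(base: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of base with patch drawn at (top, left), clipped to the frame."""
    out = [row[:] for row in base]
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            rr, cc = top + r, left + c
            if 0 <= rr < len(out) and 0 <= cc < len(out[0]):
                out[rr][cc] = value
    return out


def process_chunk(
    chunk: LiveChunk,
    locate: Callable[[LiveChunk], Optional[Box]],
    deform: Callable[[Image], Image],
) -> Image:
    # Step 1 (acquisition) is assumed to have produced `chunk`.
    box = locate(chunk)                # step 2: determine the target object
    if box is None:
        return chunk.frame             # nothing to highlight; pass the frame through
    deformed = deform(crop(chunk.frame, box))  # step 3: deformation processing
    # Step 4: synthesis. Centre the deformed image on the original box.
    top, left, height, width = box
    new_top = top - (len(deformed) - height) // 2
    new_left = left - (len(deformed[0]) - width) // 2
    return paste(chunk.frame, deformed, new_top, new_left)
```

Passing `locate` and `deform` as callables keeps the sketch independent of any particular recognition or deformation technique; the embodiments that follow describe concrete options for both.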

In one embodiment, the input information includes voice information, and identifying the input information of the live broadcast user in the live audio and video information and determining, according to the input information, the target object to be processed in the live audio and video information includes: performing voice recognition on the voice information in the live audio and video information to obtain a voice recognition result; determining to-be-processed object information based on the voice recognition result; and determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object.
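As an illustration of the middle step, the following sketch derives to-be-processed object information from a voice recognition result by simple keyword matching. The disclosure does not prescribe how the object information is extracted; the keyword approach, the function name `object_info_from_transcript`, and the `known_objects` list are assumptions made for this example, and the speech recognizer itself is taken as given.

```python
from typing import List, Optional


def object_info_from_transcript(
    transcript: str, known_objects: List[str]
) -> Optional[str]:
    """Return the known object name mentioned last in the transcript, if any.

    Assumption: the most recently mentioned object is the one the live
    broadcast user currently wants to highlight.
    """
    text = transcript.lower()
    hits = [
        (text.rfind(name.lower()), name)
        for name in known_objects
        if name.lower() in text
    ]
    if not hits:
        return None
    # The hit with the largest position is the latest mention.
    return max(hits)[1]
```

The returned name would then be looked up among the objects detected in the video frames to determine the target object.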

In an embodiment, determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object includes: if it is detected that a first object indicated by the to-be-processed object information exists in the live audio and video information, determining the first object as the target object.

In an embodiment, determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object further includes: if it is not detected that the first object indicated by the to-be-processed object information exists in the live audio and video information, performing image recognition processing on the live audio and video information; and if it is recognized that a preset gesture exists in the live audio and video information, using a second object indicated by the preset gesture as the target object.

In one embodiment, using the second object indicated by the preset gesture as the target object if it is recognized that a preset gesture exists in the live audio and video information includes: if it is recognized that a preset gesture exists in the live audio and video information, determining the object indicated by the preset gesture as the second object; and if the second object matches the to-be-processed object information, determining the second object as the target object.
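The resolution order described in the preceding embodiments (first object if detected, otherwise a gesture-indicated second object that must match the object information) might be sketched as follows. Several details are assumptions made only for illustration: a first object counts as "detected" when a detector score reaches `min_score`, the preset gesture is reduced to a fingertip coordinate, and matching is plain label equality. `Detection` and `resolve_target` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    label: str
    box: Tuple[int, int, int, int]  # (top, left, height, width)
    score: float                    # detector confidence in [0, 1]


def contains(box: Tuple[int, int, int, int], point: Tuple[int, int]) -> bool:
    top, left, height, width = box
    y, x = point
    return top <= y < top + height and left <= x < left + width


def resolve_target(
    object_info: str,
    detections: List[Detection],
    fingertip: Optional[Tuple[int, int]],
    min_score: float = 0.8,
) -> Optional[Detection]:
    # Case 1: the first object indicated by the object information is detected.
    for det in detections:
        if det.label == object_info and det.score >= min_score:
            return det
    # Case 2: fall back to the preset gesture, if one was recognized.
    if fingertip is None:
        return None
    second = next((d for d in detections if contains(d.box, fingertip)), None)
    # The second object is accepted only if it matches the object information.
    if second is not None and second.label == object_info:
        return second
    return None
```

The match check in the last step prevents an incidental pointing gesture toward an unrelated object from being highlighted while the user is talking about something else.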

In an embodiment, determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object further includes: if it is not detected that the first object indicated by the to-be-processed object information exists in the live audio and video information, acquiring the currently displayed live broadcast content; and if target information corresponding to the to-be-processed object information exists in the live broadcast content, determining the target object according to the target information, where the target information includes at least one of an item identifier and an image corresponding to the target object.
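A minimal sketch of this fallback is shown below, assuming the currently displayed live broadcast content is available as a list of displayed items, each with an identifier, a name, and optionally an image reference. The `DisplayedItem` structure, the substring match on the item name, and the function name are hypothetical; the disclosure only states that the target information includes at least one of an item identifier and an image.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DisplayedItem:
    item_id: str
    name: str
    image_ref: Optional[str] = None  # e.g. a path or URL to a product image


def target_info_from_live_content(
    object_info: str, displayed: List[DisplayedItem]
) -> Optional[Dict[str, str]]:
    """Look up target information for the object among the displayed items."""
    wanted = object_info.strip().lower()
    if not wanted:
        return None
    for item in displayed:
        if wanted in item.name.lower():
            info = {"item_id": item.item_id}
            if item.image_ref is not None:
                info["image"] = item.image_ref
            return info
    return None
```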

In one embodiment, performing deformation processing on the target image corresponding to the target object in the live audio and video information to obtain the deformed target image includes: enlarging the target image corresponding to the target object in the live audio and video information, and using the enlarged target image as the deformed target image.
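Enlargement can be illustrated with nearest-neighbour upscaling on an image stored as rows of pixel values. The disclosure does not specify an interpolation method; nearest-neighbour is chosen here only because it is the simplest to show, and the function name `enlarge` is hypothetical.

```python
from typing import List

Image = List[List[int]]


def enlarge(img: Image, scale: float) -> Image:
    """Enlarge img by `scale` (>= 1) using nearest-neighbour sampling."""
    if scale < 1.0:
        raise ValueError("scale must be at least 1 for enlargement")
    height, width = len(img), len(img[0])
    new_h, new_w = round(height * scale), round(width * scale)
    return [
        [
            # Map each output pixel back to its nearest source pixel.
            img[min(height - 1, int(r / scale))][min(width - 1, int(c / scale))]
            for c in range(new_w)
        ]
        for r in range(new_h)
    ]
```

For example, `enlarge([[1, 2], [3, 4]], 2)` repeats every pixel in a 2 by 2 block, producing a 4 by 4 image.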

According to one or more embodiments of the present disclosure, an image processing apparatus is provided. The image processing apparatus includes: an information acquisition module, configured to acquire live audio and video information collected in real time; a target determination module, configured to identify input information of a live broadcast user in the live audio and video information, and determine, according to the input information, a target object to be processed in the live audio and video information; a target deformation module, configured to perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and an image synthesis module, configured to synthesize the deformed target image with an image in the live audio and video information to obtain synthesized audio and video information, so that the synthesized audio and video information can be played.
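The module structure of the apparatus can be pictured as four pluggable components wired together. The sketch below is only one possible arrangement, shown under assumptions: each module is modeled as a callable, and the class name `ImageProcessingApparatus`, its field names, and the `step` method are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ImageProcessingApparatus:
    acquire: Callable[[], Any]                        # information acquisition module
    determine_target: Callable[[Any], Optional[Any]]  # target determination module
    deform: Callable[[Any, Any], Any]                 # target deformation module
    synthesize: Callable[[Any, Any, Any], Any]        # image synthesis module

    def step(self) -> Any:
        """Process one unit of live audio and video information."""
        av_info = self.acquire()
        target = self.determine_target(av_info)
        if target is None:
            return av_info  # no target object mentioned; output unchanged
        deformed = self.deform(av_info, target)
        return self.synthesize(av_info, target, deformed)
```

Because the modules are independent callables, each could be implemented in software or hardware and replaced without affecting the others.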

In one embodiment, the input information includes voice information, and the target determination module may include a voice recognition sub-module, an object information determination sub-module, and a target object determination sub-module, wherein: the voice recognition sub-module is configured to perform voice recognition on the voice information in the live audio and video information to obtain a voice recognition result; the object information determination sub-module is configured to determine to-be-processed object information based on the voice recognition result; and the target object determination sub-module is configured to determine, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object.

In an embodiment, the target object determination sub-module may include: a first object determination unit, configured to, if it is detected that the first object indicated by the to-be-processed object information exists in the live audio and video information, determine the first object as the target object.

In one embodiment, the target object determination sub-module may include an image recognition unit and a gesture determination unit, wherein: the image recognition unit is configured to, if it is not detected that the first object indicated by the to-be-processed object information exists in the live audio and video information, perform image recognition processing on the live audio and video information; and the gesture determination unit is configured to, if it is recognized that a preset gesture exists in the live audio and video information, use a second object indicated by the preset gesture as the target object.

In one embodiment, the gesture determination unit may include a second object determination subunit and a target object determination subunit, wherein: the second object determination subunit is configured to, if it is recognized that a preset gesture exists in the live audio and video information, determine the object indicated by the preset gesture as the second object; and the target object determination subunit is configured to, if the second object matches the to-be-processed object information, determine the second object as the target object.

In one embodiment, the target object determination sub-module may include a live broadcast content acquisition unit and a target information determination unit, wherein: the live broadcast content acquisition unit is configured to, if it is not detected that the first object indicated by the to-be-processed object information exists in the live audio and video information, acquire the currently displayed live broadcast content; and the target information determination unit is configured to, if target information corresponding to the to-be-processed object information exists in the live broadcast content, determine the target object according to the target information, where the target information includes at least one of an item identifier and an image corresponding to the target object.

In one embodiment, the target deformation module may include: an enlargement processing sub-module, configured to enlarge the target image corresponding to the target object in the live audio and video information, and use the enlarged target image as the deformed target image.

The above description is merely an account of preferred embodiments of the present disclosure and an illustration of the technical principles employed. Those skilled in the art should understand that the scope of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the disclosed concept, for example, a technical solution formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Additionally, although operations are depicted in a particular order, this should not be construed as requiring that the operations be performed in the particular order shown or in a sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several implementation-specific details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Although the subject matter has been described in language specific to structural features and/or logical acts of method, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims

An image processing method, comprising:

acquiring live audio and video information collected in real time;

identifying input information of a live broadcast user in the live audio and video information, and determining, according to the input information, a target object to be processed in the live audio and video information;

performing deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and

synthesizing the deformed target image with an image in the live audio and video information to obtain synthesized audio and video information, the synthesized audio and video information being used for playback.
The method according to claim 1, wherein the input information includes voice information, and identifying the input information of the live broadcast user in the live audio and video information and determining, according to the input information, the target object to be processed in the live audio and video information comprises:

performing voice recognition on the voice information in the live audio and video information to obtain a voice recognition result;

determining to-be-processed object information based on the voice recognition result; and

determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object.
The method according to claim 2, wherein determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object comprises:

if it is detected that a first object indicated by the to-be-processed object information exists in the live audio and video information, determining the first object as the target object.
The method according to claim 2, wherein determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object further comprises:

if it is not detected that the first object indicated by the to-be-processed object information exists in the live audio and video information, performing image recognition processing on the live audio and video information; and

if it is recognized that a preset gesture exists in the live audio and video information, using a second object indicated by the preset gesture as the target object.
The image processing method according to claim 4, wherein using the second object indicated by the preset gesture as the target object if it is recognized that a preset gesture exists in the live audio and video information comprises:

if it is recognized that a preset gesture exists in the live audio and video information, determining the object indicated by the preset gesture as the second object; and

if the second object matches the to-be-processed object information, using the second object as the target object.
The image processing method according to claim 2, wherein determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object further comprises:

if it is not detected that the first object indicated by the to-be-processed object information exists in the live audio and video information, acquiring the currently displayed live broadcast content; and

if target information corresponding to the to-be-processed object information exists in the live broadcast content, determining the target object according to the target information, wherein the target information includes at least one of an item identifier and an image corresponding to the target object.
The image processing method according to any one of claims 1 to 6, wherein performing deformation processing on the target image corresponding to the target object in the live audio and video information to obtain the deformed target image comprises:

enlarging the target image corresponding to the target object in the live audio and video information, and using the enlarged target image as the deformed target image.
The image processing method according to claim 1, wherein the input information includes at least one of voice information, text information, touch information, and visual information.
An image processing apparatus, comprising:

an information acquisition module, configured to acquire live audio and video information collected in real time;

a target determination module, configured to identify input information of a live broadcast user in the live audio and video information, and determine, according to the input information, a target object to be processed in the live audio and video information;

a target deformation module, configured to perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and

an image synthesis module, configured to synthesize the deformed target image with an image in the live audio and video information to obtain synthesized audio and video information, the synthesized audio and video information being used for playback.
An electronic device, comprising:

one or more processors;

memory;

one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute the image processing method according to any one of claims 1-8.
A computer-readable storage medium, wherein the computer-readable storage medium is used to store a computer program, and the computer program is invoked by a processor to execute the image processing method according to any one of claims 1-8.