WO2022083383A1 - Image processing method and apparatus, electronic device and computer-readable storage medium - Google Patents

Image processing method and apparatus, electronic device and computer-readable storage medium

Info

Publication number
WO2022083383A1
WO2022083383A1 PCT/CN2021/119567 CN2021119567W WO2022083383A1 WO 2022083383 A1 WO2022083383 A1 WO 2022083383A1 CN 2021119567 W CN2021119567 W CN 2021119567W WO 2022083383 A1 WO2022083383 A1 WO 2022083383A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
video information
target
image
live
Prior art date
Application number
PCT/CN2021/119567
Other languages
French (fr)
Chinese (zh)
Inventor
王岩
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2022083383A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/113 Recognition of static hand signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/858 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
    • H04N21/8586 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot by using a URL

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to an image processing method, apparatus, electronic device, and computer-readable storage medium.
  • an embodiment of the present disclosure provides an image processing apparatus, the apparatus including: an information acquisition module, configured to acquire live audio and video information collected in real time;
  • a target determination module, configured to identify the input information of the live broadcast user in the live audio and video information, and to determine the target object to be processed in the live audio and video information according to the input information;
  • a target deformation module, configured to perform deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image;
  • an image synthesis module, configured to synthesize the deformed target image with the image in the live audio and video information to obtain synthesized audio and video information for playback.
  • embodiments of the present disclosure provide an electronic device, the electronic device comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs being configured to perform the method of the first aspect above.
  • an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the computer program is invoked and executed by a processor, the method described in the first aspect above is implemented.
  • in the image processing method, apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the present disclosure, live audio and video information collected in real time is acquired; the input information of the live broadcast user is identified in the live audio and video information, and the target object to be processed is determined in the live audio and video information according to the input information; deformation processing is performed on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and the deformed target image is synthesized with the image in the live audio and video information to obtain synthesized audio and video information for playback.
  • in this way, the embodiments of the present disclosure can obtain input information during the live broadcast to determine the target object to be processed, and can effectively highlight the target object by deforming it, thereby improving the display effect of the live broadcast interface.
  • users watching the live broadcast can thus notice the target object in time as the live broadcast proceeds, which makes the live broadcast more engaging and effective and in turn helps improve user retention in the live broadcast room.
  • FIG. 1 shows a schematic diagram of an implementation environment suitable for an embodiment of the present disclosure.
  • FIG. 3 shows a schematic flowchart of an image processing method provided by another embodiment of the present disclosure.
  • FIG. 4 shows a schematic diagram of a live broadcast interface provided by an exemplary embodiment of the present disclosure.
  • FIG. 5 shows a schematic flowchart of an image processing method provided by yet another embodiment of the present disclosure.
  • FIG. 6 shows a detailed flowchart of step S370 in FIG. 5 provided by an exemplary embodiment of the present disclosure.
  • FIG. 7 shows a schematic flowchart of determining a target object according to live broadcast content in an image processing method provided by an exemplary embodiment of the present disclosure.
  • FIG. 8 shows a block diagram of modules of an image processing apparatus provided by an embodiment of the present disclosure.
  • FIG. 9 shows a structural block diagram of an electronic device provided by an embodiment of the present disclosure.
  • the term “comprising” and variations thereof are open-ended, i.e., “including but not limited to”.
  • the term “based on” means “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • FIG. 1 shows a schematic diagram of an implementation environment applicable to an embodiment of the present disclosure, where the implementation environment includes a first terminal 120 and a second terminal 140, wherein:
  • the first terminal 120 and the second terminal 140 may each be a mobile phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a wearable device, an in-vehicle device, an augmented reality (AR) or virtual reality (VR) device, a laptop, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or a dedicated camera (such as a single-lens reflex camera or a compact camera), and the like.
  • the embodiment of the present disclosure does not limit the specific type of the terminal.
  • the first terminal 120 and the second terminal 140 may be two terminals of the same type or of different types, which is not limited in this embodiment of the present disclosure.
  • the first terminal 120 and the second terminal 140 respectively run a first client and a second client.
  • the first client and the second client may both be live broadcast applications (APPs). The first client may be the host client used by the host user, and the first terminal 120 may be the host terminal used by the host user; the second client may be the viewer client used by a viewer user in the live broadcast room, and the second terminal 140 may be the viewer terminal used by the viewer user.
  • the first terminal 120 and the second terminal 140 may be directly connected through a wired network or a wireless network.
  • the implementation environment may further include a server 200, in which case the first terminal 120 may also be connected to the second terminal 140 through the server 200. The server 200 may be connected to the first terminal 120 and to the second terminal 140 through a wired or wireless network, so that data can be exchanged between the server 200 and each of the two terminals.
  • the server 200 may be a traditional server, a cloud server, a single server, a server cluster composed of several servers, or a cloud computing service center.
  • FIG. 2 shows a schematic flowchart of an image processing method provided by an embodiment of the present disclosure, which can be applied to an electronic device, and the electronic device can be the above-mentioned first terminal or server. Taking the application to the first terminal (that is, the host terminal running the host client) as an example, the flow shown in FIG. 2 is described in detail below.
  • the image processing method may include the following steps:
  • S110 Acquire live audio and video information collected in real time.
  • a live broadcast request can be triggered through the host client running on the host terminal.
  • after the host client obtains the live broadcast request, it can start the image acquisition device and the audio acquisition device and collect live audio and video information with them. If the image acquisition device is shooting the live broadcast user, the collected live audio and video information may include a user image of the live broadcast user.
  • the display interface of the host client may display a control corresponding to the live broadcast portal, and by detecting a trigger operation acting on the control, the live broadcast request triggered by the live broadcast user may be obtained.
  • the host client is a live broadcast application that can be used for live broadcast.
  • the image acquisition device can be any device that can collect image information, such as a camera, and the audio acquisition device can be any device that can collect audio information, such as a microphone.
  • either device may also be an external device connected to the host terminal, which is not limited in this embodiment.
  • the host terminal can collect live audio and video information based on the image acquisition device and the audio acquisition device, so as to obtain live audio and video information collected in real time. If the method is applied to the server, the host terminal can transmit the live audio and video information collected in real time to the server, so that the server can obtain the live audio and video information collected in real time.
  • S120 Identify the input information of the live broadcast user in the live broadcast audio and video information, and determine the target object to be processed in the live broadcast audio and video information according to the input information.
  • the input information may include at least one of voice information, text information, touch information, and visual information. That is, the embodiment of the present disclosure does not limit the input form of the input information, which may be input by means of voice, touch operation, air gesture, or the like.
  • the target objects can be objects, or the whole or part of people, animals, plants, etc.
  • the visual information refers to input information, in the image information collected by the host terminal, that can be used to determine the target object, for example an image frame in which the live broadcast user performs a preset action. The image frame can be the corresponding video frame image in the live audio and video information, or an image that contains only part of the image content of a video frame image.
  • the preset actions may include actions in a narrow sense, and may also include actions in a broad sense such as gestures, expressions, and postures, which are not limited herein.
  • the visual information can also be a picture that can be used to indicate or characterize the target object. For example, if the target object is a lipstick, the visual information can also be a picture of the lipstick or its picture description information.
  • a specific implementation of determining the target object to be processed in the live audio and video information according to the input information may be: if it is detected, based on the image information of the live audio and video information, that the live broadcast user performs a preset action, the object indicated by the preset action is determined as the target object to be processed.
  • the video frame image corresponding to the preset action may be visual information.
  • the host terminal of the live broadcast user can display the live broadcast interface, which can display the live audio and video information together with other live broadcast content, such as object information superimposed on the live audio and video information.
  • the live broadcast user can click on the object information, so that after the host terminal detects the click event, the corresponding object information is obtained as the content or recognition result corresponding to the input information and is used to determine the corresponding target object.
  • the live broadcast user can input various forms of input information during the live broadcast, such as at least one of voice information, text information, touch information, and visual information.
  • for example, the live broadcast user can input voice information by speaking, input text information by typing, input visual information by performing preset actions, and so on.
  • the content of the identified input information can be any one or more of the item name, style, model, item picture, etc., and the target object to be processed can be determined in the live audio and video information according to the input information.
  • the electronic device can detect the target object corresponding to the recognition result in the live audio and video information according to the recognition result of the input information.
  • for example, when the input information of the live broadcast user is identified in the live audio and video information and the identification result is an item name, the feature vector description corresponding to the item name can be obtained. The image area matching the feature vector description can then be marked in the live audio and video information, and the corresponding image can be used as the target image of the target object.
  • the input information of the live broadcast user may include things that the live broadcast user needs to introduce or describe to other users (such as audience users in the live broadcast room), that is, the target object in the embodiments of the present disclosure. By identifying the input information in the live audio and video information, the target object can be identified for subsequent processing.
  • when the corresponding target object is determined in the live audio and video information according to the feature vector description, the match need not be exact. For example, if the matching degree reaches a specified ratio, a match can be considered found: it is determined that an object corresponding to the feature vector description exists in the live audio and video information, and the image area where the object is located is marked, as described above.
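  • as a minimal sketch of the inexact match described above, the following assumes that candidate image areas and their feature vectors are already available from some upstream step, and uses cosine similarity as the matching degree. The function names, the region format, and the 0.9 threshold are illustrative assumptions, not details given in the disclosure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_matching_region(reference, regions, threshold=0.9):
    """Return (box, score) for the best region whose score reaches the threshold.

    reference: feature vector looked up for the recognised object (assumed given).
    regions:   list of (box, feature_vector) pairs from a hypothetical
               upstream region-proposal step.
    Returns None when no region reaches the specified ratio.
    """
    best = None
    for box, vector in regions:
        score = cosine_similarity(reference, vector)
        if score >= threshold and (best is None or score > best[1]):
            best = (box, score)
    return best
```

  • a region whose score stays below the threshold is simply ignored, so a frame in which the object does not appear yields `None` and no deformation is triggered.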
  • the electronic device may be pre-built with a picture feature vector set, the picture feature vector set includes feature vector descriptions corresponding to various objects, and may be a complete set of a series of commodity data obtained through machine learning in the background.
  • the product pictures related to an object on the network can be integrated, and the feature vector description corresponding to the object can be obtained through machine learning and feature extraction, so as to quickly lock the object in the live audio and video information.
  • the feature vector description corresponding to an object may include at least one of a shape feature vector, a texture feature vector, and a color feature vector.
  • the picture feature vector set can be stored locally on the host terminal or on the server. When it is stored on the server and this method is executed by the host terminal, the server can find the corresponding feature vector description according to the input information and then deliver it to the host terminal, so that the host terminal can use the feature vector description to determine the corresponding object, that is, the target object, in the live audio and video information.
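  • the local-or-server lookup described above could be sketched as follows. The dictionary layout (keys such as "shape", "texture", "color"), the `fetch_from_server` callback, and the local caching of server results are all assumptions made for illustration; the disclosure does not specify a storage format or protocol.

```python
from typing import Callable, Dict, List, Optional

# Assumed layout: {"shape": [...], "texture": [...], "color": [...]}
FeatureDescription = Dict[str, List[float]]


def get_feature_description(
    object_name: str,
    local_set: Dict[str, FeatureDescription],
    fetch_from_server: Optional[Callable[[str], Optional[FeatureDescription]]] = None,
) -> Optional[FeatureDescription]:
    """Look up a feature vector description locally, then on the server.

    fetch_from_server is a hypothetical callback standing in for the
    network request made when the set is stored on the server.
    """
    key = object_name.strip().lower()
    if key in local_set:
        return local_set[key]
    if fetch_from_server is not None:
        description = fetch_from_server(key)
        if description is not None:
            # Design choice of this sketch: cache the result for later frames.
            local_set[key] = description
        return description
    return None
```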
  • the identification of the input information can be performed locally on the electronic device or implemented through a network. For example, it can be sent to a server based on the network, and the server identifies the input information. This embodiment does not limit the identification method.
  • S130 Perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image.
  • the deformation processing may include at least one of enlargement processing, distortion processing, stretching processing, and fisheye special effect processing.
  • the specific implementation of the deformation processing is not limited in this embodiment, and can be determined according to actual needs.
  • after determining the target object to be processed, the electronic device can perform deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image, so that the display effect of the target object in the live audio and video information changes.
  • the transition of the target object from its undeformed to its deformed appearance on the live broadcast interface has a stronger visual impact, so that users notice the target object more easily.
  • the target object can thus be highlighted during the live video broadcast, which draws the attention of users in the live broadcast room to it. Because the target object is determined by the input information of the live broadcast user, highlighting it also ties the input information of the live broadcast user more closely to the live broadcast content, which helps improve the efficiency and effect of the live broadcast.
  • the deformation processing may be, for example, fisheye special effect processing.
  • the effect of the fisheye lens can be simulated, and the target image can be transformed into the image seen after adding the fisheye lens, which can not only increase the interest of the live broadcast, but also enrich the live broadcast effect.
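  • one simple way to approximate a fisheye look is a radial remap that magnifies the centre of the target image. The sketch below is only an illustration under assumptions: the image is a list of pixel rows, sampling is nearest-neighbour, and the `strength` parameter and the power-law mapping are choices of this example rather than the method of the disclosure.

```python
import math


def fisheye(image, strength=0.5):
    """Return a same-sized copy of image with a simple radial bulge.

    image:    list of rows, each a list of pixel values (any type).
    strength: non-negative; 0 leaves the image unchanged, larger values
              magnify the centre more strongly.
    """
    height, width = len(image), len(image[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    max_r = math.hypot(cx, cy) or 1.0  # avoid division by zero for 1x1 images
    output = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            dx, dy = x - cx, y - cy
            r = math.hypot(dx, dy) / max_r  # normalised radius in [0, 1]
            # Sample the source closer to the centre than the output pixel,
            # which stretches the middle of the image outwards.
            factor = r ** strength
            sx = min(width - 1, max(0, round(cx + dx * factor)))
            sy = min(height - 1, max(0, round(cy + dy * factor)))
            output[y][x] = image[sy][sx]
    return output
```

  • because the corners have normalised radius 1, they map to themselves, so the deformed image keeps the same outline as the original and can be composited back without resizing.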
  • step S130 may be: performing enlargement processing on the target image corresponding to the target object in the live audio and video information, and taking the enlarged target image as the deformed target image. In this way, when the live broadcast user mentions the target object, the target image corresponding to the target object is enlarged and the target object is shown in a more viewer-friendly way, so that audience users can observe and understand it more clearly through the live broadcast interface, which helps improve the live broadcast effect.
  • for example, the image corresponding to a product can be enlarged through the embodiment of the present disclosure, so that the product features are shown more clearly. Audience users can then observe the product more closely and understand it better in combination with the live broadcast user's explanation, which can greatly improve the efficiency and effect of product recommendation.
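  • enlargement processing can be illustrated with nearest-neighbour upscaling, in which every pixel of the target image is repeated `factor` times in both directions. The list-of-rows image format and the restriction to integer factors are simplifying assumptions of this sketch; a production system would more likely use an interpolating resize from an image library.

```python
def enlarge(image, factor):
    """Nearest-neighbour upscaling of a list-of-rows image by an integer factor."""
    if not isinstance(factor, int) or factor < 1:
        raise ValueError("factor must be a positive integer")
    enlarged = []
    for row in image:
        # Repeat each pixel horizontally, then repeat the widened row vertically.
        wide_row = [pixel for pixel in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged
```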
  • the target image corresponding to the target object may only include the target object, or may include information other than the target object, which is not limited herein.
  • the electronic device can determine the image area where the target object is located in the live audio and video information, crop the image of that area, separate foreground from background in the cropped image, and extract the image of the target object, as the foreground object, to serve as the target image. This matting step lets the target image to be deformed contain only the target object, which helps obtain a more natural effect during subsequent synthesis.
  • alternatively, the entire image selected by the image area where the target object is located can be subjected to deformation processing.
  • the shape of the image area can be a circle, a rectangle, a fan shape, etc., or can be determined by the shape of the device, which is not limited here.
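  • the crop-then-matte flow can be sketched on a greyscale image as below. The separation rule used here (a pixel is foreground if it differs from a known background grey level by more than a tolerance) is a deliberately naive stand-in chosen for illustration; the disclosure does not say how foreground and background are separated. Using `None` for transparent pixels is likewise an assumption of this example.

```python
def crop(image, box):
    """Cut out box = (left, top, right, bottom), right/bottom exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def foreground_mask(patch, background, tolerance=10):
    """True where a grey value differs from the background by more than tolerance."""
    return [[abs(p - background) > tolerance for p in row] for row in patch]


def extract_target(image, box, background, tolerance=10):
    """Crop the image area and keep only foreground pixels.

    Background pixels become None, meaning "transparent" in this sketch.
    """
    patch = crop(image, box)
    mask = foreground_mask(patch, background, tolerance)
    return [
        [p if keep else None for p, keep in zip(patch_row, mask_row)]
        for patch_row, mask_row in zip(patch, mask)
    ]
```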
  • for example, the live broadcast user can input the name of the target object, "lipstick". The host terminal can then obtain the feature vector description corresponding to "lipstick", mark the item area where the "lipstick" is located in the live audio and video information based on the feature vector description, take the image corresponding to the item area as the target image corresponding to the target object, and deform the target image to obtain the deformed target image.
  • S140 Perform synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
  • the deformed target image and the image in the currently collected live audio and video information can be synthesized to obtain synthesized audio and video information for playback. If this method is applied to the host terminal, then once the host terminal obtains the synthesized audio and video information, it can play it and/or send it to the terminals of users in the live broadcast room, for example through the server.
  • if this method is applied to the server, the server can send the synthesized audio and video information to the terminals of users in the live broadcast room, including at least one of the host terminal and the audience terminal, so that at least one of the host terminal and the audience terminal plays the synthesized audio and video information.
  • the image position of the target object can be determined in the currently collected live audio and video information, and the deformed target image can be superimposed at that image position, so that the deformed target image is displayed at the image position and, for example, covers the target object in the live audio and video information.
  • the synthesis processing may also be performed at any other position, which is not limited herein.
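  • superimposing the deformed target image at the image position of the target object can be sketched as a simple paste onto a copy of the frame. The list-of-rows image format, the treatment of `None` pixels as transparent, the clipping at frame borders, and the helper that centres an enlarged target over the original box are assumptions of this example.

```python
def composite(frame, target, top_left):
    """Paste target onto a copy of frame with its top-left corner at (x, y).

    None pixels in target are skipped (transparent), and pixels falling
    outside the frame are clipped.
    """
    x0, y0 = top_left
    result = [row[:] for row in frame]
    for dy, row in enumerate(target):
        for dx, pixel in enumerate(row):
            x, y = x0 + dx, y0 + dy
            if pixel is not None and 0 <= y < len(result) and 0 <= x < len(result[0]):
                result[y][x] = pixel
    return result


def centered_top_left(box, target):
    """Top-left corner that centres target over box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    cx, cy = (left + right) // 2, (top + bottom) // 2
    return cx - len(target[0]) // 2, cy - len(target) // 2
```

  • centring the deformed image over the original position lets an enlarged target cover the undeformed object, which matches the covering behaviour mentioned above; passing any other `top_left` places it elsewhere.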
  • the image processing method provided by this embodiment acquires the live audio and video information collected in real time, identifies the input information of the live broadcast user in the live audio and video information, and determines the target object to be processed in the live audio and video information according to the input information. It then performs deformation processing on the target image corresponding to the target object in the live audio and video information to obtain the deformed target image, and synthesizes the deformed target image with the image in the live audio and video information to obtain synthesized audio and video information for playback.
  • the embodiments of the present disclosure can obtain input information during the live video broadcast of the live user to determine the target object to be processed, and effectively highlight the target object by deforming the target object, thereby improving the display effect of the live broadcast interface.
  • This enables users who watch the live broadcast to pay attention to the target object in a timely manner along with the live broadcast process of the live broadcast user, which improves the interest and effect of the live broadcast, and further helps to improve the user retention rate in the live broadcast room.
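  • the order of steps S110 to S140 for a single captured frame can be summarised in a small driver function. The four callables passed in (`identify_input`, `locate_target`, `deform`, `synthesize`) are hypothetical placeholders for whatever recognition, detection, deformation, and synthesis components an implementation provides; returning the frame unchanged when no target is found is a choice of this sketch.

```python
def process_frame(frame, audio, identify_input, locate_target, deform, synthesize):
    """Run one pass of S120-S140 on a frame acquired in S110.

    Returns the synthesized frame, or the original frame when no input
    information or no target object is found.
    """
    # S120 (first half): recognise the live broadcast user's input information.
    input_info = identify_input(frame, audio)
    if input_info is None:
        return frame
    # S120 (second half): determine where the target object is in the frame.
    box = locate_target(frame, input_info)
    if box is None:
        return frame
    # S130: deform the target image corresponding to the target object.
    deformed = deform(frame, box)
    # S140: synthesize the deformed target image with the frame.
    return synthesize(frame, deformed, box)
```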
  • the input information may include voice information, and the electronic device may find the target object to be deformed in the live audio and video information according to the voice information input by the live broadcast user. This simplifies the operation of the live broadcast user: without the live broadcast user giving any special instruction, the target object can be locked automatically and deformed, which greatly improves live broadcast efficiency and effect.
  • FIG. 3 shows a schematic flowchart of an image processing method provided by another embodiment of the present disclosure.
  • the image processing method may include:
  • S210 Acquire live audio and video information collected in real time.
  • S220 Perform voice recognition on the voice information in the live audio and video information to obtain a voice recognition result.
  • the speech recognition model can be run on the host terminal or the server, which is not limited here. Based on the pre-trained speech recognition model, the electronic device can perform speech recognition on the speech information in the live audio and video information, and obtain a speech recognition result.
  • the object information to be processed may be information such as the name and identifier of the object that can describe the target object.
  • the identifier may also include a link corresponding to the object (click on the link to view at least one of the object-related information and the purchase portal), for example, if the text indicated by the speech recognition result is "lipstick", that is, the name of the object.
  • the speech recognition result can also be more specific information including style, model, etc. that can determine a unique object.
  • the more specific the information corresponding to the object to be processed, the more accurate the feature vector description that can be found according to the object information to be processed, which is more conducive to accurately determining the target object in the live audio and video information.
  • S240 In the live audio and video information, determine the object corresponding to the object information to be processed as the target object.
  • the feature vector description corresponding to the object information to be processed can be obtained.
  • the electronic device can pre-build a mapping relationship between object information and feature vector descriptions, obtain the corresponding feature vector description according to the object information to be processed, determine the corresponding object in the live audio and video information according to that feature vector description, and determine that object as the target object.
  • the specific methods can be found in the corresponding parts of the foregoing embodiments, which will not be repeated here.
  • step S240 may include: if it is detected that the first object indicated by the object information to be processed exists in the live audio and video information, determining the first object as the target object.
  • the electronic device can detect whether there is a first object indicated by the object information to be processed in the live audio and video information, and if there is, it can determine the first object as the target object.
  • the detection can be performed by obtaining the feature vector description corresponding to the object information to be processed and then matching the live audio and video information against that feature vector description. If the match succeeds, it is determined that the first object indicated by the object information to be processed exists in the live audio and video information, and the first object is determined as the target object.
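  • The lookup-then-match detection described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `FEATURE_INDEX` table, its vector values, the use of cosine similarity as the matching degree, and the `threshold` value are all assumptions introduced for the example.

```python
import math

# Hypothetical pre-built mapping from object information (here, a name)
# to a feature vector description. The values are made up for illustration.
FEATURE_INDEX = {
    "lipstick": [0.9, 0.1, 0.0],
    "doughnut": [0.1, 0.8, 0.3],
}


def cosine_similarity(a, b):
    """Matching degree between two vectors (an assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_first_object(object_info, candidates, threshold=0.8):
    """Return the region of the best-matching candidate, or None.

    candidates: list of (region, feature_vector) pairs extracted from one
    frame, where region is a rectangle (xstart, ystart, xend, yend).
    """
    reference = FEATURE_INDEX.get(object_info)
    if reference is None:
        return None  # no feature vector description for this object info
    best_region, best_score = None, threshold
    for region, vector in candidates:
        score = cosine_similarity(reference, vector)
        if score >= best_score:
            best_region, best_score = region, score
    return best_region


# Two candidate regions in a frame, with made-up descriptors.
frame_candidates = [
    ((10, 10, 50, 50), [0.2, 0.7, 0.3]),    # resembles the doughnut entry
    ((60, 20, 90, 80), [0.95, 0.05, 0.0]),  # resembles the lipstick entry
]
```

  • A `None` result corresponds to the case in which the first object is not detected, so that another way of determining the target object is needed.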
  • S250 Perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image.
  • S260 Perform synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
  • the deformation processing may be enlargement processing
  • the specific implementation of step S250 may be: performing enlargement processing on the target image corresponding to the target object in the live audio and video information, and obtaining the enlarged target image as the deformed target image.
  • the enlargement process can be realized with OpenCV.
  • the image frames are composed according to the sequence of the time dimension.
  • the feature vector description corresponding to the object information to be processed is a two-dimensional vector feature that anchors a specific image area on the image. The area can be a rectangle (xstart, ystart, xend, yend) or another irregular shape with multiple vertices, which is not limited here.
  • the code example is as follows:
  • CV_EXPORTS_W void resize(InputArray src, OutputArray dst,
  • the anchor area rectangle (xstart, ystart, xend, yend) is enlarged to n times its original size and then synthesized: using the ROI method, the binary data stream (item graphics) is overlaid on the anchor position in the live audio and video information of the current frame (which can be the position of the original anchor area graphics). At this point, one enlargement process is complete.
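  • The enlarge-then-overlay step can be illustrated with a dependency-free sketch. It deliberately swaps in plain Python lists and nearest-neighbour replication in place of the OpenCV resize call and ROI assignment mentioned in the text; the function names, the integer factor `n`, and the choice to centre the enlarged patch on the original anchor area are assumptions made for the example.

```python
def enlarge_nearest(patch, n):
    """Enlarge a 2D patch (a list of rows) by an integer factor n,
    repeating every pixel n times horizontally and vertically."""
    return [[row[x // n] for x in range(len(row) * n)]
            for row in patch for _ in range(n)]


def overlay_enlarged(frame, rect, n):
    """Return a copy of frame in which the anchor area rect has been
    enlarged n times and pasted back, centred on the original area."""
    xstart, ystart, xend, yend = rect
    if xend <= xstart or yend <= ystart:
        raise ValueError("anchor area must be non-empty")
    # Cut out the anchor area (the region of interest).
    patch = [row[xstart:xend] for row in frame[ystart:yend]]
    big = enlarge_nearest(patch, n)
    # Assumed placement: keep the centre of the original anchor area.
    cx, cy = (xstart + xend) // 2, (ystart + yend) // 2
    top, left = cy - len(big) // 2, cx - len(big[0]) // 2
    out = [row[:] for row in frame]  # leave the input frame untouched
    for dy, big_row in enumerate(big):
        for dx, value in enumerate(big_row):
            y, x = top + dy, left + dx
            # Pixels that would fall outside the frame are clipped.
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = value
    return out
```

  • With a real video frame, the same two steps would typically be a library resize of the cropped region followed by an assignment into the corresponding slice of the frame.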
  • FIG. 4 shows a schematic diagram of a live broadcast interface provided by an exemplary embodiment of the present disclosure.
  • the live broadcast user, such as the anchor Zhang San, is broadcasting; the live broadcast interface displays the live audio and video information 410, and the object information to be processed is obtained by recognizing the voice information of anchor Zhang San.
  • the electronic device can obtain the feature vector description corresponding to the doughnut, and then find, in the live audio and video information 410, the image area 411 where the "doughnut" is marked by the feature vector description. After the image of the "doughnut" is enlarged, the enlarged "doughnut" image is superimposed at the position corresponding to the image area 411, covering the original image area 411. At time t+1, the live audio and video information 420 of the current frame is displayed on the live broadcast interface of anchor Zhang San, with the enlarged "doughnut" image displayed in the image area 421.
  • when the live broadcast user introduces the doughnut, the electronic device can automatically identify the object being introduced and lock onto it in the live audio and video information to enlarge it. Users in the live broadcast room can thus see the enlarged doughnut while listening to the introduction, observe it more carefully, and get a better e-commerce live broadcast experience, so that they can fully understand the objects introduced by the live broadcast user.
  • the electronic device may not be able to detect the object indicated by the object information to be processed in the live audio and video information, and thus cannot determine the target object in the live audio and video information. In this case, the target object can be further determined by other means, thereby reducing the missed detection rate and improving system stability.
  • FIG. 5 shows a schematic flowchart of an image processing method provided by another embodiment of the present disclosure. The method may include:
  • S310 Acquire live audio and video information collected in real time.
  • S320 Perform voice recognition on the voice information in the live audio and video information to obtain a voice recognition result.
  • S340 Determine whether it is detected that the first object indicated by the object information to be processed exists in the live audio and video information.
  • the method may include:
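  • if it is detected that the first object indicated by the object information to be processed exists in the live audio and video information, step S350 can be executed;
  • if it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, step S360 may be executed.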
  • S350 Determine the first object as the target object.
  • the first object is determined as the target object.
  • S360 Perform image recognition processing on the live audio and video information.
  • the pre-stored feature vector description may be general.
  • for example, the live broadcast user says "lipstick", but the lipstick in the live audio and video information does not look like a regular lipstick, that is, it does not match the feature vector description (the matching degree is lower than the specified ratio), while the live broadcast user has only named the object.
  • image recognition processing may be performed on the live audio and video information.
  • image recognition processing can be performed on the live audio and video information to identify whether there is a preset gesture, and if so, the second object indicated by the preset gesture is used as the target object.
  • the preset gesture can be one or more pre-stored gestures, which is not limited here, and can be set according to actual needs.
  • for example, the preset gesture can be drawing a circle with a finger, in which case the circled object is the second object indicated by the preset gesture; for another example, the preset gesture can be four fingers closed with only one finger extended, in which case the object pointed to by the extended finger is the second object indicated by the preset gesture.
  • the gesture of the live broadcast user can be used to further determine that the second object indicated by the preset gesture is the target object. Because the object that the user points to or circles is usually what the user is describing, or even wants to highlight, this embodiment can determine the object in the live audio and video information more accurately and can reduce the missed detections that may occur when the determination relies on voice information alone.
  • the second object may not match the input information of the live broadcast user, so it may not be the target object that the live broadcast user currently wants to highlight.
  • the second object can be matched with the to-be-processed object information based on the to-be-processed object information indicated by the voice information, and only when the matching is successful, the second object is used as the target object.
  • FIG. 6 shows a detailed flowchart of step S370 in FIG. 5 provided by an exemplary embodiment of the present disclosure.
  • step S370 may include:
  • the image area indicated by the preset gesture can be determined, and the image of that area can be cropped to obtain the second image corresponding to the second object; the second object information corresponding to the second image can then be searched for through the network.
  • if the second object information matches the to-be-processed object information, it can be determined that the second object matches the to-be-processed object information, and the second object can be used as the target object. For example, the image of the area that the live broadcast user points to or circles by hand is cropped as the second image corresponding to the second object; if the match succeeds, the second object is determined as the target object to be deformed in a subsequent step.
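  • The decision flow of steps S340 to S370 can be summarized in a short sketch. It assumes that the detection and gesture recognition results are already available as inputs; the function name, the dictionary keys `label` and `region`, and the case-insensitive substring test used as the matching rule are hypothetical choices made for illustration.

```python
def determine_target(object_info, first_object, gesture_object):
    """Pick the target object for one frame.

    object_info:    text obtained from speech recognition, e.g. "lipstick"
    first_object:   result of feature-vector detection, or None
    gesture_object: object indicated by a preset gesture, or None
    Both objects are dicts with hypothetical keys "label" and "region".
    """
    # S340/S350: the first object was detected, so it is the target.
    if first_object is not None:
        return first_object
    # S360: image recognition found no preset gesture.
    if gesture_object is None:
        return None
    # S370: accept the second object only if it matches the object
    # information to be processed (assumed rule: substring match).
    needle = object_info.strip().lower()
    if needle and needle in gesture_object["label"].lower():
        return gesture_object
    return None
```

  • Requiring the match in the last step reflects the concern that a gestured-at object may not be the one the live broadcast user is currently describing.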
  • S380 Perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image.
  • S390 Perform synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
  • the target object can be further determined according to the live content displayed on the live broadcast interface, and the target object is displayed on the live broadcast interface.
  • the target object can be determined based on the target information.
  • FIG. 7 shows a schematic flowchart of determining a target object according to live content in an image processing method provided by an exemplary embodiment of the present disclosure, which may specifically include:
  • the target information may include at least one of an item identifier corresponding to the target object and an item image.
  • the live interface may display at least one kind of target information corresponding to the object information to be processed, such as an item identifier, an item image, a purchase portal, etc., and then the corresponding target can be found in the live audio and video information according to the target information.
  • the purchase portal may be displayed in the form of an item image, and the item image may have a built-in URL (Uniform Resource Locator); the user can click the item image to jump to the purchase page corresponding to the URL.
  • a product image and product name can be displayed in the live broadcast interface. If the product name matches the object information to be processed (for example, the object information to be processed is the object name "lipstick" and the product name also includes "lipstick"), it can be considered a match; the corresponding image area is then marked in the live audio and video information based on the product image, and the object in that image area is the target object.
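  • The name match against the displayed live content can be sketched as below. The item dictionaries with `name` and `image` keys and the case-insensitive containment test are assumptions for the example; locating the image area from the returned product image is outside the scope of the sketch.

```python
def find_target_info(object_info, displayed_items):
    """Return the first displayed item whose product name contains the
    object information to be processed, or None if nothing matches.

    displayed_items: list of dicts with hypothetical keys
    "name" (product name) and "image" (reference to the product image).
    """
    needle = object_info.strip().lower()
    if not needle:
        return None
    for item in displayed_items:
        if needle in item["name"].lower():
            return item
    return None


# Example live content currently shown in the interface (made-up data).
live_content = [
    {"name": "Glazed Doughnut 6-pack", "image": "doughnut.png"},
    {"name": "Velvet Matte Lipstick", "image": "lipstick.png"},
]
```

  • The `image` of the returned item would then serve as the reference for marking the corresponding area in the live audio and video information.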
  • steps S410-S420 may be used to replace steps S360-S370 in FIG. 5 .
  • FIG. 8 is a block diagram of an image processing apparatus provided by an embodiment of the present disclosure.
  • the image processing apparatus 800 in the embodiment of the present disclosure may include: an information acquisition module 810, a target determination module 820, a target deformation module 830, and an image synthesis module 840, where:
  • the information acquisition module 810 is configured to acquire live audio and video information collected in real time;
  • the target determination module 820 is used to identify the input information of the live user in the live audio and video information, and determine the target object to be processed in the live audio and video information according to the input information;
  • the target deformation module 830 is configured to perform deformation processing on the target image corresponding to the target object in the live audio and video information to obtain the deformed target image;
  • the image synthesis module 840 is used for synthesizing the deformed target image and the image in the live audio and video information to obtain the synthesized audio and video information, which is used for playing the synthesized audio and video information.
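  • One way to picture how the four modules cooperate is as a chain of callables. The class below is only a structural sketch: the class name, the callable signatures, and the choice to return the unmodified audio and video information when no target object is found are assumptions, not details taken from the disclosure.

```python
class ImageProcessingApparatus:
    """Chains four pluggable stages that mirror modules 810 to 840."""

    def __init__(self, acquire, determine_target, deform, synthesize):
        self.acquire = acquire                    # information acquisition
        self.determine_target = determine_target  # target determination
        self.deform = deform                      # target deformation
        self.synthesize = synthesize              # image synthesis

    def process(self):
        av_info = self.acquire()
        target = self.determine_target(av_info)
        if target is None:
            # Assumed behaviour: pass the input through unchanged.
            return av_info
        deformed = self.deform(av_info, target)
        return self.synthesize(av_info, deformed, target)


def make_demo_apparatus(speech):
    """Build an apparatus from trivial stand-in stages for demonstration."""
    return ImageProcessingApparatus(
        acquire=lambda: {"frame": "F", "speech": speech},
        determine_target=lambda av: av["speech"] or None,
        deform=lambda av, target: "big-" + target,
        synthesize=lambda av, deformed, target: {
            **av, "frame": av["frame"] + "+" + deformed},
    )
```

  • Keeping each stage behind a callable makes it straightforward to swap, for example, a voice-based target determination for a gesture-based one without touching the other stages.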
  • the input information includes voice information
  • the target determination module 820 may include: a voice recognition submodule, an object information determination submodule, and a target object determination submodule, wherein:
  • the speech recognition sub-module is used to perform speech recognition on the speech information in the live audio and video information to obtain the speech recognition result;
  • an object information determination submodule used for determining the object information to be processed based on the speech recognition result
  • the target object determination submodule is used for determining the object corresponding to the object information to be processed as the target object in the live audio and video information.
  • the target object determination sub-module may include: a first object determination unit, configured to determine the first object as the target if it is detected that the first object indicated by the object information to be processed exists in the live audio and video information. object.
  • the target object determination submodule may include: an image recognition unit and a gesture determination unit, wherein:
  • an image recognition unit configured to perform image recognition processing on the live audio and video information if it is not detected that the first object indicated by the to-be-processed object information exists in the live audio and video information;
  • the gesture determination unit is configured to use the second object indicated by the preset gesture as the target object if it is recognized that there is a preset gesture in the live audio and video information.
  • the gesture determination unit may include: a second object determination subunit and a target object determination subunit, wherein:
  • the second object determination subunit is configured to determine the object indicated by the preset gesture as the second object if it is recognized that there is a preset gesture in the live audio and video information;
  • the target object determination subunit is configured to use the second object as the target object if the second object matches the information of the object to be processed.
  • the target object determination submodule may include: a live content acquisition unit and a target information determination unit, wherein:
  • a live broadcast content acquisition unit configured to acquire the currently displayed live broadcast content if it is not detected that the first object indicated by the object information to be processed exists in the live broadcast audio and video information;
  • the target information determining unit is configured to determine the target object according to the target information if target information corresponding to the object information to be processed exists in the live broadcast content, and the target information includes at least one of an item identifier and an image corresponding to the target object.
  • the target deformation module 830 may include: an enlargement processing sub-module, configured to perform enlargement processing on the target image corresponding to the target object in the live audio and video information, and obtain the enlarged target image as the deformed target image.
  • the input information includes at least one of voice information, text information, touch information, and visual information.
  • the image processing apparatus in the embodiments of the present disclosure can execute the image processing method provided by the embodiments of the present disclosure, and the implementation principle is similar.
  • the actions performed by each module in the image processing apparatus correspond to the steps in the image processing methods in the embodiments of the present disclosure. For a detailed functional description of each module, refer to the descriptions of the corresponding image processing methods above, which will not be repeated here.
  • FIG. 9 shows a structural block diagram of an electronic device 900 suitable for implementing embodiments of the present disclosure.
  • the electronic device in the embodiment of the present disclosure may include, but is not limited to, a device such as a computer.
  • the electronic device shown in FIG. 9 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • the electronic device 900 includes a memory and a processor, where the processor may be referred to as the processing device 901 hereinafter, and the memory may include at least one of the read-only memory (ROM) 902, the random access memory (RAM) 903, and the storage device 908 described below:
  • the electronic device 900 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 901 that may perform various appropriate actions and processes according to a program stored in the read-only memory (ROM) 902 or a program loaded from the storage device 908 into the random access memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the electronic device 900.
  • the processing device 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904.
  • An input/output (I/O) interface 905 is also connected to bus 904 .
  • the following devices can be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 907 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 908 including, for example, a magnetic tape, hard disk, etc.; and a communication device 909.
  • the communication device 909 may allow the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 9 shows the electronic device 900 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable storage medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication device 909, or from the storage device 908, or from the ROM 902.
  • when the computer program is executed by the processing device 901, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the above-mentioned computer-readable storage medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above.
  • Computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon.
  • Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium, other than a computer-readable storage medium, that can send, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable storage medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • clients and servers can communicate using any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
  • the above-mentioned computer-readable storage medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable storage medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is made to perform the following steps: acquiring live audio and video information collected in real time; identifying the input information of the live broadcast user in the live audio and video information, and determining the target object to be processed in the live audio and video information according to the input information; performing deformation processing on the target image corresponding to the target object in the live audio and video information to obtain the deformed target image; and performing synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
  • Computer program code for performing operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or can be implemented in a combination of dedicated hardware and computer instructions.
  • modules or units involved in the embodiments of the present disclosure may be implemented in software or hardware.
  • the name of the module or unit does not constitute a limitation of the unit itself under certain circumstances, for example, the display module can also be described as "a module for displaying a resource uploading interface".
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
  • a computer-readable storage medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the computer-readable storage medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Computer-readable storage media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • an image processing method includes: acquiring live audio and video information collected in real time; identifying input information of a live broadcast user in the live audio and video information, and determining the target object to be processed in the live audio and video information according to the input information; performing deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and performing synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
  • the input information includes voice information, and identifying the input information of the live broadcast user in the live audio and video information and determining the target object to be processed in the live audio and video information according to the input information includes: performing voice recognition on the voice information in the live audio and video information to obtain a voice recognition result; determining the object information to be processed based on the voice recognition result; and, in the live audio and video information, determining the object corresponding to the object information to be processed as the target object.
  • determining the object corresponding to the object information to be processed as the target object in the live audio and video information includes: if it is detected that the first object indicated by the object information to be processed exists in the live audio and video information, determining the first object as the target object.
  • determining the object corresponding to the object information to be processed as the target object in the live audio and video information further includes: if it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, performing image recognition processing on the live audio and video information; and if it is recognized that there is a preset gesture in the live audio and video information, using the second object indicated by the preset gesture as the target object.
  • using the second object indicated by the preset gesture as the target object if it is recognized that there is a preset gesture in the live audio and video information includes: if it is recognized that there is a preset gesture in the live audio and video information, determining the object indicated by the preset gesture as the second object; and if the second object matches the object information to be processed, using the second object as the target object.
  • determining the object corresponding to the object information to be processed as the target object in the live audio and video information further includes: if it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, acquiring the currently displayed live broadcast content; and if target information corresponding to the object information to be processed exists in the live broadcast content, determining the target object according to the target information, where the target information includes at least one of an item identifier and an image corresponding to the target object.
  • performing deformation processing on the target image corresponding to the target object in the live audio and video information to obtain the deformed target image includes: performing enlargement processing on the target image corresponding to the target object in the live audio and video information, and obtaining the enlarged target image as the deformed target image.
  • the input information includes at least one of voice information, text information, touch information, and visual information.
  • an image processing apparatus includes: an information acquisition module for acquiring live audio and video information collected in real time; a target determination module for identifying the input information of the live broadcast user in the live audio and video information and determining, according to the input information, the target object to be processed in the live audio and video information; a target deformation module for performing deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and an image synthesis module for synthesizing the deformed target image with the image in the live audio and video information to obtain synthesized audio and video information for playback.
  • the input information includes voice information
  • the target determination module may include a voice recognition sub-module, an object information determination sub-module, and a target object determination sub-module, wherein: the voice recognition sub-module is used to perform voice recognition on the voice information in the live audio and video information to obtain a voice recognition result; the object information determination sub-module is used to determine the to-be-processed object information based on the voice recognition result; and the target object determination sub-module is used to determine, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object.
  • the target object determination sub-module may include: a first object determination unit, configured to, if the first object indicated by the to-be-processed object information is detected in the live audio and video information, determine the first object as the target object.
  • the target object determination sub-module may include an image recognition unit and a gesture determination unit, wherein: the image recognition unit is configured to perform image recognition processing on the live audio and video information if the first object indicated by the to-be-processed object information is not detected in the live audio and video information; and the gesture determination unit is configured to, if a preset gesture is recognized in the live audio and video information, take the second object indicated by the preset gesture as the target object.
  • the gesture determination unit may include a second object determination subunit and a target object determination subunit, wherein: the second object determination subunit is configured to, if a preset gesture is recognized in the live audio and video information, determine the object indicated by the preset gesture as the second object; and the target object determination subunit is configured to, if the second object matches the to-be-processed object information, determine the second object as the target object.
  • the target object determination sub-module may include a live broadcast content acquisition unit and a target information determination unit, wherein: the live broadcast content acquisition unit is configured to obtain the currently displayed live broadcast content if the first object indicated by the to-be-processed object information is not detected in the live audio and video information; and the target information determination unit is configured to, if target information corresponding to the to-be-processed object information exists in the live broadcast content, determine the target object according to the target information, where the target information includes at least one of an item identifier and an image corresponding to the target object.
  • the target deformation module may include: an enlargement processing sub-module, configured to enlarge the target image corresponding to the target object in the live audio and video information, and take the enlarged target image as the deformed target image.
  • the input information includes at least one of voice information, text information, touch information, and visual information.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Provided are an image processing method and apparatus, an electronic device and a computer-readable storage medium, relating to the technical field of image processing. Said method comprises: acquiring live broadcast audio and video information collected in real time; identifying input information of a live broadcast user in the live broadcast audio and video information, and determining, according to the input information, a target object to be processed in the live broadcast audio and video information; performing deformation processing on a target image corresponding to the target object in the live broadcast audio and video information, so as to obtain a deformed target image; and performing synthesis processing on the deformed target image and an image in the live broadcast audio and video information, so as to obtain synthesized audio and video information for playing the synthesized audio and video information. In the embodiments of the present disclosure, when a live streamer mentions a target object, the target object may be highlighted by means of deformation processing, thereby improving the display and live broadcast effects.

Description

Image processing method, apparatus, electronic device, and computer-readable storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202011119916.0, filed on October 19, 2020 and entitled "Image processing method, apparatus, electronic device, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of image processing, and in particular to an image processing method, apparatus, electronic device, and computer-readable storage medium.
Background
With the development of the mobile Internet and the spread of mobile terminals, a wide variety of application software continues to emerge, allowing users to experience more varied functions on their mobile terminals. For example, current live broadcast applications let users watch other users' live content in real time and interact with the host in real time. However, the live broadcast interface of current live broadcast applications is displayed in a rather monotonous way, and the viewing experience is poor.
Summary
This Summary is provided to introduce, in simplified form, concepts that are described in detail in the Detailed Description below. It is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, an embodiment of the present disclosure provides an image processing method, the method including: acquiring live audio and video information collected in real time; identifying input information of a live broadcast user in the live audio and video information, and determining, according to the input information, a target object to be processed in the live audio and video information; performing deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and synthesizing the deformed target image with an image in the live audio and video information to obtain synthesized audio and video information for playback.
In a second aspect, an embodiment of the present disclosure provides an image processing apparatus, the apparatus including: an information acquisition module, configured to acquire live audio and video information collected in real time; a target determination module, configured to identify input information of a live broadcast user in the live audio and video information and determine, according to the input information, a target object to be processed in the live audio and video information; a target deformation module, configured to perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and an image synthesis module, configured to synthesize the deformed target image with an image in the live audio and video information to obtain synthesized audio and video information for playback.
In a third aspect, an embodiment of the present disclosure provides an electronic device, the electronic device including one or more computer programs, wherein the one or more computer programs are stored in a memory and configured to be executed by one or more processors, and the one or more computer programs are configured to perform the method described in the first aspect above.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when invoked and executed by a processor, implements the method described in the first aspect above.
The image processing method, apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the present disclosure acquire live audio and video information collected in real time, identify the input information of the live broadcast user in that information, and determine the target object to be processed in the live audio and video information according to the input information. The target image corresponding to the target object in the live audio and video information is then deformed to obtain a deformed target image, and the deformed target image is synthesized with the image in the live audio and video information to obtain synthesized audio and video information for playback. The embodiments of the present disclosure can thus obtain input information during the live broadcast to determine the target object to be processed, and can highlight the target object effectively by deforming it. This improves the display effect of the live broadcast interface and lets viewers notice the target object promptly as the broadcast proceeds, which makes the live broadcast more engaging and effective and in turn helps improve user retention in the live broadcast room.
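The four steps summarized above can be sketched as a small pipeline. This is only an illustrative outline, not the implementation of the disclosure: the names `LiveChunk`, `process_chunk`, `locate_target`, `deform`, and `composite` are hypothetical, a frame is modeled as a list of pixel rows, and the `transcript` field stands in for whatever recognized input information is available.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Frame = List[List[int]]          # illustrative: a grayscale frame as rows of pixels
Box = Tuple[int, int, int, int]  # illustrative: (top, left, height, width)


@dataclass
class LiveChunk:
    """One unit of live audio and video information (hypothetical container)."""
    frame: Frame
    transcript: str  # stand-in for the recognized input information


def process_chunk(
    chunk: LiveChunk,
    locate_target: Callable[[Frame, str], Optional[Box]],
    deform: Callable[[Frame, Box], Frame],
    composite: Callable[[Frame, Frame, Box], Frame],
) -> Frame:
    """Determine the target, deform its image, and synthesize the result.

    If no target object can be determined, the frame is returned unchanged.
    """
    box = locate_target(chunk.frame, chunk.transcript)  # determine target object
    if box is None:
        return chunk.frame
    deformed = deform(chunk.frame, box)                 # deformation processing
    return composite(chunk.frame, deformed, box)        # synthesis processing
```

Passing the three stages in as callables keeps the sketch neutral about how recognition, deformation, and synthesis are actually performed, which the disclosure leaves open.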
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 shows a schematic diagram of an implementation environment suitable for an embodiment of the present disclosure.
FIG. 2 shows a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.
FIG. 3 shows a schematic flowchart of an image processing method provided by another embodiment of the present disclosure.
FIG. 4 shows a schematic diagram of a live broadcast interface provided by an exemplary embodiment of the present disclosure.
FIG. 5 shows a schematic flowchart of an image processing method provided by yet another embodiment of the present disclosure.
FIG. 6 shows a detailed flowchart of step S370 in FIG. 5, provided by an exemplary embodiment of the present disclosure.
FIG. 7 shows a schematic flowchart of determining a target object according to live broadcast content in an image processing method provided by an exemplary embodiment of the present disclosure.
FIG. 8 shows a block diagram of the modules of an image processing apparatus provided by an embodiment of the present disclosure.
FIG. 9 shows a structural block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "including" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms are given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish between devices, modules, or units. They are not intended to require that these devices, modules, or units be different ones, nor to limit the order or interdependence of the functions they perform.
It should be noted that the modifiers "a" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they are to be read as "one or more".
The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
The technical solutions of the present disclosure, and how they solve the above technical problem, are described in detail below through specific embodiments. The following specific embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present disclosure are described below with reference to the accompanying drawings.
Referring to FIG. 1, which shows a schematic diagram of an implementation environment suitable for the embodiments of the present disclosure, the implementation environment includes a first terminal 120 and a second terminal 140, where:
The first terminal 120 and the second terminal 140 may each be a mobile phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a wearable device, an in-vehicle device, an augmented reality (AR) or virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a dedicated camera (such as a single-lens reflex camera or a compact camera), or the like. The embodiments of the present disclosure do not limit the specific type of terminal.
In addition, the first terminal 120 and the second terminal 140 may be two terminals of the same type or two terminals of different types, which is not limited in the embodiments of the present disclosure.
A first client and a second client run on the first terminal 120 and the second terminal 140, respectively. In one embodiment, the first client and the second client may both be live broadcast applications (apps). The first client may be the host client used by a host user, and the first terminal 120 may be the host terminal used by the host user; the second client may be the viewer client used by a viewer user in the live broadcast room, and the second terminal 140 may be the viewer terminal used by the viewer user.
The first terminal 120 and the second terminal 140 may be connected directly through a wired or wireless network. Alternatively, the implementation environment may further include a server 200, in which case the first terminal 120 may be connected to the second terminal 140 through the server 200. The server 200 may be connected to the first terminal 120 and the second terminal 140 through a wired or wireless network, so that data can be exchanged between the server 200 and the first terminal 120 and the second terminal 140.
The server 200 may be a conventional server or a cloud server, and may be a single server, a server cluster composed of several servers, or a cloud computing service center.
The image processing method, apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the present disclosure are described in detail below through specific embodiments.
Referring to FIG. 2, which shows a schematic flowchart of an image processing method provided by an embodiment of the present disclosure, the method can be applied to an electronic device, and the electronic device may be the first terminal or the server described above. Taking application to the first terminal (that is, the host terminal running the host client) as an example, the flow shown in FIG. 2 is described in detail below. The image processing method may include the following steps:
S110: Acquire live audio and video information collected in real time.
When a live broadcast user wants to start a live broadcast, the user can trigger a live broadcast request through the host client running on the host terminal. After obtaining the request, the host client can start an image acquisition device and an audio acquisition device and use them to collect live audio and video information. If the image acquisition device is pointed at the live broadcast user, the collected live audio and video information may include a user image of the live broadcast user. In one example, the display interface of the host client may show a control corresponding to the live broadcast entry, and the live broadcast request triggered by the live broadcast user can be obtained by detecting a trigger operation on that control.
Here, the host client is a live broadcast application that can be used for live broadcasting. The image acquisition device may be a device capable of collecting image information, such as a camera, and the audio acquisition device may be a device capable of collecting audio information, such as a microphone. The image acquisition device and the audio acquisition device may be integrated in the host terminal or may be external devices connected to the host terminal, which is not limited in this embodiment.
The host terminal can collect live audio and video information through the image acquisition device and the audio acquisition device, thereby obtaining the live audio and video information collected in real time. If the method is applied to the server, the host terminal can transmit the live audio and video information collected in real time to the server, so that the server can obtain it.
S120: Identify the input information of the live broadcast user in the live audio and video information, and determine the target object to be processed in the live audio and video information according to the input information.
In some embodiments, the input information may include at least one of voice information, text information, touch information, and visual information. That is, the embodiments of the present disclosure do not limit the form in which the input information is entered; it may be entered by voice, by a touch operation, by a mid-air gesture, or in other ways.
The target object may differ depending on the input information. The target object may be an item, or the whole or a part of a person, animal, plant, and so on; for example, it may be an entire person, or a body part or facial feature of a person.
It should be noted that visual information refers to input information, within the image information collected by the host terminal, that can be used to determine the target object. One example is an image frame in which the live broadcast user performs a preset action; the image frame may be the corresponding video frame image in the live audio and video information, or an image containing only part of the content of each video frame image. The preset action may be an action in the narrow sense, or an action in a broad sense such as a posture, expression, or gesture, which is not limited here. As another example, the visual information may be a picture that indicates or characterizes the target object; if the target object is a lipstick, the visual information may be a picture of the lipstick or descriptive information about that picture.
In some embodiments, when the input information includes visual information, the target object to be processed may be determined in the live audio and video information as follows: if it is detected, based on the image information of the live audio and video information, that the live broadcast user performs a preset action, the object indicated by the preset action is determined as the target object to be processed. In this case, the video frame image corresponding to the preset action serves as the visual information.
In addition, if the input information is touch information, the host terminal of the live broadcast user can display a live broadcast interface. The live broadcast interface can display the live audio and video information together with other live broadcast content, such as object information superimposed on the live audio and video information, for example any one or more of a name, model, picture, or link. The live broadcast user can tap the object information, so that after detecting the tap event the host terminal obtains the corresponding object information as the content of the input information, that is, as the recognition result, in order to determine the corresponding target object.
In practice, the live broadcast user can enter input information in various forms during the live broadcast, such as at least one of voice information, text information, touch information, and visual information. For example, the live broadcast user can enter voice information by speaking, text information by typing, or visual information by performing a preset action. In some examples, the content recognized from the input information may be any one or more of an item name, style, model, or item picture, and the target object to be processed can then be determined in the live audio and video information according to the input information.
In some embodiments, the electronic device may, based on the recognition result of the input information, detect in the live audio and video information the target object corresponding to that result. In one implementation, identifying the input information of the live broadcast user in the live audio and video information yields an item name as the recognition result. The feature vector description corresponding to the item name can then be obtained, and the corresponding target object determined in the live audio and video information according to that description. For example, the image region corresponding to the feature vector description may be marked in the live audio and video information, and the corresponding image taken as the target image of the target object. Because the input information of the live broadcast user may contain the thing that the user wants to introduce or describe to other users (such as viewers in the live broadcast room), that is, the target object in the embodiments of the present disclosure, the target object can be determined in the live audio and video information by identifying the input information, for subsequent processing.
It should be noted that, when the corresponding target object is determined in the live audio and video information according to the feature vector description, the match need not be exact. For example, a match may be declared when the degree of matching reaches a specified ratio; it is then determined that an object corresponding to the feature vector description exists in the live audio and video information, and the image region where the object is located is marked, as described above.
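The inexact match described above can be illustrated with a thresholded similarity test. This is a minimal sketch under stated assumptions: the disclosure does not specify the matching measure, so cosine similarity is used here purely as a stand-in, and `find_matching_region`, the candidate regions, and the default `threshold` of 0.8 are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # illustrative: (top, left, height, width)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_matching_region(
    reference: Sequence[float],
    candidates: Dict[Box, Sequence[float]],
    threshold: float = 0.8,  # assumed "specified ratio"
) -> Optional[Box]:
    """Return the best-scoring candidate region whose similarity to the
    reference feature vector reaches the threshold, or None if none does."""
    best_box: Optional[Box] = None
    best_score = threshold
    for box, features in candidates.items():
        score = cosine_similarity(reference, features)
        if score >= best_score:
            best_box, best_score = box, score
    return best_box
```

Returning `None` when no region reaches the threshold leaves room for a fallback way of determining the target object.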
在一些实施方式中,电子设备可预先构建有图片特征向量集,图片特征向量集包括多种对象对应的特征向量描述,可以是后台通过机器学习获取到一系列的商品数据的全集。具体地,可将网络上与一个对象相关的商品图片整合起来,通过机器学习,特征提取,获取该对象对应的特征向量 描述,以用于在直播音视频信息中快速锁定该对象。一个对象对应的特征向量描述,可包括形状特征向量、纹理特征向量、颜色特征向量中至少一个。In some embodiments, the electronic device may be pre-built with a picture feature vector set, the picture feature vector set includes feature vector descriptions corresponding to various objects, and may be a complete set of a series of commodity data obtained through machine learning in the background. Specifically, the product pictures related to an object on the network can be integrated, and the feature vector description corresponding to the object can be obtained through machine learning and feature extraction, so as to quickly lock the object in the live audio and video information. The feature vector description corresponding to an object may include at least one of a shape feature vector, a texture feature vector, and a color feature vector.
The picture feature vector set may be stored locally on the anchor terminal or on a server. When it is stored on the server and the method is executed by the anchor terminal, the server may find the corresponding feature vector description according to the input information and deliver it to the anchor terminal, so that the anchor terminal can obtain the feature vector description and determine the corresponding object, that is, the target object, in the live audio and video information.
In addition, the input information may be recognized locally on the electronic device or over a network. For example, the input information may be sent over the network to a server, which recognizes it. This embodiment does not limit the recognition manner.
S130: Perform deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image.
The deformation processing may include at least one of enlargement processing, distortion processing, stretching processing, and fisheye special-effect processing. This embodiment does not limit the specific implementation of the deformation processing, which may be determined according to actual needs.
After determining the target object to be processed, the electronic device may perform deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image, so that the display effect of the target object in the live audio and video information changes. Compared with other information that changes little and a background that hardly changes at all, the transition of the target object from its pre-deformation state to its post-deformation state on the live broadcast interface has a stronger visual impact, so users notice the target object more easily. The target object is thus highlighted during the live video broadcast, which can increase the attention that users in the live broadcast room pay to it. Moreover, because the target object is determined from the input information of the live broadcast user, highlighting the target object ties that input information more closely to the live content, which helps improve live broadcast efficiency and effect.
Of course, different deformation processing achieves correspondingly different effects. For example, if the deformation processing is fisheye special-effect processing, the effect of a fisheye lens can be simulated and the target image is transformed into the image that would be seen through a fisheye lens, which both makes the live broadcast more entertaining and enriches the live broadcast effect.
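As a rough illustration of a fisheye-style deformation, the sketch below applies a radial bulge to a small image stored as a list of rows. The disclosure does not specify any lens model, so the power-law mapping, the `strength` parameter, and the function name `fisheye` are assumptions.

```python
import math

def fisheye(image, strength=2.0):
    """Return a copy of a 2-D list image with a radial bulge (nearest neighbour).

    Each output pixel at normalized radius r < 1 samples the source at radius
    r ** strength, which stretches the centre of the image outward.
    """
    if strength < 1:
        raise ValueError("strength must be >= 1")
    height, width = len(image), len(image[0])
    cx, cy = (width - 1) / 2, (height - 1) / 2
    radius = max(cx, cy) or 1.0
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            dx, dy = (x - cx) / radius, (y - cy) / radius
            r = math.hypot(dx, dy)
            if r == 0 or r >= 1:
                continue  # the centre and pixels outside the lens stay unchanged
            scale = r ** strength / r  # source radius divided by output radius
            sx = round(cx + dx * scale * radius)
            sy = round(cy + dy * scale * radius)
            out[y][x] = image[sy][sx]
    return out
```

With `strength=1.0` the mapping is the identity; larger values enlarge the centre more strongly.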
As another example, if the deformation processing is enlargement processing, step S130 may be implemented as follows: perform enlargement processing on the target image corresponding to the target object in the live audio and video information, and use the enlarged target image as the deformed target image. In this way, when the live broadcast user mentions the target object, the image size of the corresponding target image is enlarged, so that the target object is presented more clearly and audience users can observe and understand it better through the live broadcast interface, which helps improve the live broadcast effect. For example, in an e-commerce live broadcast scenario, if the live broadcast user needs to introduce a commodity to audience users, the embodiments of the present disclosure can enlarge the image corresponding to the commodity and show its features more clearly, so that audience users can observe the commodity more closely while following the live broadcast user's explanation, which can improve the efficiency and effect of commodity recommendation.
The target image corresponding to the target object may contain only the target object, or may also contain information other than the target object, which is not limited here. As one implementation, the electronic device may determine the image region where the target object is located in the live audio and video information, crop the image of that region to obtain a cropped image, separate the foreground and background of the cropped image, and extract the image of the target object, as the foreground object, as the target image. This achieves matting, so that the target image to be deformed contains only the target object, which helps produce a more natural-looking result in the subsequent compositing.
Alternatively, the entire image selected by the image region where the target object is located may be deformed. The shape of the image region may be a circle, a rectangle, a sector, or the like. It may be preset, or may be determined by a shape drawn by the live broadcast user on the anchor terminal, which is not limited here.
For example, the live broadcast user may input the name of the target object, "lipstick". The anchor terminal may then obtain the feature vector description corresponding to "lipstick", mark the item region where the lipstick is located in the live audio and video information based on that feature vector description, use the image corresponding to the item region as the target image corresponding to the target object, and perform deformation processing on the target image to obtain the deformed target image.
S140: Composite the deformed target image with the images in the live audio and video information to obtain synthesized audio and video information for playback.
After the deformed target image is obtained, it may be composited with the images in the currently captured live audio and video information to obtain synthesized audio and video information for playback. If the method is applied to the anchor terminal, then upon obtaining the synthesized audio and video information the anchor terminal may play it and/or send it to the terminals of users in the live broadcast room, for example by sending it through the server to the audience terminals in the live broadcast room. If the method is applied to the server, the server may send the synthesized audio and video information to the terminals of users in the live broadcast room, including at least one of the anchor terminal and the audience terminals, so that at least one of the anchor terminal and the audience terminals plays the synthesized audio and video information.
In some implementations, the image position of the target object may be determined in the currently captured live audio and video information, and the deformed target image is superimposed at that image position, so that the deformed target image is displayed at that position, for example covering the target object in the live audio and video information. In other implementations, the compositing may also be performed at any other position, which is not limited here.
In the image processing method provided by this embodiment, live audio and video information captured in real time is obtained; the input information of the live broadcast user in the live audio and video information is recognized, and the target object to be processed is determined in the live audio and video information according to the input information; deformation processing is then performed on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and the deformed target image is composited with the images in the live audio and video information to obtain synthesized audio and video information for playback. The embodiments of the present disclosure can therefore obtain input information during the live broadcast user's video broadcast to determine the target object to be processed, and effectively highlight the target object by deforming it. This improves the presentation of the live broadcast interface and lets viewers notice the target object in time as the live broadcast proceeds, which makes the live broadcast more entertaining and effective and in turn helps improve user retention in the live broadcast room.
In some embodiments, the input information may include voice information, and the electronic device may find the target object to be deformed in the live audio and video information according to the voice information input by the live broadcast user. This simplifies the live broadcast user's operations: the target object can be located and deformed automatically without the live broadcast user giving an explicit instruction, which improves live broadcast efficiency and effect. Specifically, refer to FIG. 3, which shows a schematic flowchart of an image processing method provided by another embodiment of the present disclosure. The image processing method may include:
S210: Obtain live audio and video information captured in real time.
S220: Perform speech recognition on the voice information in the live audio and video information to obtain a speech recognition result.
In some implementations, the speech recognition model may run on the anchor terminal or on the server, which is not limited here. Based on a pre-trained speech recognition model, the electronic device may perform speech recognition on the voice information in the live audio and video information to obtain a speech recognition result.
S230: Determine to-be-processed object information based on the speech recognition result.
The to-be-processed object information may be information that can describe the target object, such as the name or identifier of an object. The identifier may in turn include a link corresponding to the object (clicking the link opens at least one of information related to the object and a purchase entry). For example, if the text indicated by the speech recognition result is "lipstick", the to-be-processed object information is the name of the object. Of course, the speech recognition result may also be more specific and contain information such as style or model that identifies a unique object, for example "Armani lipstick 301", which contains the object category, brand, and color number. The more specific the to-be-processed object information, the more accurate the feature vector description that can be found from it, and the easier it is to accurately determine the target object in the live audio and video information.
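Step S230 can be sketched as picking the most specific known object name that appears in the recognized text. Substring matching, the longest-match rule, and the function name `extract_object_info` are assumptions; the disclosure only requires that object information be determined from the speech recognition result.

```python
def extract_object_info(transcript, known_objects):
    """Return the most specific (longest) known object name in the transcript.

    transcript is the text produced by speech recognition and known_objects is
    a hypothetical list of object names. Returns None if none is mentioned.
    """
    text = transcript.lower()
    mentioned = [name for name in known_objects if name.lower() in text]
    return max(mentioned, key=len) if mentioned else None
```

Preferring the longest match means that "Armani lipstick 301" is chosen over the generic "lipstick" when both appear in the sentence, in line with the observation that more specific object information leads to a more accurate feature vector description.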
S240: In the live audio and video information, determine the object corresponding to the to-be-processed object information as the target object.
In some implementations, the feature vector description corresponding to the to-be-processed object information may be obtained. For example, a mapping between object information and feature vector descriptions may be built in advance on the electronic device, so that the corresponding feature vector description can be looked up from a piece of to-be-processed object information. The corresponding object is then determined in the live audio and video information according to that feature vector description and is determined as the target object. For details, see the corresponding parts of the foregoing embodiments, which are not repeated here.
In some embodiments, step S240 may include: if it is detected that a first object indicated by the to-be-processed object information exists in the live audio and video information, determining the first object as the target object. As one approach, after determining the to-be-processed object information, the electronic device may detect whether the first object indicated by the to-be-processed object information exists in the live audio and video information and, if so, determine the first object as the target object. The detection may be performed by obtaining the feature vector description corresponding to the to-be-processed object information and matching it against the live audio and video information. If an image region whose matching degree is higher than the specified ratio exists, it is determined that the first object indicated by the to-be-processed object information exists in the live audio and video information, and the first object is determined as the target object.
S250: Perform deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image.
S260: Composite the deformed target image with the images in the live audio and video information to obtain synthesized audio and video information for playback.
In some implementations, the deformation processing may be enlargement processing, and step S250 may be implemented as follows: perform enlargement processing on the target image corresponding to the target object in the live audio and video information, and use the enlarged target image as the deformed target image.
In an exemplary implementation, the enlargement processing may be implemented with OpenCV. During a live video broadcast, image frames form a sequence along the time dimension. By analyzing the binary data of each frame in combination with the feature vector description corresponding to the to-be-processed object information, that is, a two-dimensional vector feature, a specific image region is anchored on the image. The region may be a rectangle (xstart, ystart, xend, yend) or another irregular shape with multiple vertices, which is not limited here. Taking a rectangular anchored region as an example, the following function prototype may then be used to enlarge the anchored region and output a binary data stream. A code example is as follows:
CV_EXPORTS_W void resize(InputArray src, OutputArray dst,
                         Size dsize, double fx = 0, double fy = 0,
                         int interpolation = INTER_LINEAR);
The resulting binary data stream is then superimposed at the specified position in the live audio and video information of the current frame. At this point the anchored rectangle (xstart, ystart, xend, yend) has been enlarged to n times its original size. Compositing is then performed: using a region of interest (ROI), the binary data stream (the item graphic) is overlaid onto the anchor position of the current frame of the live audio and video information (which may be where the original anchored region was located). Enlargement processing is thereby achieved.
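The enlarge-then-overlay procedure just described can be sketched without any library by treating a frame as a list of pixel rows. Here `resize_nearest` is a simple stand-in for OpenCV's `resize`, not its implementation, and centring the enlarged patch on the original box is only one possible choice of anchor position; both function names are hypothetical.

```python
def resize_nearest(image, factor):
    """Enlarge a 2-D list image by an integer factor (nearest neighbour)."""
    return [[image[y // factor][x // factor]
             for x in range(len(image[0]) * factor)]
            for y in range(len(image) * factor)]

def enlarge_region(frame, box, factor=2):
    """Return a copy of frame with the box region enlarged about its centre.

    box is (x_start, y_start, x_end, y_end) with exclusive end coordinates.
    """
    x0, y0, x1, y1 = box
    patch = [row[x0:x1] for row in frame[y0:y1]]
    big = resize_nearest(patch, factor)
    # Top-left corner that keeps the enlarged patch centred on the original box.
    left = (x0 + x1) // 2 - len(big[0]) // 2
    top = (y0 + y1) // 2 - len(big) // 2
    out = [row[:] for row in frame]
    for dy, row in enumerate(big):
        for dx, value in enumerate(row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[0]):  # clip to the frame
                out[y][x] = value
    return out
```

The input frame is left untouched and the enlarged patch covers the original region in the returned copy; pixels that would fall outside the frame are dropped.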
Of course, the above is only an example, and the embodiments of the present disclosure are not limited to this implementation.
In an exemplary scenario, refer to FIG. 4, which shows a schematic diagram of a live broadcast interface provided by an exemplary embodiment of the present disclosure. Suppose the live broadcast user, for example anchor Zhang San, is introducing a donut, and the live broadcast interface displays live audio and video information 410. By recognizing the voice information of anchor Zhang San, the to-be-processed object information is obtained as the item name "donut". At time t, the electronic device may obtain the feature vector description corresponding to the donut, search the live audio and video information 410 for content matching that feature vector description, and mark the image region 411 where the donut is located. The image of the donut is then enlarged, and the enlarged donut image is superimposed at the position corresponding to image region 411, covering the original image region 411. At time t+1, the live broadcast interface of anchor Zhang San displays the live audio and video information 420 of the current frame, with the enlarged donut image displayed in its image region 421. In this way, when the live broadcast user introduces the donut, the electronic device can automatically identify the object being introduced, locate it in the live audio and video information, and enlarge it. Users in the live broadcast room can listen to the introduction while seeing the enlarged donut, observe it more closely, and get a better e-commerce live broadcast experience, gaining a fuller understanding of the object being introduced.
It should be noted that, for parts not described in detail in this embodiment, refer to the foregoing embodiments; details are not repeated here.
In addition, in some embodiments, the electronic device may be unable to detect the object indicated by the to-be-processed object information in the live audio and video information, and therefore unable to determine the target object in the live audio and video information. In that case, other information may be used to determine the target object, which reduces the missed-detection rate and improves system stability. Specifically, refer to FIG. 5, which shows a schematic flowchart of an image processing method provided by yet another embodiment of the present disclosure. The method may include:
S310: Obtain live audio and video information captured in real time.
S320: Perform speech recognition on the voice information in the live audio and video information to obtain a speech recognition result.
S330: Determine to-be-processed object information based on the speech recognition result.
S340: Determine whether a first object indicated by the to-be-processed object information is detected in the live audio and video information.
In this embodiment, after it is determined whether the first object indicated by the to-be-processed object information is detected in the live audio and video information, the method may include:
if the first object indicated by the to-be-processed object information is detected in the live audio and video information, performing step S350;
if the first object indicated by the to-be-processed object information is not detected in the live audio and video information, performing step S360.
S350: Determine the first object as the target object.
If the first object indicated by the to-be-processed object information is detected in the live audio and video information, the first object is determined as the target object.
S360: Perform image recognition processing on the live audio and video information.
In some embodiments, to reduce the amount of data stored for feature vector descriptions, the pre-stored feature vector descriptions may be generic, for example one feature vector description stored per category of item. If an object has an unconventional appearance (for example, the live broadcast user says "lipstick", but the lipstick in the live audio and video information does not look like an ordinary lipstick), it does not match the feature vector description (the matching degree is lower than the specified ratio). If, in addition, the live broadcast user has only said the name of the object and has given no other information, such as the object's specific brand and model, that could be used to search the network for corresponding pictures to match against, the corresponding target object may not be determinable in the live audio and video information from the feature vector description, which may lead to a missed detection. In this case, other information may be used to determine the target object, which reduces the missed-detection rate. Therefore, if the first object indicated by the to-be-processed object information is not detected in the live audio and video information, image recognition processing may be performed on the live audio and video information.
S370: If a preset gesture is recognized in the live audio and video information, use a second object indicated by the preset gesture as the target object.
In some implementations, if the first object indicated by the to-be-processed object information is not detected in the live audio and video information, image recognition processing may be performed on the live audio and video information to identify whether a preset gesture is present and, if so, the second object indicated by the preset gesture may be used as the target object. The preset gesture may be one or more pre-stored gestures, which is not limited here and may be set according to actual needs. For example, the preset gesture may be drawing a circle with a finger, in which case the circled object is the second object indicated by the preset gesture. As another example, the preset gesture may be closing four fingers and extending only one finger, in which case the object the finger points at is the second object indicated by the preset gesture. Thus, when the first object indicated by the to-be-processed object information is not detected in the live audio and video information, the live broadcast user's gesture can be used to determine the second object indicated by the preset gesture as the target object. Because the object a user points at or circles is usually the one the user is describing or even wants to emphasize, this embodiment can determine the object in the live audio and video information fairly accurately and can reduce the missed detections that may occur when only voice information is relied on.
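For the circle-drawing gesture, one simple way to find the indicated object is to take the bounding box of the fingertip path and pick a detected object whose centre lies inside it. The path format, the candidate boxes, and both function names are hypothetical; recognizing the gesture itself is outside the scope of this sketch.

```python
def circled_box(path):
    """Bounding box (x_start, y_start, x_end, y_end) of a fingertip path."""
    xs = [x for x, _ in path]
    ys = [y for _, y in path]
    return (min(xs), min(ys), max(xs), max(ys))

def object_in_gesture(path, detected_boxes):
    """Return the first detected box whose centre lies in the circled area.

    Returns None when no detected box is inside the circled area.
    """
    x0, y0, x1, y1 = circled_box(path)
    for box in detected_boxes:
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            return box
    return None
```

A pointing gesture could be handled in a similar way by testing which box the extended finger's direction first intersects.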
In addition, in some embodiments, the second object may not match the input information of the live broadcast user and so may not be the target object the live broadcast user currently wants to highlight. To further improve the accuracy of re-determining the target object, after the second object indicated by the preset gesture is determined, the second object may be matched against the to-be-processed object information indicated by the voice information, and the second object is used as the target object only when the match succeeds. Specifically, refer to FIG. 6, which shows a detailed schematic flowchart of step S370 in FIG. 5 provided by an exemplary embodiment of the present disclosure. In this embodiment, step S370 may include:
S371: If a preset gesture is recognized in the live audio and video information, determine the object indicated by the preset gesture as the second object.
S372: If the second object matches the to-be-processed object information, use the second object as the target object.
In some implementations, the image region indicated by the preset gesture may be determined and the image of that region cropped to obtain a second image corresponding to the second object. Second object information corresponding to the second image may then be searched for over the network. If the second object information matches the to-be-processed object information, it is determined that the second object matches the to-be-processed object information, and the second object may be used as the target object. For example, the image of the region that the live broadcast user points at or circles by hand is cropped as the second image corresponding to the second object; if the search returns results that fall under the item name spoken by the live broadcast user, the second object is determined as the target object, to be deformed in the subsequent steps.
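Steps S371 and S372 can be summarized in a few lines. The image search is abstracted as a caller-supplied function `search_labels`, a placeholder for whatever network search is used and not a real API; the case-insensitive substring comparison is likewise an assumption.

```python
def select_by_gesture(pending_info, gesture_region, search_labels):
    """Accept the gestured region only if its search results match the name.

    pending_info   -- to-be-processed object information, e.g. "lipstick"
    gesture_region -- region indicated by the preset gesture, or None
    search_labels  -- callable returning names found for the cropped region
    """
    if gesture_region is None:  # S371: no preset gesture was recognized
        return None
    labels = search_labels(gesture_region)
    wanted = pending_info.lower()
    # S372: the second object is used only when it matches the spoken name.
    if any(wanted in label.lower() for label in labels):
        return gesture_region
    return None
```

Returning None on a mismatch leaves the caller free to skip the deformation when the gesture and the speech disagree.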
S380:对目标对象在直播音视频信息中对应的目标图像进行变形处理,得到变形后的目标图像。S380: Perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image.
S390:对变形后的目标图像与直播音视频信息中的图像进行合成处理,得到合成音视频信息,以用于播放合成音视频信息。S390: Perform synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
需要说明的是,本实施例中未详细描述的部分请参考前述实施例,在此不再赘述。It should be noted that, for the parts that are not described in detail in this embodiment, please refer to the foregoing embodiment, and details are not repeated here.
另外,在一些实施例中,若未检测到直播音视频信息中存在待处理对象信息所指示的第一对象,还可根据直播界面显示的直播内容来进一步确定目标对象,则在直播界面上显示有可用于指示目标对象的目标信息时,可基于该目标信息来确定目标对象。具体地,请参阅图7,其示出了本公开一个示例性实施例提供的图像处理方法中根据直播内容确定目标对象的流程示意图,具体可包括:In addition, in some embodiments, if it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, the target object can be further determined according to the live content displayed on the live broadcast interface, and the target object is displayed on the live broadcast interface. When there is target information that can be used to indicate the target object, the target object can be determined based on the target information. Specifically, please refer to FIG. 7 , which shows a schematic flowchart of determining a target object according to live content in an image processing method provided by an exemplary embodiment of the present disclosure, which may specifically include:
S410:若未检测到直播音视频信息中存在待处理对象信息所指示的第一对象,则获取当前显示的直播内容。S410: If it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, obtain the currently displayed live content.
S420:若直播内容中存在待处理对象信息对应的目标信息,则根据目标信息确定目标对象。S420: If there is target information corresponding to the object information to be processed in the live broadcast content, determine the target object according to the target information.
The target information may include at least one of an item identifier and an item image corresponding to the target object. When the first object indicated by the to-be-processed object information cannot be detected in the live audio and video information, the currently displayed live content can be recognized, and the target information recognized in it can be used to help determine the target object, which can improve the recognition rate of the target object.
In some implementations, the live interface may display at least one kind of target information corresponding to the to-be-processed object information, such as an item identifier, an item image, or a purchase entry, and the corresponding target object can be found in the live audio and video information according to this target information. The purchase entry may be displayed in the form of an item image, and the item image may embed a URL (Uniform Resource Locator), so that a user who clicks the item image is taken to the purchase page corresponding to that URL.
As one implementation, the live interface may display a product image and a product name. If the product name matches the to-be-processed object information, for example, the to-be-processed object information is the object name "lipstick" and the product name also includes "lipstick", the two are considered a match. The corresponding image region is then marked in the live audio and video information based on the product image, and the object in that image region is the target object.
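The name matching in S410-S420 can be illustrated with a minimal sketch. It is not the disclosed implementation: the `DisplayedItem` structure, the `(x, y, width, height)` region format, and the substring rule for "matching" are all assumptions made for illustration, and the sketch assumes that the frame region corresponding to each product image has already been located by some other means.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical region format: (x, y, width, height) in frame pixels.
Box = Tuple[int, int, int, int]


@dataclass
class DisplayedItem:
    """One product entry shown on the live interface (hypothetical structure)."""
    name: str
    # Region of the frame matched to the product image, or None if not located.
    image_region: Optional[Box]


def find_target_from_live_content(
    object_info: str, items: Sequence[DisplayedItem]
) -> Optional[Box]:
    """Return the frame region of the first displayed item whose name contains object_info.

    Substring containment stands in for "matching"; this is an assumption.
    """
    if not object_info:
        return None
    for item in items:
        if object_info in item.name and item.image_region is not None:
            return item.image_region
    return None
```

For example, with the to-be-processed object information "lipstick" and a displayed product named "Matte lipstick 01", the function returns that product's marked region; if no displayed product name contains the object information, it returns `None` and the target object remains undetermined.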
It should be noted that, based on the foregoing embodiments, steps S410-S420 may replace steps S360-S370 in FIG. 5, so that when the first object indicated by the to-be-processed object information is not detected in the live audio and video information, the target object is further determined according to the live content.
It should be noted that, for parts not described in detail in this embodiment, reference may be made to the foregoing embodiments; details are not repeated here.
Refer to FIG. 8, which is a block diagram of an image processing apparatus provided by an embodiment of the present disclosure. The image processing apparatus 800 of this embodiment may include an information acquisition module 810, a target determination module 820, a target deformation module 830, and an image synthesis module 840, wherein:
the information acquisition module 810 is configured to acquire live audio and video information collected in real time;
the target determination module 820 is configured to identify input information of the live-streaming user in the live audio and video information, and to determine, according to the input information, a target object to be processed in the live audio and video information;
the target deformation module 830 is configured to deform the target image corresponding to the target object in the live audio and video information to obtain a deformed target image;
the image synthesis module 840 is configured to synthesize the deformed target image with the image in the live audio and video information to obtain synthesized audio and video information for playback.
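The way the four modules chain together can be sketched as follows. This is only an illustrative arrangement under stated assumptions: the class name `ImageProcessingApparatus`, the callable signatures, and the pass-through behaviour when no target object is determined are hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ImageProcessingApparatus:
    """Hypothetical wiring of modules 810-840; each module is a plain callable."""
    acquire: Callable[[], Any]                        # information acquisition module 810
    determine_target: Callable[[Any], Optional[Any]]  # target determination module 820
    deform: Callable[[Any, Any], Any]                 # target deformation module 830
    synthesize: Callable[[Any, Any, Any], Any]        # image synthesis module 840

    def process_once(self) -> Any:
        """Run one acquire -> determine -> deform -> synthesize pass."""
        av_info = self.acquire()
        target = self.determine_target(av_info)
        if target is None:
            # Assumption: with no target object, the live information passes through unchanged.
            return av_info
        deformed = self.deform(av_info, target)
        return self.synthesize(av_info, target, deformed)
```

Passing the modules in as callables keeps the sketch independent of any particular speech recognizer, detector, or renderer, which the disclosure leaves open.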
In an embodiment, the input information includes voice information, and the target determination module 820 may include a speech recognition submodule, an object information determination submodule, and a target object determination submodule, wherein:
the speech recognition submodule is configured to perform speech recognition on the voice information in the live audio and video information to obtain a speech recognition result;
the object information determination submodule is configured to determine to-be-processed object information based on the speech recognition result;
the target object determination submodule is configured to determine, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object.
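As a rough illustration of the object information determination submodule, the sketch below picks the to-be-processed object information out of a speech recognition result by simple keyword spotting. The disclosure does not specify how this is done; the function name, the `known_objects` vocabulary, and the earliest-mention rule are all assumptions.

```python
from typing import Iterable, Optional


def object_info_from_transcript(
    transcript: str, known_objects: Iterable[str]
) -> Optional[str]:
    """Return the known object name mentioned earliest in the transcript, or None.

    Keyword spotting is an assumed stand-in for whatever analysis is applied
    to the speech recognition result.
    """
    text = transcript.lower()
    best_name: Optional[str] = None
    best_pos = len(text)
    for name in known_objects:
        if not name:
            continue  # an empty name would match everywhere, so skip it
        pos = text.find(name.lower())
        if pos != -1 and pos < best_pos:
            best_name, best_pos = name, pos
    return best_name
```

The returned name would then be handed to the target object determination submodule as the to-be-processed object information.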
In an embodiment, the target object determination submodule may include a first object determination unit, configured to determine the first object as the target object if the first object indicated by the to-be-processed object information is detected in the live audio and video information.
In an embodiment, the target object determination submodule may include an image recognition unit and a gesture determination unit, wherein:
the image recognition unit is configured to perform image recognition processing on the live audio and video information if the first object indicated by the to-be-processed object information is not detected in the live audio and video information;
the gesture determination unit is configured to take the second object indicated by a preset gesture as the target object if the preset gesture is recognized in the live audio and video information.
In an embodiment, the gesture determination unit may include a second object determination subunit and a target object determination subunit, wherein:
the second object determination subunit is configured to determine the object indicated by a preset gesture as the second object if the preset gesture is recognized in the live audio and video information;
the target object determination subunit is configured to take the second object as the target object if the second object matches the to-be-processed object information.
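The order of checks described for the first object determination unit and the gesture determination unit can be sketched as below. The `DetectedObject` type, the exact-label rule for the first object, and the substring rule for matching the gesture-indicated second object are hypothetical; detection and gesture recognition themselves are assumed to have been performed already.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DetectedObject:
    """An object found in the live frame (hypothetical structure)."""
    label: str


def determine_target(
    object_info: str,
    detected: Sequence[DetectedObject],
    gesture_object: Optional[DetectedObject],
) -> Optional[DetectedObject]:
    """Pick the target object: the first object if present, else a matching second object."""
    # First object: a detected object whose label equals the object information (assumed rule).
    for obj in detected:
        if obj.label == object_info:
            return obj
    # Second object: the object indicated by the preset gesture, accepted only
    # if it matches the object information (substring match is an assumption).
    if gesture_object is not None and object_info and object_info in gesture_object.label:
        return gesture_object
    return None
```

Checking the gesture-indicated object against the to-be-processed object information guards against a gesture that happens to point at an unrelated item.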
In an embodiment, the target object determination submodule may include a live content acquisition unit and a target information determination unit, wherein:
the live content acquisition unit is configured to acquire the currently displayed live content if the first object indicated by the to-be-processed object information is not detected in the live audio and video information;
the target information determination unit is configured to determine the target object according to target information if the target information corresponding to the to-be-processed object information exists in the live content, the target information including at least one of an item identifier and an image corresponding to the target object.
In an embodiment, the target deformation module 830 may include an enlargement processing submodule, configured to enlarge the target image corresponding to the target object in the live audio and video information and to use the enlarged target image as the deformed target image.
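Enlargement followed by synthesis can be sketched on a toy grayscale frame. This is a minimal illustration, not the disclosed method: the list-of-rows image format, nearest-neighbour scaling by an integer factor, and overwrite-style pasting with clipping at the frame border are all assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grayscale pixel values (assumed toy format)


def crop(frame: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut the (x, y, width, height) region out of the frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def enlarge(image: Image, factor: int) -> Image:
    """Nearest-neighbour enlargement by an integer factor."""
    out: Image = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def composite(frame: Image, patch: Image, top_left: Tuple[int, int]) -> Image:
    """Paste the patch onto a copy of the frame, clipping at the frame border."""
    x0, y0 = top_left
    result = [row[:] for row in frame]
    for dy, patch_row in enumerate(patch):
        for dx, pixel in enumerate(patch_row):
            y, x = y0 + dy, x0 + dx
            if 0 <= y < len(result) and 0 <= x < len(result[0]):
                result[y][x] = pixel
    return result
```

A typical use would crop the target region, call `enlarge` on it, and `composite` the result back onto the frame so that the enlarged target covers its original position; the input frame itself is left unmodified.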
In an embodiment, the input information includes at least one of voice information, text information, touch information, and visual information.
The image processing apparatus of the embodiments of the present disclosure can execute the image processing method provided by the embodiments of the present disclosure, and its implementation principle is similar. The actions performed by the modules of the image processing apparatus in the embodiments of the present disclosure correspond to the steps of the image processing method in the embodiments of the present disclosure. For a detailed functional description of each module of the image processing apparatus, refer to the description of the corresponding image processing method above; details are not repeated here.
Refer now to FIG. 9, which shows a structural block diagram of an electronic device 900 suitable for implementing the embodiments of the present disclosure. The electronic device in the embodiments of the present disclosure may include, but is not limited to, a device such as a computer. The electronic device shown in FIG. 9 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
The electronic device 900 includes a memory and a processor. The processor may be referred to below as the processing device 901, and the memory may include at least one of the read-only memory (ROM) 902, the random access memory (RAM) 903, and the storage device 908 described below, as follows:
As shown in FIG. 9, the electronic device 900 may include a processing device (for example, a central processing unit or a graphics processor) 901, which can perform various appropriate actions and processing according to a program stored in the read-only memory (ROM) 902 or a program loaded from the storage device 908 into the random access memory (RAM) 903. The RAM 903 also stores various programs and data required for the operation of the electronic device 900. The processing device 901, the ROM 902, and the RAM 903 are connected to one another through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Generally, the following devices may be connected to the I/O interface 905: an input device 906 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, or a gyroscope; an output device 907 including, for example, a liquid crystal display (LCD), a speaker, or a vibrator; a storage device 908 including, for example, a magnetic tape or a hard disk; and a communication device 909. The communication device 909 may allow the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 9 shows the electronic device 900 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable storage medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 909, installed from the storage device 908, or installed from the ROM 902. When the computer program is executed by the processing device 901, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the above computer-readable storage medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable storage medium may be transmitted using any suitable medium, including but not limited to an electric wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), a peer-to-peer network (for example, an ad hoc peer-to-peer network), and any currently known or future-developed network.
The above computer-readable storage medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
The above computer-readable storage medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to perform the following steps: acquiring live audio and video information collected in real time; identifying input information of the live-streaming user in the live audio and video information, and determining, according to the input information, a target object to be processed in the live audio and video information; deforming the target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and synthesizing the deformed target image with the image in the live audio and video information to obtain synthesized audio and video information for playback.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. These programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules or units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases the name of a module or unit does not constitute a limitation on the module or unit itself; for example, a display module may also be described as "a module for displaying a resource upload interface".
The functions described above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system on chip (SOC), and a complex programmable logic device (CPLD).
In the context of the present disclosure, a computer-readable storage medium may be a tangible medium that can contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The computer-readable storage medium may be a machine-readable signal medium or a machine-readable storage medium. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, an image processing method is provided. The method includes: acquiring live audio and video information collected in real time; identifying input information of the live-streaming user in the live audio and video information, and determining, according to the input information, a target object to be processed in the live audio and video information; deforming the target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and synthesizing the deformed target image with the image in the live audio and video information to obtain synthesized audio and video information for playback.
In an embodiment, the input information includes voice information, and identifying the input information of the live-streaming user in the live audio and video information and determining, according to the input information, the target object to be processed in the live audio and video information includes: performing speech recognition on the voice information in the live audio and video information to obtain a speech recognition result; determining to-be-processed object information based on the speech recognition result; and determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object.
In an embodiment, determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object includes: if the first object indicated by the to-be-processed object information is detected in the live audio and video information, determining the first object as the target object.
In an embodiment, determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object further includes: if the first object indicated by the to-be-processed object information is not detected in the live audio and video information, performing image recognition processing on the live audio and video information; and if a preset gesture is recognized in the live audio and video information, taking the second object indicated by the preset gesture as the target object.
In an embodiment, taking the second object indicated by the preset gesture as the target object if the preset gesture is recognized in the live audio and video information includes: if the preset gesture is recognized in the live audio and video information, determining the object indicated by the preset gesture as the second object; and if the second object matches the to-be-processed object information, taking the second object as the target object.
In an embodiment, determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object further includes: if the first object indicated by the to-be-processed object information is not detected in the live audio and video information, acquiring the currently displayed live content; and if target information corresponding to the to-be-processed object information exists in the live content, determining the target object according to the target information, the target information including at least one of an item identifier and an image corresponding to the target object.
In an embodiment, deforming the target image corresponding to the target object in the live audio and video information to obtain the deformed target image includes: enlarging the target image corresponding to the target object in the live audio and video information, and using the enlarged target image as the deformed target image.
In an embodiment, the input information includes at least one of voice information, text information, touch information, and visual information.
根据本公开的一个或多个实施例,提供了一种图像处理装置,该图像处理装置包括:信息获取模块,用于获取实时采集的直播音视频信息;目标确定模块,用于识别所述直播音视频信息中直播用户的输入信息,并根据所述输入信息在直播音视频信息中确定待处理的目标对象;目标变形模块,用于对所述目标对象在所述直播音视频信息中对应的目标图像进行变形处理,得到变形后的目标图像;图像合成模块,用于对所述变形后的目标图像与所述直播音视频信息中的图像进行合成处理,得到合成音视频信息,以用于播放所述合成音视频信息。According to one or more embodiments of the present disclosure, an image processing apparatus is provided, the image processing apparatus includes: an information acquisition module for acquiring live audio and video information collected in real time; a target determination module for identifying the live broadcast The input information of the live broadcast user in the audio and video information, and the target object to be processed is determined in the live audio and video information according to the input information; the target deformation module is used for the corresponding target object in the live audio and video information. The target image is deformed to obtain a deformed target image; an image synthesis module is used to synthesize the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information for use in Play the synthesized audio and video information.
在一实施例中,所述输入信息包括语音信息,目标确定模块可包括:语音识别子模块、对象信息确定子模块以及目标对象确定子模块,其中:语音识别子模块,用于对所述直播音视频信息中的语音信息进行语音识别,得到语音识别结果;对象信息确定子模块,用于基于所述语音识别结果确定待处理对象信息;目标对象确定子模块,用于在所述直播音视频信息中,将所述待处理对象信息对应的对象确定为所述目标对象。In one embodiment, the input information includes voice information, and the target determination module may include: a voice recognition sub-module, an object information determination sub-module, and a target object determination sub-module, wherein: a voice recognition sub-module is used for the live broadcast. The voice information in the audio and video information is subjected to voice recognition, and a voice recognition result is obtained; an object information determination sub-module is used to determine the object information to be processed based on the voice recognition result; a target object determination sub-module is used for the live audio and video. In the information, the object corresponding to the to-be-processed object information is determined as the target object.
在一实施例中,目标对象确定子模块可包括:第一对象确定单元,用于若检测到所述直播音视频信息中存在所述待处理对象信息所指示的第一对象,则将所述第一对象确定为所述目标对象。In an embodiment, the target object determination sub-module may include: a first object determination unit, configured to, if it is detected that the first object indicated by the to-be-processed object information exists in the live audio and video information, determine the The first object is determined as the target object.
在一实施例中,目标对象确定子模块可包括:图像识别单元以及手势确定单元,其中:图像识别单元,用于若未检测到所述直播音视频信息中存在所述待处理对象信息所指示的第一对象,则对所述直播音视频信息进行图像识别处理;手势确定单元,用于若识别到所述直播音视频信息中存在预设手势,则将所述预设手势所指示的第二对象作为所述目标对象。In one embodiment, the target object determination sub-module may include: an image recognition unit and a gesture determination unit, wherein: the image recognition unit is configured to, if it is not detected that the live audio and video information exists as indicated by the to-be-processed object information the first object of the live audio and video information, then perform image recognition processing on the live audio and video information; the gesture determination unit is configured to recognize that there is a preset gesture in the live audio and video information, then identify the first object indicated by the preset gesture Two objects are used as the target object.
在一实施例中,手势确定单元可包括:第二对象确定子单元以及目标对象确定子单元,其中:第二对象确定子单元,用于若识别到所述直播音视频信息中存在预设手势,则将所述预设手势所指示的对象确定为第二对象;目标对象确定子单元,用于若所述第二对象与所述待处理对象信息匹配,则将所述第二对象作为所述目标对象。In one embodiment, the gesture determination unit may include: a second object determination subunit and a target object determination subunit, wherein: the second object determination subunit is used for if it is recognized that there is a preset gesture in the live audio and video information , the object indicated by the preset gesture is determined as the second object; the target object determination subunit is configured to, if the second object matches the to-be-processed object information, determine the second object as the object to be processed describe the target object.
In one embodiment, the target object determination submodule may include a live content acquisition unit and a target information determination unit. The live content acquisition unit is configured to acquire the currently displayed live content if it is not detected that the first object indicated by the to-be-processed object information exists in the live audio and video information. The target information determination unit is configured to, if target information corresponding to the to-be-processed object information exists in the live content, determine the target object according to the target information, where the target information includes at least one of an item identifier and an image corresponding to the target object.
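A lookup in the currently displayed live content might look like the sketch below. The `ContentItem` record, its fields, and the title-based lookup are hypothetical; the source only states that the target information includes at least one of an item identifier and an image.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ContentItem:
    """One entry of the displayed live content (hypothetical structure)."""
    item_id: str                      # item identifier
    title: str                        # text shown alongside the item
    image_ref: Optional[str] = None   # reference to an item image, if any


def find_target_info(object_info: str,
                     live_content: Sequence[ContentItem]) -> Optional[ContentItem]:
    """Return the first displayed item whose title mentions the object info."""
    wanted = object_info.strip().lower()
    if not wanted:
        return None
    for item in live_content:
        if wanted in item.title.lower():
            return item
    return None
```

The returned item identifier or image reference would then be used to determine the target object, for example by matching the item image against regions of the current frame; that step is outside the scope of this sketch.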
In one embodiment, the target deformation module may include an enlargement processing submodule configured to enlarge the target image corresponding to the target object in the live audio and video information, and to take the enlarged target image as the deformed target image.
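As an illustration of the enlargement step, the sketch below crops the target image from a frame and scales it by an integer factor. Nearest-neighbour scaling, the list-of-rows image representation, and the function names are assumptions chosen to keep the example dependency-free; the source does not name an interpolation method.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values (grayscale, for brevity)


def crop(frame: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut the target image out of the frame; box is (left, top, right, bottom)."""
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]


def enlarge(image: Image, factor: int) -> Image:
    """Nearest-neighbour enlargement by an integer factor.

    Each pixel is repeated `factor` times horizontally, and each resulting
    row is repeated `factor` times vertically.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    enlarged: Image = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged
```

A production system would more likely use a smoother interpolation from an image library, but the data flow is the same: locate the target region, scale it, and hand the enlarged image to the compositing step.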
In one embodiment, the input information includes at least one of voice information, text information, touch information, and visual information.
The above description covers only preferred embodiments of the present disclosure and explains the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
In addition, although operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (11)

  1. An image processing method, characterized by comprising:
    acquiring live audio and video information collected in real time;
    identifying input information of a live streaming user in the live audio and video information, and determining, according to the input information, a target object to be processed in the live audio and video information;
    performing deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and
    compositing the deformed target image with an image in the live audio and video information to obtain composite audio and video information, the composite audio and video information being used for playback.
  2. The method according to claim 1, characterized in that the input information includes voice information, and the identifying input information of a live streaming user in the live audio and video information, and determining, according to the input information, a target object to be processed in the live audio and video information comprises:
    performing speech recognition on the voice information in the live audio and video information to obtain a speech recognition result;
    determining to-be-processed object information based on the speech recognition result; and
    determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object.
  3. The method according to claim 2, characterized in that the determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object comprises:
    if it is detected that a first object indicated by the to-be-processed object information exists in the live audio and video information, determining the first object as the target object.
  4. The method according to claim 2, characterized in that the determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object further comprises:
    if it is not detected that a first object indicated by the to-be-processed object information exists in the live audio and video information, performing image recognition processing on the live audio and video information; and
    if a preset gesture is recognized in the live audio and video information, taking a second object indicated by the preset gesture as the target object.
  5. The image processing method according to claim 4, characterized in that the taking, if a preset gesture is recognized in the live audio and video information, a second object indicated by the preset gesture as the target object to be processed comprises:
    if a preset gesture is recognized in the live audio and video information, determining the object indicated by the preset gesture as the second object; and
    if the second object matches the to-be-processed object information, taking the second object as the target object.
  6. The image processing method according to claim 2, characterized in that the determining, in the live audio and video information, the object corresponding to the to-be-processed object information as the target object further comprises:
    if it is not detected that a first object indicated by the to-be-processed object information exists in the live audio and video information, acquiring the currently displayed live content; and
    if target information corresponding to the to-be-processed object information exists in the live content, determining the target object according to the target information, the target information including at least one of an item identifier and an image corresponding to the target object.
  7. The image processing method according to any one of claims 1 to 6, characterized in that the performing deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image comprises:
    enlarging the target image corresponding to the target object in the live audio and video information, and taking the enlarged target image as the deformed target image.
  8. The image processing method according to claim 1, characterized in that the input information includes at least one of voice information, text information, touch information, and visual information.
  9. An image processing apparatus, characterized by comprising:
    an information acquisition module, configured to acquire live audio and video information collected in real time;
    a target determination module, configured to identify input information of a live streaming user in the live audio and video information, and to determine, according to the input information, a target object to be processed in the live audio and video information;
    a target deformation module, configured to perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and
    an image compositing module, configured to composite the deformed target image with an image in the live audio and video information to obtain composite audio and video information, the composite audio and video information being used for playback.
  10. An electronic device, characterized by comprising:
    one or more processors;
    a memory; and
    one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute the image processing method according to any one of claims 1 to 8.
  11. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store a computer program, and the computer program, when invoked by a processor, executes the image processing method according to any one of claims 1 to 8.
PCT/CN2021/119567 2020-10-19 2021-09-22 Image processing method and apparatus, electronic device and computer-readable storage medium WO2022083383A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011119916.0 2020-10-19
CN202011119916.0A CN112261424B (en) 2020-10-19 2020-10-19 Image processing method, image processing device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2022083383A1 true WO2022083383A1 (en) 2022-04-28

Family

ID=74243898

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/119567 WO2022083383A1 (en) 2020-10-19 2021-09-22 Image processing method and apparatus, electronic device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN112261424B (en)
WO (1) WO2022083383A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112261424B (en) * 2020-10-19 2022-11-18 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114915798A (en) * 2021-02-08 2022-08-16 阿里巴巴集团控股有限公司 Real-time video generation method, multi-camera live broadcast method and device
CN115086686A (en) * 2021-03-11 2022-09-20 北京有竹居网络技术有限公司 Video processing method and related device
CN114501041B (en) 2021-04-06 2023-07-14 抖音视界有限公司 Special effect display method, device, equipment and storage medium
CN112804585A (en) * 2021-04-13 2021-05-14 杭州次元岛科技有限公司 Processing method and device for realizing intelligent product display in live broadcast process
CN113709545A (en) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 Video processing method and device, computer equipment and storage medium
CN113269785A (en) * 2021-05-13 2021-08-17 北京字节跳动网络技术有限公司 Image processing method, apparatus, storage medium, and program product
CN113286160A (en) * 2021-05-19 2021-08-20 Oppo广东移动通信有限公司 Video processing method, video processing device, electronic equipment and storage medium
CN113901785A (en) * 2021-09-29 2022-01-07 联想(北京)有限公司 Marking method and electronic equipment
CN114245193A (en) * 2021-12-21 2022-03-25 维沃移动通信有限公司 Display control method and device and electronic equipment
CN114928768A (en) * 2022-06-10 2022-08-19 北京百度网讯科技有限公司 Live broadcast information pushing method, device, system, electronic equipment and computer medium
CN118264858A (en) * 2024-05-29 2024-06-28 深圳爱图仕创新科技股份有限公司 Data processing method, device, computer equipment and computer readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105828090A (en) * 2016-03-22 2016-08-03 乐视网信息技术(北京)股份有限公司 Panorama live broadcasting method and device
CN106792092A (en) * 2016-12-19 2017-05-31 广州虎牙信息科技有限公司 Live video flow point mirror display control method and its corresponding device
CN109284053A (en) * 2018-08-23 2019-01-29 北京达佳互联信息技术有限公司 Comment information display methods and device, mobile terminal and storage medium
CN110139161A (en) * 2018-02-02 2019-08-16 阿里巴巴集团控股有限公司 Information processing method and device in live streaming
US20190311709A1 (en) * 2015-09-02 2019-10-10 Oath Inc. Computerized system and method for formatted transcription of multimedia content
CN110324648A (en) * 2019-07-17 2019-10-11 咪咕文化科技有限公司 Live broadcast display method and system
CN111353839A (en) * 2018-12-21 2020-06-30 阿里巴巴集团控股有限公司 Commodity information processing method, method and device for live broadcasting of commodities and electronic equipment
CN112261424A (en) * 2020-10-19 2021-01-22 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102055731B (en) * 2009-10-27 2015-11-25 中兴通讯股份有限公司 IVVR Menu Generating System and method
CN106412681B (en) * 2015-07-31 2019-12-24 腾讯科技(深圳)有限公司 Live bullet screen video broadcasting method and device
CN111464827A (en) * 2020-04-20 2020-07-28 玉环智寻信息技术有限公司 Data processing method and device, computing equipment and storage medium
CN111757138A (en) * 2020-07-02 2020-10-09 广州博冠光电科技股份有限公司 Close-up display method and device based on single-shot live video

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115460429A (en) * 2022-09-06 2022-12-09 河北先河环保科技股份有限公司 Method for monitoring and supervising water quality sampling, electronic equipment and storage medium
CN115460429B (en) * 2022-09-06 2024-03-01 河北先河环保科技股份有限公司 Method, electronic equipment and storage medium for monitoring and supervising water quality sampling
CN115205637A (en) * 2022-09-19 2022-10-18 山东世纪矿山机电有限公司 Intelligent identification method for mine car materials

Also Published As

Publication number Publication date
CN112261424B (en) 2022-11-18
CN112261424A (en) 2021-01-22

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21881802; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 EP: PCT application non-entry in European phase
    Ref document number: 21881802; Country of ref document: EP; Kind code of ref document: A1