WO2022083383A1 - Image processing method and apparatus, electronic device, and computer-readable storage medium - Google Patents
Image processing method and apparatus, electronic device, and computer-readable storage medium
- Publication number
- WO2022083383A1 (PCT/CN2021/119567, CN2021119567W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- video information
- target
- image
- live
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
- G06V40/113—Recognition of static hand signs
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23424—Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
- H04N21/47205—End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/858—Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
- H04N21/8586—Linking data to content by using a URL
Definitions
- the present disclosure relates to the technical field of image processing, and in particular, to an image processing method, apparatus, electronic device, and computer-readable storage medium.
- an embodiment of the present disclosure provides an image processing apparatus, the apparatus including: an information acquisition module, configured to acquire live audio and video information collected in real time; a target determination module, configured to identify, in the live audio and video information, input information of the live broadcast user, and to determine, according to the input information, a target object to be processed in the live audio and video information;
- a target deformation module, configured to perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image;
- an image synthesis module, configured to synthesize the deformed target image with the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
- embodiments of the present disclosure provide an electronic device, the electronic device comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, the one or more computer programs being configured to perform the method of the first aspect above.
- an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the computer program is invoked and executed by a processor, the method described in the first aspect above is implemented.
- The image processing method, apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the present disclosure acquire live audio and video information collected in real time, identify the input information of the live broadcast user in the live audio and video information, determine, according to the input information, a target object to be processed in the live audio and video information, perform deformation processing on the target image corresponding to the target object to obtain a deformed target image, and synthesize the deformed target image with the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
- the embodiments of the present disclosure can obtain the input information entered during the live broadcast to determine the target object to be processed, and effectively highlight the target object by deforming it, thereby improving the display effect of the live broadcast interface.
- Users watching the live broadcast can thus notice the target object in time as the broadcast proceeds, which improves the interest and effect of the live broadcast and in turn helps improve user retention in the live broadcast room.
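The overall flow described above (acquire, identify, deform, synthesize) can be illustrated with a minimal sketch. Everything here — the `Frame` class, the stub stages, and the toy catalog — is an illustrative assumption for exposition, not the disclosed implementation:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """A single video frame: a 2D grid of grayscale pixel values (illustrative)."""
    pixels: list  # list of rows

def identify_target(input_text, catalog):
    """Map the live user's input information to a target object name (stub)."""
    for name in catalog:
        if name in input_text:
            return name
    return None

def deform(region):
    """Placeholder deformation: enlarge the region 2x by pixel repetition."""
    return [[px for px in row for _ in (0, 1)] for row in region for _ in (0, 1)]

def composite(frame, deformed, top, left):
    """Overlay the deformed region back onto the frame at (top, left)."""
    for r, row in enumerate(deformed):
        for c, px in enumerate(row):
            if top + r < len(frame.pixels) and left + c < len(frame.pixels[0]):
                frame.pixels[top + r][left + c] = px
    return frame

# End-to-end sketch on a tiny dummy frame.
frame = Frame(pixels=[[0] * 8 for _ in range(8)])
target = identify_target("please look at this lipstick", ["lipstick", "hat"])
region = [[5, 6], [7, 8]]  # pretend this crop was located by feature matching
result = composite(frame, deform(region), top=2, left=2)
```

In a real pipeline each stage would be replaced by the corresponding module of the apparatus (speech/gesture recognition, feature-vector matching, deformation, synthesis); the sketch only shows how the stages compose.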
- FIG. 1 shows a schematic diagram of an implementation environment suitable for an embodiment of the present disclosure.
- FIG. 2 shows a schematic flowchart of an image processing method provided by an embodiment of the present disclosure.
- FIG. 3 shows a schematic flowchart of an image processing method provided by another embodiment of the present disclosure.
- FIG. 4 shows a schematic diagram of a live broadcast interface provided by an exemplary embodiment of the present disclosure.
- FIG. 5 shows a schematic flowchart of an image processing method provided by yet another embodiment of the present disclosure.
- FIG. 6 shows a detailed flowchart of step S370 in FIG. 5 provided by an exemplary embodiment of the present disclosure.
- FIG. 7 shows a schematic flowchart of determining a target object according to live broadcast content in an image processing method provided by an exemplary embodiment of the present disclosure.
- FIG. 8 shows a block diagram of modules of an image processing apparatus provided by an embodiment of the present disclosure.
- FIG. 9 shows a structural block diagram of an electronic device provided by an embodiment of the present disclosure.
- the term “comprising” and variations thereof are open-ended, i.e., “including but not limited to”;
- the term “based on” means “based at least in part on”;
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms are given in the description below.
- FIG. 1 shows a schematic diagram of an implementation environment applicable to an embodiment of the present disclosure, where the implementation environment includes: a first terminal 120 and a second terminal 140 . in:
- the first terminal 120 and the second terminal 140 may each be a mobile phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a wearable device, an in-vehicle device, an Augmented Reality (AR)/Virtual Reality (VR) device, a laptop, an Ultra-Mobile Personal Computer (UMPC), a netbook, a Personal Digital Assistant (PDA), a special-purpose camera (such as a single-lens reflex camera or a compact camera), or the like.
- the embodiment of the present disclosure does not limit the specific type of the terminal.
- the first terminal 120 and the second terminal 140 may be two terminals of the same type or of different types, which is not limited in this embodiment of the present disclosure.
- the first terminal 120 and the second terminal 140 respectively run a first client and a second client.
- the first client and the second client may both be live broadcast applications (APPs); the first client may be the host client used by the host user, and the first terminal 120 may be the host terminal used by the host user;
- the second client may be the viewer client used by a viewer user in the live broadcast room, and the second terminal 140 may be the viewer terminal used by the viewer user.
- the first terminal 120 and the second terminal 140 may be directly connected through a wired network or a wireless network.
- the implementation environment may further include a server 200; the first terminal 120 may then also be connected to the second terminal 140 through the server 200, and the server 200 may be connected to the first terminal 120 and the second terminal 140 respectively through a wired or wireless network, so that data interaction can be performed between the server 200 and each of the first terminal 120 and the second terminal 140.
- the server 200 may be a traditional server, a cloud server, a single server, a server cluster composed of several servers, or a cloud computing service center.
- FIG. 2 shows a schematic flowchart of an image processing method provided by an embodiment of the present disclosure, which can be applied to an electronic device, and the electronic device can be the above-mentioned first terminal or server. Taking the application to the first terminal (that is, the host terminal running the host client) as an example, the flow shown in FIG. 2 is described in detail below.
- the image processing method may include the following steps:
- S110 Acquire live audio and video information collected in real time.
- a live broadcast request can be triggered based on the host client running on the host terminal.
- after obtaining the live broadcast request, the host client can start the image acquisition device and the audio acquisition device and collect live audio and video information based on them;
- if the image acquisition device shoots the live broadcast user, the collected live audio and video information may include a user image of the live broadcast user.
- the display interface of the host client may display a control corresponding to the live broadcast portal, and by detecting a trigger operation acting on the control, the live broadcast request triggered by the live broadcast user may be obtained.
- the host client is a live broadcast application that can be used for live broadcast.
- the image acquisition device can be any device that can collect image information, such as a camera;
- the audio acquisition device can be any device that can collect audio information, such as a microphone;
- each device may be built into the host terminal or be a connected external device, which is not limited in this embodiment.
- the host terminal can collect live audio and video information based on the image acquisition device and the audio acquisition device, so as to obtain live audio and video information collected in real time. If the method is applied to the server, the host terminal can transmit the live audio and video information collected in real time to the server, so that the server can obtain the live audio and video information collected in real time.
- S120 Identify the input information of the live broadcast user in the live broadcast audio and video information, and determine the target object to be processed in the live broadcast audio and video information according to the input information.
- the input information may include at least one of voice information, text information, touch information, and visual information. That is, the embodiment of the present disclosure does not limit the input form of the input information, which may be input by means of voice, touch operation, air gesture, or the like.
- the target object can be an item, or the whole or a part of a person, animal, plant, etc.
- the visual information refers to input information, within the image information collected by the host terminal, that can be used to determine the target object, for example an image frame in which the live broadcast user performs a preset action; this can be a full video frame in the live audio and video information, or an image containing only part of the content of a video frame.
- the preset actions may include actions in a narrow sense, and may also include actions in a broad sense such as gestures, expressions, and gestures, which are not limited herein.
- the visual information can also be a picture that can be used to indicate or characterize the target object. For example, if the target object is a lipstick, the visual information can also be a picture of the lipstick or its picture description information.
- a specific implementation of determining the target object to be processed in the live audio and video information according to the input information may be: if it is detected, based on the image information of the live audio and video information, that the live broadcast user performs a preset action, the object indicated by the preset action is determined as the target object to be processed;
- the video frame image corresponding to the preset action may serve as the visual information.
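As an illustration of how the "object indicated by a preset action" might be resolved, the following sketch casts a ray from a hypothetical pointing gesture and picks the first object bounding box it enters. The function name, the coarse ray sampling, and the object table are assumptions for illustration only, not the disclosed detection method:

```python
def indicated_object(finger, direction, objects):
    """Return the name of the first object whose bounding box the pointing
    ray (finger + t*direction, t > 0) enters; None if nothing is hit.
    `objects` maps name -> (x_min, y_min, x_max, y_max). Illustrative only."""
    fx, fy = finger
    dx, dy = direction
    best_name, best_t = None, float("inf")
    for name, (x0, y0, x1, y1) in objects.items():
        # Sample the ray coarsely; adequate for a sketch, not production code.
        for step in range(1, 200):
            t = step * 0.05
            px, py = fx + dx * t, fy + dy * t
            if x0 <= px <= x1 and y0 <= py <= y1:
                if t < best_t:
                    best_name, best_t = name, t
                break
    return best_name

# A pointing gesture at (5, 5) aimed diagonally hits the "lipstick" box first.
objects = {"lipstick": (8, 8, 10, 10), "hat": (0, 8, 2, 10)}
picked = indicated_object(finger=(5, 5), direction=(1, 1), objects=objects)
```

A deployed system would obtain the fingertip position and direction from a hand-keypoint model; only the geometric selection step is sketched here.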
- the host terminal of the live broadcast user can display the live broadcast interface, which can present the live audio and video information as well as other live broadcast content, such as object information superimposed on the live audio and video information;
- the live broadcast user can click on the object information, so that after the host terminal detects the click event, the corresponding object information is obtained as the content of the input information or the recognition result, and the corresponding target object is determined.
- the live broadcast user can input various forms of input information during the live broadcast, such as at least one of voice information, text information, touch information, and visual information.
- the live broadcast user can input voice information by speaking, input text information by typing, input visual information by performing preset actions, and so on.
- the content of the identified input information can be any one or more of the item name, style, model, item picture, etc., and the target object to be processed can be determined in the live audio and video information according to the input information.
- the electronic device can detect the target object corresponding to the recognition result in the live audio and video information according to the recognition result of the input information.
- the input information of the live broadcast user in the live audio and video information is identified; if the recognition result is an item name, the feature vector description corresponding to the item name can be obtained, and the corresponding image area can be located in the live audio and video information according to the feature vector description;
- the image area corresponding to the feature vector description can be marked in the live audio and video information, and the corresponding image can be used as the target image of the target object.
- the input information of the live broadcast user may describe things the live broadcast user needs to introduce to other users (such as audience users in the live broadcast room), that is, the target object in the embodiments of the present disclosure;
- by identifying the input information in the live audio and video information, the target object can be located for subsequent processing.
- when the corresponding target object is determined in the live audio and video information according to the feature vector description, an inexact match may be allowed: for example, if the matching degree reaches a specified ratio, it can be considered a match, the object corresponding to the feature vector description is determined to exist in the live audio and video information, and the image area where the object is located is marked, as described above.
- the electronic device may be pre-built with a picture feature vector set, the picture feature vector set includes feature vector descriptions corresponding to various objects, and may be a complete set of a series of commodity data obtained through machine learning in the background.
- the product pictures related to an object on the network can be integrated, and the feature vector description corresponding to the object can be obtained through machine learning and feature extraction, so as to quickly lock the object in the live audio and video information.
- the feature vector description corresponding to an object may include at least one of a shape feature vector, a texture feature vector, and a color feature vector.
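A toy version of such a feature vector description and inexact matching (cf. the "specified ratio" above) could look like the following sketch, using only a normalized intensity histogram as the "color feature vector" and cosine similarity as the matching degree. Real systems would use learned shape/texture/color descriptors; all names and thresholds here are assumptions:

```python
from math import sqrt

def color_histogram(pixels, bins=4, max_val=256):
    """A toy color feature vector: normalized histogram of intensity values."""
    hist = [0] * bins
    for row in pixels:
        for px in row:
            hist[px * bins // max_val] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def matches(candidate_vec, reference_vec, threshold=0.9):
    """Inexact match: accept when similarity reaches a specified ratio."""
    return cosine_similarity(candidate_vec, reference_vec) >= threshold

ref = color_histogram([[200, 210], [220, 230]])   # reference patch for the object
cand = color_histogram([[205, 215], [225, 235]])  # candidate region in a frame
```

Sliding such a comparison over candidate regions of each frame is one simple way to "lock" the object region; a production system would instead use a detector trained on the product picture set.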
- the picture feature vector set can be stored locally on the host terminal or on the server; when it is stored on the server and the execution subject of this method is the host terminal, the server can find the corresponding feature vector description according to the input information and deliver it to the host terminal, so that the host terminal can use the feature vector description to determine the corresponding object, that is, the target object, in the live audio and video information.
- the identification of the input information can be performed locally on the electronic device or implemented through a network. For example, it can be sent to a server based on the network, and the server identifies the input information. This embodiment does not limit the identification method.
- S130 Perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image.
- the deformation processing may include at least one of enlargement processing, distortion processing, stretching processing, and fisheye special effect processing.
- the specific implementation of the deformation processing is not limited in this embodiment, and can be determined according to actual needs.
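A minimal sketch of one such deformation, a fisheye-style magnification, is shown below. It remaps each output pixel's normalized radius r back to a source radius r² inside the unit circle, so the center of the patch appears enlarged. This is a toy nearest-neighbor remap with illustrative names, not the disclosed implementation:

```python
def fisheye(pixels):
    """Toy fisheye magnifier: map output radius r (normalized to [0, 1]) to
    source radius r*r, so the center of the patch appears enlarged."""
    h, w = len(pixels), len(pixels[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny = (y - cy) / cy if cy else 0.0   # normalize offsets to [-1, 1]
            nx = (x - cx) / cx if cx else 0.0
            r = (nx * nx + ny * ny) ** 0.5
            f = r if r <= 1 else 1.0            # source radius = f * r
            sy = min(h - 1, max(0, round(cy + ny * f * cy)))
            sx = min(w - 1, max(0, round(cx + nx * f * cx)))
            out[y][x] = pixels[sy][sx]          # nearest-neighbor sample
    return out

# A 5x5 patch with distinct values; after warping, the center value spreads out.
patch = [[y * 5 + x for x in range(5)] for y in range(5)]
warped = fisheye(patch)
```

Enlargement, stretching, and distortion can all be expressed as different radius/coordinate mappings in the same remap loop; GPU shaders or library remap functions would be used in practice.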
- after determining the target object to be processed, the electronic device can perform deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image, so that the display effect of the target object in the live audio and video information changes.
- the transformation of the target object on the live broadcast interface, from before deformation to after deformation, brings a stronger visual impact, so that users can more easily notice the target object;
- highlighting the target object during the live video broadcast can increase the attention users in the live broadcast room pay to it, and because the target object is determined by the input information of the live broadcast user, highlighting it ties the live broadcast user's input information more closely to the live broadcast content, which is conducive to improving the efficiency and effect of the live broadcast.
- for example, the deformation processing may be fisheye effect processing;
- the effect of a fisheye lens can then be simulated, transforming the target image into the image seen through a fisheye lens, which both increases the interest of the live broadcast and enriches the live broadcast effect.
- step S130 may be: performing enlargement processing on the target image corresponding to the target object in the live audio and video information to obtain the enlarged target image as the deformed target image. Thus, when the live broadcast user mentions the target object, the image size of the corresponding target image can be enlarged so that the target object is displayed more prominently, allowing audience users to observe and understand it more clearly through the live broadcast interface, which is beneficial to improving the live broadcast effect.
- the image corresponding to the product can be enlarged through the embodiments of the present disclosure, so that the product features are displayed more clearly; audience users can then observe the product in detail and better understand it in combination with the live broadcast user's explanation, which can greatly improve the efficiency and effect of product recommendation.
- the target image corresponding to the target object may only include the target object, or may include information other than the target object, which is not limited herein.
- the electronic device can determine the image area where the target object is located in the live audio and video information, capture the image of that area to obtain a captured image, perform foreground/background separation on the captured image, and extract the image of the target object as the foreground to serve as the target image;
- in this way matting can be realized, so that the target image to be deformed contains only the target object, which is beneficial to obtaining a more natural effect during subsequent synthesis.
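The capture-then-separate step can be sketched with a deliberately naive matte: the most common border value of the cropped patch is treated as background, and everything else is kept as foreground. Production systems would use proper segmentation (e.g. GrabCut or a learned matting model); the function names and the None-as-transparent convention are illustrative assumptions:

```python
from collections import Counter

def extract_foreground(pixels):
    """Toy foreground/background separation for a cropped patch: treat the
    most common border value as background and keep only differing pixels.
    Returns (mask, cutout) where the cutout uses None for background."""
    h, w = len(pixels), len(pixels[0])
    border = ([pixels[0][x] for x in range(w)]
              + [pixels[h - 1][x] for x in range(w)]
              + [pixels[y][0] for y in range(1, h - 1)]
              + [pixels[y][w - 1] for y in range(1, h - 1)])
    bg = Counter(border).most_common(1)[0][0]
    mask = [[1 if px != bg else 0 for px in row] for row in pixels]
    cutout = [[px if px != bg else None for px in row] for row in pixels]
    return mask, cutout

# A 4x4 crop: the object (value 9) sits on a uniform background (value 0).
patch = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
mask, cutout = extract_foreground(patch)
```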
- alternatively, the entire image selected in the image area where the target object is located can be subjected to deformation processing;
- the shape of the image area can be a circle, a rectangle, a fan shape, etc., or can be determined by the shape of the target object, which is not limited here.
- for example, the live broadcast user can input the name of the target object, “lipstick”; the host terminal can then obtain the feature vector description corresponding to “lipstick”, mark the item area where the “lipstick” is located in the live audio and video information based on that description, take the image corresponding to the item area as the target image corresponding to the target object, and deform the target image to obtain the deformed target image.
- S140 Perform synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
- the deformed target image and the image in the currently collected live audio and video information can be synthesized to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information. Then if this method is applied to the anchor terminal, when the anchor terminal obtains the synthesized audio and video information, it can play the synthesized audio and video information, and/or send the synthesized audio and video information to the terminal of the user in the live broadcast room, for example, through the server.
- the server can send the synthesized audio and video information to the terminal of the user of the live room, including at least one of the host terminal and the audience terminal, so that at least one of the host terminal and the audience terminal Play composite audio and video information.
- the image position of the target object can be determined in the currently collected live audio and video information, and the deformed target image can be superimposed on the image position, so that the deformed target image can be displayed corresponding to the image position, For example, the target object in the live audio and video information can be covered.
- alternatively, the synthesis processing may be performed corresponding to any other position, which is not limited herein.
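Superimposing the deformed target image at an image position, skipping transparent pixels and clipping at the frame borders, might be sketched as follows (the `overlay` name and the None-as-transparent convention are assumptions, not the disclosed synthesis method):

```python
def overlay(frame, patch, top, left):
    """Composite a (possibly deformed) patch onto a frame at (top, left),
    skipping transparent (None) pixels and clipping at the frame borders."""
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]  # do not mutate the source frame
    for r, row in enumerate(patch):
        for c, px in enumerate(row):
            y, x = top + r, left + c
            if px is not None and 0 <= y < h and 0 <= x < w:
                out[y][x] = px
    return out

frame = [[1] * 6 for _ in range(6)]
patch = [[None, 7], [7, 7]]
shown = overlay(frame, patch, top=4, left=5)  # partially off-frame: clipped
```

Placing the patch over the target object's original position (covering it) or at any other position is then just a choice of `top`/`left`.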
- the image processing method provided by this embodiment acquires live audio and video information collected in real time, identifies the input information of the live broadcast user in the live audio and video information, determines the target object to be processed according to the input information, performs deformation processing on the target image corresponding to the target object to obtain a deformed target image, and synthesizes the deformed target image with the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
- the embodiments of the present disclosure can obtain input information during the live video broadcast of the live user to determine the target object to be processed, and effectively highlight the target object by deforming the target object, thereby improving the display effect of the live broadcast interface.
- This enables users watching the live broadcast to notice the target object in a timely manner as the live broadcast proceeds, which makes the live broadcast more engaging and effective, and further helps improve user retention in the live broadcast room.
- the input information may include voice information
- the electronic device may locate the target object to be deformed in the live audio and video information according to the voice information input by the live broadcast user. This simplifies the live broadcast user's operation: without any special instruction from the live broadcast user, the target object can be locked automatically and deformed, which greatly improves live broadcast efficiency and the live broadcast effect.
- FIG. 3 shows a schematic flowchart of an image processing method provided by another embodiment of the present disclosure.
- the image processing method may include:
- S210 Acquire live audio and video information collected in real time.
- S220 Perform voice recognition on the voice information in the live audio and video information to obtain a voice recognition result.
- the speech recognition model can be run on the host terminal or the server, which is not limited here. Based on the pre-trained speech recognition model, the electronic device can perform speech recognition on the speech information in the live audio and video information, and obtain a speech recognition result.
- S230 Determine the object information to be processed based on the speech recognition result.
- the object information to be processed may be information that can describe the target object, such as the name or identifier of the object.
- the identifier may also include a link corresponding to the object (clicking the link opens at least one of the object-related information and a purchase portal). For example, if the text indicated by the speech recognition result is "lipstick", that text is the name of the object.
- the speech recognition result can also be more specific information, including the style, model, and other details that identify a unique object.
- when the object information to be processed is more specific, the feature vector description found according to it is more accurate, which is more conducive to accurately determining the target object in the live audio and video information.
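A hedged sketch of how the object information might be extracted from the recognition result (the vocabulary of known object names and the name/detail split are illustrative assumptions, not details from the source):

```python
# Hypothetical vocabulary of objects the live room knows about.
KNOWN_OBJECTS = {"lipstick", "doughnut", "handbag"}

def object_info_from_transcript(transcript):
    """Scan the speech recognition result for the first known object name;
    any words after it are kept as optional style/model detail."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    for i, word in enumerate(words):
        if word in KNOWN_OBJECTS:
            return {"name": word, "detail": " ".join(words[i + 1:])}
    return None
```

A real system would use a trained language-understanding model rather than a fixed vocabulary, but the output shape (name plus optional specifics) is the same.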
- S240 In the live audio and video information, determine the object corresponding to the object information to be processed as the target object.
- the feature vector description corresponding to the object information to be processed can be obtained.
- the electronic device can pre-build a mapping relationship between object information and feature vector descriptions, look up the feature vector description corresponding to the object information to be processed, determine the corresponding object in the live audio and video information according to that description, and determine that object as the target object.
- the specific methods can be found in the corresponding parts of the foregoing embodiments, which will not be repeated here.
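As a hedged illustration of such a mapping, the lookup-and-match step might look like the following sketch (the feature vectors, the cosine-similarity matcher, and the 0.8 threshold are all assumptions for illustration, not details from the source):

```python
import math

# Hypothetical pre-built mapping from object information (names) to
# feature vector descriptions; a real system would learn these.
OBJECT_FEATURES = {
    "lipstick": [0.9, 0.1, 0.3],
    "doughnut": [0.2, 0.8, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def find_target(object_name, detected, threshold=0.8):
    """Return the detected object whose features best match the description
    looked up for object_name, or None if nothing clears the threshold."""
    query = OBJECT_FEATURES.get(object_name)
    if query is None:
        return None
    best = max(detected, key=lambda d: cosine(query, d["features"]), default=None)
    if best is None or cosine(query, best["features"]) < threshold:
        return None
    return best

# Objects detected in the current live frame (illustrative data).
detected = [{"name": "tube A", "features": [0.9, 0.1, 0.3]},
            {"name": "pastry B", "features": [0.2, 0.8, 0.5]}]
```

The threshold is what lets the method fall through to the gesture-based and live-content-based paths described later when no detected object matches well enough.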
- step S240 may include: if it is detected that the first object indicated by the object information to be processed exists in the live audio and video information, determining the first object as the target object.
- the electronic device can detect whether the first object indicated by the object information to be processed exists in the live audio and video information, and if it does, determine the first object as the target object.
- the existence of the first object can be detected by obtaining the feature vector description corresponding to the object information to be processed and matching it against the live audio and video information; if the match succeeds, the first object indicated by the object information to be processed exists in the live audio and video information, and the first object is determined as the target object.
- S250 Perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image.
- S260 Perform synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
- the deformation processing may be enlargement processing
- the specific implementation of step S250 may be: performing enlargement processing on the target image corresponding to the target object in the live audio and video information, and obtaining the enlarged target image as the deformed target image.
- the enlargement process can be implemented with OpenCV.
- the image frames are composed according to the sequence of the time dimension.
- the feature vector description corresponding to the object information to be processed is a two-dimensional vector feature that anchors a specific image area on the image; the area can be a rectangle (xstart, ystart, xend, yend) or another irregular shape with multiple vertices, which is not limited here.
- the code example is as follows:
- CV_EXPORTS_W void resize(InputArray src, OutputArray dst, Size dsize, double fx = 0, double fy = 0, int interpolation = INTER_LINEAR);
- the anchor area rectangle (xstart, ystart, xend, yend) is enlarged to n times its original size and then synthesized: the ROI method overlays the binary data stream (the item graphics) at the anchor position of the current frame of live audio and video information (which can be the position where the original anchor area graphics were located). At this point, one enlargement process is complete.
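The zoom-and-composite sequence described above can be sketched in pure Python as follows (the patent itself uses OpenCV's resize and ROI overlay; the nearest-neighbour scaling, the exclusive end coordinates, and the helper names here are simplifying assumptions):

```python
def enlarge(region, n):
    """Nearest-neighbour upscale of a 2D pixel grid by integer factor n."""
    return [
        [row[x // n] for x in range(len(row) * n)]
        for row in region
        for _ in range(n)
    ]

def zoom_anchor(frame, rect, n):
    """Crop the anchored rectangle (xstart, ystart, xend, yend), enlarge it
    n times, and overlay the result back at the anchor position, clipped."""
    xs, ys, xe, ye = rect  # end coordinates are exclusive in this sketch
    region = [row[xs:xe] for row in frame[ys:ye]]
    big = enlarge(region, n)
    h, w = len(frame), len(frame[0])
    for dy, big_row in enumerate(big):
        for dx, pixel in enumerate(big_row):
            y, x = ys + dy, xs + dx
            if 0 <= y < h and 0 <= x < w:
                frame[y][x] = pixel
    return frame

# Example: enlarge the 2x2 anchored region of a 4x4 frame to 2x.
frame = [[1, 2, 0, 0],
         [3, 4, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
zoom_anchor(frame, (0, 0, 2, 2), 2)
```

With OpenCV the crop would be a `Mat` ROI, the scaling a `cv::resize` call with `dsize` set to n times the rectangle, and the paste a `copyTo` onto the frame's ROI.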
- FIG. 4 shows a schematic diagram of a live broadcast interface provided by an exemplary embodiment of the present disclosure.
- the live broadcast user such as the anchor Zhang San
- at a given time t, the live broadcast interface will display the live audio and video information 410;
- the object information to be processed is obtained by recognizing the voice information of the anchor Zhang San in the live audio and video information 410.
- the electronic device can obtain the feature vector description corresponding to the doughnut, find the image area 411 where the "doughnut" is anchored by that feature vector description in the live audio and video information 410, enlarge the "doughnut" image, and superimpose the enlarged "doughnut" image at the position corresponding to the image area 411 so that it covers the original image area 411. At time t+1, the current frame of live audio and video information 420 is displayed on the live broadcast interface of anchor Zhang San, with the enlarged "doughnut" image displayed in image area 421.
- in this way, when the live broadcast user introduces the doughnut, the electronic device can automatically identify the introduced object, lock it in the live audio and video information, and zoom in on it, so that users in the live broadcast room can see the magnified doughnut while listening to the introduction, observe it more carefully, and get a better e-commerce live broadcast experience, fully understanding the objects introduced by the live broadcast user.
- in some cases, the electronic device may fail to detect the object indicated by the object information to be processed in the live audio and video information, and thus cannot determine the target object there. The target object can then be further determined in other ways, thereby reducing the missed detection rate and improving system stability.
- FIG. 5 shows a schematic flowchart of an image processing method provided by another embodiment of the present disclosure. The method may include:
- S310 Acquire live audio and video information collected in real time.
- S320 Perform voice recognition on the voice information in the live audio and video information to obtain a voice recognition result.
- S330 Determine the object information to be processed based on the speech recognition result.
- S340 Determine whether it is detected that the first object indicated by the object information to be processed exists in the live audio and video information.
- if it is detected that the first object indicated by the object information to be processed exists in the live audio and video information, step S350 can be executed; otherwise, step S360 can be executed.
- S350 Determine the first object as the target object.
- the first object is determined as the target object.
- S360 Perform image recognition processing on the live audio and video information.
- the pre-stored feature vector description may be general.
- for example, the live broadcast user said "lipstick" but the lipstick in the live audio and video information does not look like a regular lipstick, that is, it does not match the feature vector description (the matching degree is lower than a specified ratio), or the live broadcast user only said the object name without further details.
- in these cases, image recognition processing may be performed on the live audio and video information.
- image recognition processing can be performed on the live audio and video information to identify whether there is a preset gesture, and if so, the second object indicated by the preset gesture is further used as the target object.
- the preset gesture can be one or more pre-stored gestures, which is not limited here, and can be set according to actual needs.
- the preset gesture can be circling with a finger, in which case the circled object is the second object indicated by the preset gesture; for another example, the preset gesture can also be four fingers together with only one finger extended, in which case the object pointed to by that finger can be used as the second object indicated by the preset gesture.
- the gesture of the live broadcast user can be used to further determine the second object indicated by the preset gesture as the target object. Because the object that the user points to or circles is usually what the user is describing, or even wants to highlight, this embodiment can more accurately determine the object in the live audio and video information and reduce the missed detection rate that may arise when the determination relies on voice information alone.
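A minimal sketch of the pointing case (assuming a gesture recogniser has already produced the pointed-at coordinate and each candidate object carries a detected bounding box; both are assumptions, not details from the source):

```python
def second_object_at(pointed_at, candidates):
    """Return the first candidate whose bounding box (xs, ys, xe, ye)
    contains the coordinate the preset gesture points at, else None."""
    px, py = pointed_at
    for obj in candidates:
        xs, ys, xe, ye = obj["bbox"]
        if xs <= px < xe and ys <= py < ye:
            return obj
    return None

# Illustrative objects detected in the current frame.
candidates = [
    {"name": "lipstick", "bbox": (10, 10, 30, 40)},
    {"name": "doughnut", "bbox": (50, 20, 90, 60)},
]
```

The circling gesture would be handled analogously, testing each bounding box for overlap with the circled region instead of containment of a single point.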
- however, the second object may not match the input information of the live broadcast user, and thus may not be the target object that the live broadcast user currently wants to highlight.
- the second object can be matched with the to-be-processed object information based on the to-be-processed object information indicated by the voice information, and only when the matching is successful, the second object is used as the target object.
- FIG. 6 shows a detailed flowchart of step S370 in FIG. 5 provided by an exemplary embodiment of the present disclosure.
- step S370 may include:
- the image area indicated by the preset gesture can be determined, and the image of that area can be captured to obtain the second image corresponding to the second object; the second object information can then be retrieved through a network search on the second image.
- if the second object information matches the to-be-processed object information, it can be determined that the second object matches the to-be-processed object information, and the second object can be used as the target object. For example, the image of the image area that the live broadcast user points to and circles with a hand is captured as the second image corresponding to the second object, which is then determined as the target object to be deformed in a subsequent step.
- S380 Perform deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image.
- S390 Perform synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
- the target object can be further determined according to the live content displayed on the live broadcast interface, and the target object is displayed on the live broadcast interface.
- the target object can be determined based on the target information.
- FIG. 7 shows a schematic flowchart of determining a target object according to live content in an image processing method provided by an exemplary embodiment of the present disclosure, which may specifically include:
- the target information may include at least one of an item identifier corresponding to the target object and an item image.
- the live interface may display at least one kind of target information corresponding to the object information to be processed, such as an item identifier, an item image, or a purchase portal, and the corresponding target object can then be found in the live audio and video information according to the target information.
- the purchase portal may be displayed in the form of an item image, and the item image may have a built-in URL (Uniform Resource Locator); the user can click the item image to jump to the purchase page corresponding to the URL.
- a product image and product name can be displayed in the live broadcast interface. If the product name matches the object information to be processed, for example, the object information to be processed is the object name "lipstick" and the product name also includes "lipstick", it can be considered a match; the corresponding image area is then marked in the live audio and video information based on the product image, and the object in that image area is the target object.
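The name-matching rule described above might be sketched as follows (the item fields and the simple substring test are illustrative assumptions; a production system would likely normalise names and tolerate variants):

```python
def match_live_content(object_name, displayed_items):
    """Return the displayed item whose product name contains the object
    name recognised from speech (e.g. 'lipstick'), else None."""
    for item in displayed_items:
        if object_name in item["product_name"]:
            return item
    return None

# Illustrative live-content entries shown on the broadcast interface.
items = [{"product_name": "matte lipstick no. 5",
          "image_area": (12, 8, 40, 60)}]
```

The returned item's image area then anchors the rectangle on which the enlargement and synthesis steps operate.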
- steps S410-S420 may be used to replace steps S360-S370 in FIG. 5 .
- FIG. 8 is a block diagram of an image processing apparatus provided by an embodiment of the present disclosure.
- the image processing apparatus 800 in the embodiment of the present disclosure may include: an information acquisition module 810, a target determination module 820, a target deformation module 830, and an image synthesis module 840, where:
- An information acquisition module 810 configured to acquire live audio and video information collected in real time
- the target determination module 820 is used to identify the input information of the live user in the live audio and video information, and determine the target object to be processed in the live audio and video information according to the input information;
- the target deformation module 830 is configured to perform deformation processing on the target image corresponding to the target object in the live audio and video information to obtain the deformed target image;
- the image synthesis module 840 is used for synthesizing the deformed target image and the image in the live audio and video information to obtain the synthesized audio and video information, which is used for playing the synthesized audio and video information.
- the input information includes voice information
- the target determination module 820 may include: a voice recognition submodule, an object information determination submodule, and a target object determination submodule, wherein:
- the speech recognition sub-module is used to perform speech recognition on the speech information in the live audio and video information to obtain the speech recognition result;
- an object information determination submodule used for determining the object information to be processed based on the speech recognition result
- the target object determination submodule is used for determining the object corresponding to the object information to be processed as the target object in the live audio and video information.
- the target object determination sub-module may include: a first object determination unit, configured to determine the first object as the target object if it is detected that the first object indicated by the object information to be processed exists in the live audio and video information.
- the target object determination submodule may include: an image recognition unit and a gesture determination unit, wherein:
- an image recognition unit configured to perform image recognition processing on the live audio and video information if it is not detected that the first object indicated by the to-be-processed object information exists in the live audio and video information;
- the gesture determination unit is configured to use the second object indicated by the preset gesture as the target object if it is recognized that there is a preset gesture in the live audio and video information.
- the gesture determination unit may include: a second object determination subunit and a target object determination subunit, wherein:
- the second object determination subunit is configured to determine the object indicated by the preset gesture as the second object if it is recognized that there is a preset gesture in the live audio and video information;
- the target object determination subunit is configured to use the second object as the target object if the second object matches the information of the object to be processed.
- the target object determination submodule may include: a live content acquisition unit and a target information determination unit, wherein:
- a live broadcast content acquisition unit configured to acquire the currently displayed live broadcast content if it is not detected that the first object indicated by the object information to be processed exists in the live broadcast audio and video information;
- the target information determining unit is configured to determine the target object according to the target information if target information corresponding to the object information to be processed exists in the live broadcast content, and the target information includes at least one of an item identifier and an image corresponding to the target object.
- the target deformation module 830 may include: an enlargement processing sub-module, configured to perform enlargement processing on the target image corresponding to the target object in the live audio and video information, and obtain the enlarged target image as the deformed target image.
- the input information includes at least one of voice information, text information, touch information, and visual information.
- the image processing apparatus in the embodiments of the present disclosure can execute an image processing method provided by the embodiments of the present disclosure, and the implementation principle is similar.
- the actions performed by each module in the image processing apparatus in the embodiments of the present disclosure correspond to the steps in the image processing methods in the embodiments of the present disclosure. For a detailed functional description of each module of the image processing apparatus, please refer to the descriptions of the corresponding image processing methods shown above, which will not be repeated here.
- FIG. 9 shows a structural block diagram of an electronic device 900 suitable for implementing embodiments of the present disclosure.
- the electronic device in the embodiment of the present disclosure may include, but is not limited to, a device such as a computer.
- the electronic device shown in FIG. 9 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
- the electronic device 900 includes a memory and a processor, where the processor may be referred to as the processing device 901 below, and the memory may include at least one of a read-only memory (ROM) 902, a random access memory (RAM) 903, and a storage device 908 described below:
- the electronic device 900 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 901, which may execute various appropriate actions and processes according to a program stored in the read-only memory (ROM) 902 or a program loaded from the storage device 908 into the random access memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the electronic device 900.
- the processing device 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904.
- An input/output (I/O) interface 905 is also connected to bus 904 .
- the following devices can be connected to the I/O interface 905: an input device 906 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 907 including, for example, a liquid crystal display (LCD), speakers, a vibrator, etc.; a storage device 908 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 909.
- the communication device 909 may allow the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. While Figure 9 shows an electronic device 900 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable storage medium, the computer program containing program code for performing the method illustrated in the flowchart.
- the computer program may be downloaded and installed from the network via the communication device 909, or from the storage device 908, or from the ROM 902.
- when the computer program is executed by the processing device 901, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
- the above-mentioned computer-readable storage medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
- the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above.
- computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon.
- Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
- Program code embodied on a computer-readable storage medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
- clients and servers can communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
- examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
- the above-mentioned computer-readable storage medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
- the above-mentioned computer-readable storage medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is made to perform the following steps: acquiring live audio and video information collected in real time; identifying the input information of the live broadcast user in the live audio and video information, and determining the target object to be processed in the live audio and video information according to the input information; performing deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and performing synthesis processing on the deformed target image and the image in the live audio and video information to obtain synthesized audio and video information for playing the synthesized audio and video information.
- computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., connected through the Internet using an Internet service provider).
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
- the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or can be implemented in a combination of dedicated hardware and computer instructions.
- modules or units involved in the embodiments of the present disclosure may be implemented in software or hardware.
- the name of the module or unit does not constitute a limitation of the unit itself under certain circumstances, for example, the display module can also be described as "a module for displaying a resource uploading interface".
- exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
- a computer-readable storage medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- the computer-readable storage medium may be a machine-readable signal medium or a machine-readable storage medium.
- computer-readable storage media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
- machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
- an image processing method includes: acquiring live audio and video information collected in real time; identifying input information of a live user in the live audio and video information, and The input information determines the target object to be processed in the live audio and video information; performs deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image; The image is synthesized with the image in the live audio and video information to obtain synthesized audio and video information, which is used for playing the synthesized audio and video information.
- the input information includes voice information, and identifying the input information of the live broadcast user in the live audio and video information and determining the target object to be processed in the live audio and video information according to the input information includes: performing voice recognition on the voice information in the live audio and video information to obtain a voice recognition result; determining the object information to be processed based on the voice recognition result; and, in the live audio and video information, determining the object corresponding to the object information to be processed as the target object.
- determining the object corresponding to the object information to be processed as the target object in the live audio and video information includes: if it is detected that the first object indicated by the object information to be processed exists in the live audio and video information, determining the first object as the target object.
- determining the object corresponding to the object information to be processed as the target object in the live audio and video information further includes: if it is not detected that the first object indicated by the object information to be processed exists in the live audio and video information, performing image recognition processing on the live audio and video information; and if it is recognized that there is a preset gesture in the live audio and video information, using the second object indicated by the preset gesture as the target object.
- using the second object indicated by the preset gesture as the target object includes: if it is recognized that there is a preset gesture in the live audio and video information, determining the object indicated by the preset gesture as the second object; and if the second object matches the to-be-processed object information, using the second object as the target object.
- Determining the object corresponding to the object information to be processed as the target object in the live audio and video information further includes: if no first object indicated by the object information to be processed is detected in the live audio and video information, obtaining the currently displayed live broadcast content; and if target information corresponding to the object information to be processed exists in the live broadcast content, determining the target object according to the target information, where the target information includes at least one of an item identifier and an image corresponding to the target object.
- Performing deformation processing on the target image corresponding to the target object in the live audio and video information to obtain the deformed target image includes: enlarging the target image corresponding to the target object in the live audio and video information, and using the enlarged target image as the deformed target image.
- the input information includes at least one of voice information, text information, touch information, and visual information.
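The target-selection cascade in the claims above — match the voice-recognized object mention against objects detected in the stream, and fall back to an object indicated by a preset gesture only when it matches the mention — can be sketched as follows. This is a minimal illustration, not the patented implementation; `DetectedObject` and `choose_target` are hypothetical names, and real object/gesture detection (e.g. a machine-learning detector) is assumed to happen upstream.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DetectedObject:
    name: str
    bbox: tuple  # (x, y, w, h): region of the object in the frame

def choose_target(mentioned: str,
                  detected_objects: List[DetectedObject],
                  gesture_object: Optional[DetectedObject] = None
                  ) -> Optional[DetectedObject]:
    """Select the target object to deform.

    1. Prefer a detected object whose name matches the voice-recognized
       mention (the "first object" of the claims).
    2. Otherwise fall back to the object indicated by a preset gesture
       (the "second object"), but only if it matches the mention.
    """
    # Step 1: first object indicated by the to-be-processed object information
    for obj in detected_objects:
        if obj.name == mentioned:
            return obj
    # Step 2: gesture fallback, validated against the mentioned object info
    if gesture_object is not None and gesture_object.name == mentioned:
        return gesture_object
    return None
```

For example, if the streamer says "lipstick" and a lipstick is detected in the frame, step 1 returns its bounding box directly; the gesture path is only consulted when no matching object is found.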
- An image processing apparatus includes: an information acquisition module, configured to acquire live audio and video information collected in real time; a target determination module, configured to identify the input information of the live-streaming user in the live audio and video information and determine, according to the input information, the target object to be processed in the live audio and video information; a target deformation module, configured to perform deformation processing on the target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and an image synthesis module, configured to synthesize the deformed target image with the image in the live audio and video information to obtain synthesized audio and video information for playback.
- When the input information includes voice information, the target determination module may include: a voice recognition sub-module, an object information determination sub-module, and a target object determination sub-module, wherein: the voice recognition sub-module is configured to perform voice recognition on the voice information in the live audio and video information to obtain a voice recognition result; the object information determination sub-module is configured to determine the object information to be processed based on the voice recognition result; and the target object determination sub-module is configured to determine, in the live audio and video information, the object corresponding to the object information to be processed as the target object.
- The target object determination sub-module may include: a first object determination unit, configured to determine the first object as the target object if the first object indicated by the object information to be processed is detected in the live audio and video information.
- The target object determination sub-module may include: an image recognition unit and a gesture determination unit, wherein: the image recognition unit is configured to perform image recognition processing on the live audio and video information if no first object indicated by the object information to be processed is detected in the live audio and video information; and the gesture determination unit is configured to use the second object indicated by a preset gesture as the target object if the preset gesture is recognized in the live audio and video information.
- The gesture determination unit may include: a second object determination subunit and a target object determination subunit, wherein: the second object determination subunit is configured to determine the object indicated by a preset gesture as the second object if the preset gesture is recognized in the live audio and video information; and the target object determination subunit is configured to determine the second object as the target object if the second object matches the object information to be processed.
- The target object determination sub-module may include: a live broadcast content acquisition unit and a target information determination unit, wherein: the live broadcast content acquisition unit is configured to obtain the currently displayed live broadcast content if no first object indicated by the object information to be processed is detected in the live audio and video information; and the target information determination unit is configured to determine the target object according to target information if target information corresponding to the object information to be processed exists in the live broadcast content, where the target information includes at least one of an item identifier and an image corresponding to the target object.
- The target deformation module may include: an enlargement processing sub-module, configured to enlarge the target image corresponding to the target object in the live audio and video information, and use the enlarged target image as the deformed target image.
- the input information includes at least one of voice information, text information, touch information, and visual information.
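The enlargement and synthesis steps carried out by the target deformation and image synthesis modules can be sketched in NumPy as follows. This is one plausible reading of those steps under stated assumptions — the patent does not fix an interpolation method or placement rule, so the nearest-neighbour upscaling and center-aligned compositing here are illustrative choices, and `enlarge_and_composite` is a hypothetical name.

```python
import numpy as np

def enlarge_and_composite(frame: np.ndarray, bbox, scale: float = 1.5) -> np.ndarray:
    """Enlarge the target region of a frame and composite it back over the
    original frame, centered on the original region (clipped to the frame)."""
    x, y, w, h = bbox
    region = frame[y:y + h, x:x + w]
    new_h, new_w = int(h * scale), int(w * scale)
    # Nearest-neighbour upscaling via integer index maps
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    enlarged = region[rows][:, cols]
    out = frame.copy()
    # Center the enlarged patch over the original region, clipped to the frame
    cy, cx = y + h // 2, x + w // 2
    y0, x0 = max(0, cy - new_h // 2), max(0, cx - new_w // 2)
    y1, x1 = min(out.shape[0], y0 + new_h), min(out.shape[1], x0 + new_w)
    out[y0:y1, x0:x1] = enlarged[:y1 - y0, :x1 - x0]
    return out
```

Applied per frame of the live stream, this yields the synthesized video in which the mentioned item appears magnified; a production pipeline would typically use a library resize (e.g. bilinear) and blend the patch edges rather than hard-pasting.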
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Social Psychology (AREA)
- Psychiatry (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Business, Economics & Management (AREA)
- Marketing (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Disclosed are an image processing method and apparatus, an electronic device, and a computer-readable storage medium, relating to the technical field of image processing. The method includes the following steps: acquiring live audio and video information collected in real time; identifying input information of a live-streaming user in the live audio and video information, and determining, according to the input information, a target object to be processed in the live audio and video information; performing deformation processing on a target image corresponding to the target object in the live audio and video information to obtain a deformed target image; and performing synthesis processing on the deformed target image and an image in the live audio and video information to obtain synthesized audio and video information for playback. In the embodiments of the present invention, when a live streamer mentions a target object, the target object can be highlighted by means of deformation processing, which improves the display and live-streaming effects.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011119916.0 | 2020-10-19 | ||
CN202011119916.0A CN112261424B (zh) | 2020-10-19 | 2020-10-19 | 图像处理方法、装置、电子设备及计算机可读存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022083383A1 true WO2022083383A1 (fr) | 2022-04-28 |
Family
ID=74243898
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/119567 WO2022083383A1 (fr) | 2020-10-19 | 2021-09-22 | Procédé et appareil de traitement d'images, dispositif électronique et support de stockage lisible par ordinateur |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112261424B (fr) |
WO (1) | WO2022083383A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115205637A (zh) * | 2022-09-19 | 2022-10-18 | 山东世纪矿山机电有限公司 | 一种矿车物料的智能识别方法 |
CN115460429A (zh) * | 2022-09-06 | 2022-12-09 | 河北先河环保科技股份有限公司 | 用于水质采样监测监管的方法、电子设备及存储介质 |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112261424B (zh) * | 2020-10-19 | 2022-11-18 | 北京字节跳动网络技术有限公司 | 图像处理方法、装置、电子设备及计算机可读存储介质 |
CN114915798A (zh) * | 2021-02-08 | 2022-08-16 | 阿里巴巴集团控股有限公司 | 实时视频生成方法、多摄像头直播方法及装置 |
CN115086686A (zh) * | 2021-03-11 | 2022-09-20 | 北京有竹居网络技术有限公司 | 视频处理方法及相关装置 |
CN114501041B (zh) | 2021-04-06 | 2023-07-14 | 抖音视界有限公司 | 特效显示方法、装置、设备及存储介质 |
CN112804585A (zh) * | 2021-04-13 | 2021-05-14 | 杭州次元岛科技有限公司 | 一种在直播过程中实现产品智能展示的处理方法及装置 |
CN113709545A (zh) * | 2021-04-13 | 2021-11-26 | 腾讯科技(深圳)有限公司 | 视频的处理方法、装置、计算机设备和存储介质 |
CN113269785A (zh) * | 2021-05-13 | 2021-08-17 | 北京字节跳动网络技术有限公司 | 图像处理方法、设备、存储介质及程序产品 |
CN113286160A (zh) * | 2021-05-19 | 2021-08-20 | Oppo广东移动通信有限公司 | 视频处理方法、装置、电子设备以及存储介质 |
CN113901785A (zh) * | 2021-09-29 | 2022-01-07 | 联想(北京)有限公司 | 一种标记方法及电子设备 |
CN114245193A (zh) * | 2021-12-21 | 2022-03-25 | 维沃移动通信有限公司 | 显示控制方法、装置和电子设备 |
CN114928768A (zh) * | 2022-06-10 | 2022-08-19 | 北京百度网讯科技有限公司 | 直播信息推送方法和装置、系统、电子设备、计算机介质 |
CN118264858A (zh) * | 2024-05-29 | 2024-06-28 | 深圳爱图仕创新科技股份有限公司 | 数据处理方法、装置、计算机设备及计算机可读存储介质 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105828090A (zh) * | 2016-03-22 | 2016-08-03 | 乐视网信息技术(北京)股份有限公司 | 全景直播方法及装置 |
CN106792092A (zh) * | 2016-12-19 | 2017-05-31 | 广州虎牙信息科技有限公司 | 直播视频流分镜显示控制方法及其相应的装置 |
CN109284053A (zh) * | 2018-08-23 | 2019-01-29 | 北京达佳互联信息技术有限公司 | 评论信息显示方法和装置、移动终端及存储介质 |
CN110139161A (zh) * | 2018-02-02 | 2019-08-16 | 阿里巴巴集团控股有限公司 | 直播中的信息处理方法及装置 |
US20190311709A1 (en) * | 2015-09-02 | 2019-10-10 | Oath Inc. | Computerized system and method for formatted transcription of multimedia content |
CN110324648A (zh) * | 2019-07-17 | 2019-10-11 | 咪咕文化科技有限公司 | 直播展现方法和系统 |
CN111353839A (zh) * | 2018-12-21 | 2020-06-30 | 阿里巴巴集团控股有限公司 | 商品信息处理方法、直播商品的方法、装置及电子设备 |
CN112261424A (zh) * | 2020-10-19 | 2021-01-22 | 北京字节跳动网络技术有限公司 | 图像处理方法、装置、电子设备及计算机可读存储介质 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102055731B (zh) * | 2009-10-27 | 2015-11-25 | 中兴通讯股份有限公司 | Ivvr菜单生成系统及方法 |
CN106412681B (zh) * | 2015-07-31 | 2019-12-24 | 腾讯科技(深圳)有限公司 | 弹幕视频直播方法及装置 |
CN111464827A (zh) * | 2020-04-20 | 2020-07-28 | 玉环智寻信息技术有限公司 | 一种数据处理方法、装置、计算设备及存储介质 |
CN111757138A (zh) * | 2020-07-02 | 2020-10-09 | 广州博冠光电科技股份有限公司 | 一种基于单镜头直播视频的特写显示方法及装置 |
-
2020
- 2020-10-19 CN CN202011119916.0A patent/CN112261424B/zh active Active
-
2021
- 2021-09-22 WO PCT/CN2021/119567 patent/WO2022083383A1/fr active Application Filing
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190311709A1 (en) * | 2015-09-02 | 2019-10-10 | Oath Inc. | Computerized system and method for formatted transcription of multimedia content |
CN105828090A (zh) * | 2016-03-22 | 2016-08-03 | 乐视网信息技术(北京)股份有限公司 | 全景直播方法及装置 |
CN106792092A (zh) * | 2016-12-19 | 2017-05-31 | 广州虎牙信息科技有限公司 | 直播视频流分镜显示控制方法及其相应的装置 |
CN110139161A (zh) * | 2018-02-02 | 2019-08-16 | 阿里巴巴集团控股有限公司 | 直播中的信息处理方法及装置 |
CN109284053A (zh) * | 2018-08-23 | 2019-01-29 | 北京达佳互联信息技术有限公司 | 评论信息显示方法和装置、移动终端及存储介质 |
CN111353839A (zh) * | 2018-12-21 | 2020-06-30 | 阿里巴巴集团控股有限公司 | 商品信息处理方法、直播商品的方法、装置及电子设备 |
CN110324648A (zh) * | 2019-07-17 | 2019-10-11 | 咪咕文化科技有限公司 | 直播展现方法和系统 |
CN112261424A (zh) * | 2020-10-19 | 2021-01-22 | 北京字节跳动网络技术有限公司 | 图像处理方法、装置、电子设备及计算机可读存储介质 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115460429A (zh) * | 2022-09-06 | 2022-12-09 | 河北先河环保科技股份有限公司 | 用于水质采样监测监管的方法、电子设备及存储介质 |
CN115460429B (zh) * | 2022-09-06 | 2024-03-01 | 河北先河环保科技股份有限公司 | 用于水质采样监测监管的方法、电子设备及存储介质 |
CN115205637A (zh) * | 2022-09-19 | 2022-10-18 | 山东世纪矿山机电有限公司 | 一种矿车物料的智能识别方法 |
Also Published As
Publication number | Publication date |
---|---|
CN112261424A (zh) | 2021-01-22 |
CN112261424B (zh) | 2022-11-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022083383A1 (fr) | Procédé et appareil de traitement d'images, dispositif électronique et support de stockage lisible par ordinateur | |
WO2021082760A1 (fr) | Procédé de génération d'image virtuelle, dispositif, terminal et support de stockage associés | |
CN109168026B (zh) | 即时视频显示方法、装置、终端设备及存储介质 | |
WO2022083230A1 (fr) | Procédé d'affichage d'écran, appareil, dispositif électronique et support lisible par ordinateur | |
JP2023547917A (ja) | 画像分割方法、装置、機器および記憶媒体 | |
US12008167B2 (en) | Action recognition method and device for target object, and electronic apparatus | |
WO2023125374A1 (fr) | Procédé et appareil de traitement d'image, dispositif électronique et support de stockage | |
WO2022171024A1 (fr) | Procédé et appareil d'affichage d'images, dispositif et support | |
WO2021254502A1 (fr) | Procédé et appareil d'affichage d'objet cible, et dispositif électronique | |
CN112051961A (zh) | 虚拟交互方法、装置、电子设备及计算机可读存储介质 | |
CN110796664B (zh) | 图像处理方法、装置、电子设备及计算机可读存储介质 | |
CN112785669B (zh) | 一种虚拟形象合成方法、装置、设备及存储介质 | |
US20220358662A1 (en) | Image generation method and device | |
US12019669B2 (en) | Method, apparatus, device, readable storage medium and product for media content processing | |
WO2022233223A1 (fr) | Procédé et appareil d'assemblage d'image, dispositif et support | |
WO2023138441A1 (fr) | Procédé et appareil de génération de vidéo, dispositif et support d'enregistrement | |
US20230316529A1 (en) | Image processing method and apparatus, device and storage medium | |
WO2023165515A1 (fr) | Procédé et appareil de photographie, dispositif électronique et support de stockage | |
WO2020034981A1 (fr) | Procédé permettant de générer des informations codées et procédé permettant de reconnaître des informations codées | |
WO2023226628A1 (fr) | Procédé et appareil d'affichage d'image, dispositif électronique et support de stockage | |
WO2022170982A1 (fr) | Procédé et appareil de traitement d'image, procédé et appareil de génération d'image, dispositif et support | |
WO2024165010A1 (fr) | Procédé et appareil de génération d'informations, procédé et appareil d'affichage d'informations, dispositif et support d'enregistrement | |
WO2024131652A1 (fr) | Procédé et appareil de traitement d'effets spéciaux, dispositif électronique et support de stockage | |
WO2021227953A1 (fr) | Procédé de configuration d'effets spéciaux d'images, procédé de reconnaissance d'images, appareils et dispositif électronique | |
WO2024094158A1 (fr) | Appareil et procédé de traitement d'effets spéciaux, dispositif, et support de stockage |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21881802 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21881802 Country of ref document: EP Kind code of ref document: A1 |