WO2022037260A1 - Artificial intelligence-based multimedia processing method, apparatus and electronic device
- Publication number: WO2022037260A1 (application PCT/CN2021/102803)
- Authority: WIPO (PCT)
- Prior art keywords: multimedia file, interactive, multimedia, editing mode, identified
Classifications
- G11B27/031 — Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G06F40/166 — Handling natural language data; text processing; editing, e.g. inserting or deleting
- G06F3/0484 — Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element
- G11B27/34 — Indexing; addressing; timing or synchronising; indicating arrangements
Definitions
- the present application relates to artificial intelligence and multimedia technologies, and in particular, to an artificial intelligence-based multimedia processing method, apparatus, electronic device, and computer-readable storage medium.
- Multimedia editing is an important application of Computer Vision (CV), which is in turn a branch of Artificial Intelligence (AI).
- In the related art, the user usually selects a material editing mode and uploads a multimedia file, and the electronic device edits the multimedia file according to the material editing mode.
- However, the material editing mode often has specific requirements for multimedia files, which can easily cause the editing of a user-uploaded multimedia file to fail.
- As a result, invalid processing is easily performed on multimedia files, wasting computing resources.
- the embodiment of the present application provides a multimedia processing method based on artificial intelligence, including:
- the multimedia file after applying the interactive template is presented.
- An embodiment of the present application provides an artificial intelligence-based multimedia processing device, including:
- an acquisition module configured to acquire, in response to an editing operation for a multimedia file, a material editing mode and an interactive template corresponding to the type of material in the multimedia file;
- an application module configured to identify the material in the multimedia file according to the material editing mode, and apply the interactive template to the identified material;
- an application completion module configured to present the multimedia file after the interactive template is applied.
- the embodiment of the present application provides an electronic device, including:
- the processor is configured to implement the artificial intelligence-based multimedia processing method provided by the embodiment of the present application when executing the executable instructions stored in the memory.
- Embodiments of the present application provide a computer-readable storage medium storing executable instructions for causing a processor to execute the multimedia processing method based on artificial intelligence provided by the embodiments of the present application.
- FIG. 1 is a schematic diagram of the architecture of an artificial intelligence-based multimedia processing system provided by an embodiment of the present application
- FIG. 2 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
- FIG. 3A is a schematic flowchart of an artificial intelligence-based multimedia processing method provided by an embodiment of the present application.
- FIG. 3B is a schematic flowchart of an artificial intelligence-based multimedia processing method provided by an embodiment of the present application.
- FIG. 3C is a schematic flowchart of an artificial intelligence-based multimedia processing method provided by an embodiment of the present application.
- FIG. 3D is a schematic flowchart of an artificial intelligence-based multimedia processing method provided by an embodiment of the present application.
- FIG. 3E is a schematic flowchart of an artificial intelligence-based multimedia processing method provided by an embodiment of the present application.
- FIG. 4 is a schematic diagram of a creative gameplay template provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of selecting a multimedia file provided by an embodiment of the present application.
- FIG. 6 is a schematic diagram of a multimedia file after applying a creative gameplay template provided by an embodiment of the present application
- FIG. 7 is a schematic diagram of a multimedia file after applying a creative gameplay template provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of a confirmation prompt provided by an embodiment of the present application.
- FIG. 9 is a schematic diagram of a material editing capability provided by an embodiment of the present application.
- FIG. 10 is a schematic flowchart of an artificial intelligence-based multimedia processing method provided by an embodiment of the present application.
- In the following description, the terms "first/second/third" are only used to distinguish similar objects and do not represent a specific ordering of objects. It is understood that, where permitted, the specific order or sequence of "first/second/third" may be interchanged so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein. In the following description, the term "plurality" means at least two.
- Multimedia file: a file containing at least one form of media; for example, a multimedia file can be any one of a picture, audio and video.
- Material: the content (or objects) in a multimedia file; for example, the type of material can be a human face, a cat, a dog or the sky.
- Material editing mode: used to identify materials in multimedia files; the material editing mode provides a material editing capability, that is, the ability to identify a material, such as the face recognition capability or the sky recognition capability.
- Interactive template: includes interactive effects, which are applied to multimedia files to form material-based interactive effects. (A minimal data-structure sketch of these two notions follows the definitions below.)
- The embodiment of the present application does not limit the specific form of the interactive effect; for example, it may be a video, a picture, an audio, an animation, a special effect, or a variable-speed effect.
- Artificial intelligence model: a model constructed based on the principles of artificial intelligence. The embodiment of the present application does not limit the type of the artificial intelligence model; for example, it may be a neural network model.
- Virtual Reality: using data from real life, electronic signals generated by computer technology are combined with various output devices to transform the data into a simulated environment that people can perceive.
- Effects created using virtual reality technology are virtual reality effects.
- Augmented Reality: computer-generated virtual information such as text, images, 3D models, music or videos is simulated and then applied to the real world, so that the virtual information and information in the real world complement each other, thereby enhancing the real world. Effects created with augmented reality technology are augmented reality effects.
- Database: a collection of data stored together in an organized way, which can be shared by multiple users, has as little redundancy as possible, and is independent of the application program; it supports query, update and delete operations.
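To make the two central notions above concrete, here is a minimal data-structure sketch in Python; the class names and fields are illustrative assumptions rather than the patent's actual schema:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class InteractiveTemplate:
    name: str                           # e.g. "starry-sky background" (assumed)
    effect: str                         # identifier of the interactive effect asset
    duration_s: Optional[float] = None  # how long the applied effect is kept

@dataclass
class MaterialEditingMode:
    material_type: str                  # e.g. "face" or "sky"
    recognize: Callable[[bytes], list]  # the material-recognition capability
    # componentized: templates are attached to, not baked into, the mode
    templates: List[InteractiveTemplate] = field(default_factory=list)
```

Keeping the mode and its templates as separate components mirrors the decoupling discussed later in this section.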
- Embodiments of the present application provide an artificial intelligence-based multimedia processing method, apparatus, electronic device, and computer-readable storage medium, which can effectively edit multimedia files, improve the success rate of editing, and improve the utilization rate of the computing resources consumed by electronic devices.
- Exemplary applications of the electronic device provided by the embodiment of the present application are described below.
- the electronic device provided by the embodiment of the present application may be implemented as a terminal device or as a server.
- the electronic device can effectively edit multimedia files, improve the utilization rate of computing resources, that is, improve the editing performance of the electronic device itself, and is suitable for various editing scenarios.
- FIG. 1 is a schematic diagram of the architecture of an artificial intelligence-based multimedia processing system 100 provided by an embodiment of the present application.
- a terminal device 400 is connected to a server 200 through a network 300, and the server 200 is connected to a database 500, wherein the network 300 may be a wide area network, a local area network, or a combination of the two.
- the artificial intelligence-based multimedia processing method may be implemented by the terminal device.
- when receiving an editing operation for a multimedia file, the terminal device 400 obtains a material editing mode and an interactive template corresponding to the type of material in the multimedia file, identifies the material in the multimedia file according to the material editing mode, and applies the interactive template to the identified material. Finally, the terminal device 400 presents the multimedia file after applying the interactive template in the graphical interface 410.
- various results involved in the multimedia processing process may be pre-stored locally in the terminal device 400, or may be sent by the terminal device 400 to the outside (such as the server 200 or the database 500); the latter method can reduce the storage resource occupation of the terminal device 400.
- the artificial intelligence-based multimedia processing method provided by the embodiments of the present application may also be implemented collaboratively by the server and the terminal device.
- when receiving the editing operation for the multimedia file sent by the terminal device 400, the server 200 acquires the material editing mode and the interactive template corresponding to the type of the material in the multimedia file from the database 500. Then, the server 200 identifies the material in the multimedia file according to the material editing mode, applies the interactive template to the identified material, and sends the multimedia file to which the interactive template is applied to the terminal device 400, so that the terminal device 400 can present the multimedia file to which the interactive template is applied in the graphical interface 410.
- the server 200 may also send the material editing mode and the interactive template to the terminal device 400, so that the terminal device 400 locally identifies the material in the multimedia file according to the material editing mode and applies the interactive template to the identified material. Where the material editing modes and interactive templates corresponding to multiple types are stored locally in the terminal device 400, the server 200 may send only the identification (or number) of the material editing mode and of the interactive template, and the terminal device 400 can then call the corresponding material editing mode and interactive template locally for editing, which reduces the consumption of communication resources.
- various results (such as multimedia files, material editing modes, interactive templates, etc.) involved in the multimedia processing process may be pre-stored in the distributed file system or database 500 of the server 200, or may be obtained by the server 200 from the outside (such as a blockchain); the latter method can reduce the storage resource occupation of the server 200.
- the terminal device 400 is used to display various results involved in the multimedia processing process in the graphical interface 410 .
- a multimedia file 411 to be edited, editing options for the multimedia file 411 , and a multimedia file 412 after applying an interactive template are exemplarily shown.
- when the editing option for the multimedia file 411 is triggered, it is determined that an editing operation for the multimedia file 411 is received; the applied interactive template includes a face used for replacement, that is, the multimedia file 412 is obtained by replacing the face in the multimedia file 411.
- the terminal device 400 or the server 200 may implement the artificial intelligence-based multimedia processing method provided by the embodiments of the present application by running a computer program.
- the computer program may be a native program or software module in an operating system; a native application (APP, Application), that is, a program that needs to be installed in the operating system to run; a mini program, that is, a program that only needs to be downloaded into the browser environment to run; or a mini program that can be embedded in any APP. In general, the above-mentioned computer program may be any form of application, module or plug-in.
- the server 200 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms. The cloud service may be a multimedia processing service that is invoked by the terminal device 400 to edit the multimedia file sent by the terminal device 400 and finally send the multimedia file after applying the interactive template back to the terminal device 400.
- the terminal device 400 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited thereto.
- the terminal device and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present application.
- the database 500 and the server 200 may be provided independently. In some embodiments, the database 500 and the server 200 can also be integrated together, that is, the database 500 can be regarded as existing inside the server 200 and integrated with the server 200 , and the server 200 can provide the data management function of the database 500 .
- FIG. 2 is a schematic structural diagram of a terminal device 400 provided by an embodiment of the present application.
- the various components in the terminal device 400 shown in FIG. 2 are coupled together by a bus system 440.
- the bus system 440 is used to implement connection and communication between these components.
- besides a data bus, the bus system 440 also includes a power bus, a control bus, and a status signal bus.
- for clarity, the various buses are labeled as bus system 440 in FIG. 2.
- the processor 410 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., where a general-purpose processor may be a microprocessor or any conventional processor or the like.
- User interface 430 includes one or more output devices 431 that enable presentation of media content, including one or more speakers and/or one or more visual display screens.
- User interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, and other input buttons and controls.
- Memory 450 may be removable, non-removable, or a combination thereof.
- Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like.
- Memory 450 optionally includes one or more storage devices that are physically remote from processor 410 .
- Memory 450 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory.
- the non-volatile memory may be a read-only memory (ROM, Read Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory).
- the memory 450 described in the embodiments of the present application is intended to include any suitable type of memory.
- memory 450 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
- the operating system 451 includes system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer;
- a presentation module 453 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., a display screen, speakers) associated with the user interface 430;
- An input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
- the artificial intelligence-based multimedia processing apparatus provided by the embodiments of the present application may be implemented in software.
- FIG. 2 shows the artificial intelligence-based multimedia processing apparatus 455 stored in the memory 450, which may be software in the form of a program or plug-in, including the following software modules: an acquisition module 4551, an application module 4552 and an application completion module 4553. These modules are logical, so they can be arbitrarily combined or further divided according to the functions they implement. The function of each module is explained below.
- the artificial intelligence-based multimedia processing apparatus provided by the embodiments of the present application may be implemented in hardware.
- as an example, the apparatus may be implemented using a processor in the form of a hardware decoding processor, which is programmed to execute the artificial intelligence-based multimedia processing method provided by the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may adopt one or more Application Specific Integrated Circuits (ASIC), DSPs, Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components.
- the artificial intelligence-based multimedia processing method provided by the embodiment of the present application will be described with reference to the exemplary application and implementation of the electronic device provided by the embodiment of the present application.
- FIG. 3A is a schematic flowchart of an artificial intelligence-based multimedia processing method provided by an embodiment of the present application, which will be described with reference to the steps shown in FIG. 3A .
- step 101 in response to an editing operation for the multimedia file, a material editing mode and an interactive template corresponding to the type of material in the multimedia file are acquired.
- the multimedia file can be pre-stored locally in the electronic device, acquired by the electronic device from the outside (such as the Internet), or collected in real time by the electronic device, for example, by the camera and/or microphone of the electronic device.
- the electronic device detects (or receives) an editing operation for the multimedia file, it acquires a material editing mode and an interactive template corresponding to the type of material in the multimedia file.
- the embodiment of the present application does not limit the type of editing operation, for example, it may be a contact operation (such as a click operation or a long press operation, etc.), or a non-contact operation (such as a gesture operation or a voice input operation, etc.).
- the electronic device may present an editing option for a multimedia file on a graphical interface, and use a trigger operation for the editing option (such as a click operation or a long-press operation, etc.) as an editing operation for the multimedia file.
- in response to an editing operation for the multimedia file, the electronic device acquires a material editing mode and an interactive template corresponding to the type of the material in the multimedia file, wherein the type of the material may be obtained directly (such as input by a user) or obtained by performing material identification processing on the multimedia file.
- for example, the preset material editing mode corresponding to the face type includes the face recognition capability (realized, for example, through a face recognition model), and the corresponding interactive template includes face special effects, such as special effects that decorate the facial features;
- the material editing mode corresponding to the sky type includes the ability to recognize the sky, and the corresponding interactive template includes the picture used to replace the sky.
- the material editing mode and the interactive template can be componentized to reduce the degree of coupling between the two, which also increases the degree of freedom in the creation of material editing modes and interactive templates.
- the multimedia file may include multiple types of materials, for example, a picture includes both a human face and a sky. Therefore, in step 101, multiple material editing modes and multiple interactive templates may be acquired.
- the above-mentioned acquisition of the material editing mode and the interactive template corresponding to the type of material in the multimedia file can be implemented as follows: from among the candidate material editing modes corresponding to multiple types, the material editing mode corresponding to the type of material in the multimedia file is obtained, and at least one interactive template corresponding to that material editing mode is obtained.
- that is, for each of multiple types, a corresponding candidate material editing mode may be set, together with at least one interactive template corresponding to each candidate material editing mode.
- for example, the face type is set to correspond to a candidate material editing mode that includes the face recognition capability;
- that candidate material editing mode corresponds to an interactive template A and an interactive template B;
- the interactive templates A and B include face special effects of different styles;
- the sky type is set to correspond to a candidate material editing mode that includes the sky recognition capability;
- that candidate material editing mode corresponds to an interactive template C and an interactive template D;
- the interactive template C includes a starry sky picture for replacing the sky;
- the interactive template D includes a waterfall picture for replacing the sky.
- a material editing mode corresponding to the type is obtained from among the multiple candidate material editing modes, and at least one interactive template corresponding to the material editing mode is further obtained.
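As an illustration of this lookup, a minimal sketch assuming the candidate modes are registered in a dictionary keyed by material type; the registry contents mirror the face/sky example above, and all identifiers are hypothetical:

```python
# Hypothetical registry of candidate material editing modes and their templates.
CANDIDATE_MODES = {
    "face": {"mode": "face_recognition", "templates": ["template_A", "template_B"]},
    "sky":  {"mode": "sky_recognition",  "templates": ["template_C", "template_D"]},
}

def acquire_mode_and_templates(material_type: str):
    """Step 101: obtain the editing mode and at least one template for a type."""
    entry = CANDIDATE_MODES.get(material_type)
    if entry is None:
        raise ValueError(f"no candidate editing mode for type {material_type!r}")
    return entry["mode"], entry["templates"]

mode, templates = acquire_mode_and_templates("sky")
# -> ("sky_recognition", ["template_C", "template_D"])
```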
- step 102 a material is identified in the multimedia file according to the material editing mode, and an interactive template is applied to the identified material.
- the material is identified in the multimedia file, and the acquired interactive template is applied to the identified material.
- for example, a special effect in the interactive template is added to the identified material, or the identified material is replaced with a picture in the interactive template.
- the application of the interactive template is not limited to this and can be set according to the actual application scenario.
- in some embodiments, the method further includes: presenting a confirmation prompt corresponding to the identified material, wherein the confirmation prompt includes at least one of the type of the identified material, the location information of the material in the multimedia file, and the preview result obtained after applying the interactive template; the above-mentioned application of the interactive template to the identified material can then be implemented as follows: in response to a confirmation operation for the confirmation prompt, the interactive template is applied to the identified material.
- a confirmation prompt corresponding to the identified material may be presented in the graphical interface of the electronic device.
- the confirmation prompt may include at least one of the type of the identified material, the location information of the material in the multimedia file, and the preview result obtained after applying the interactive template; depending on the actual application scenario, the confirmation prompt can include more or less content.
- the location information of the material in the multimedia file can be time location information, or it can be the area location information of the material in a video frame or a picture.
- the location information may be the timestamp of the video frame including the human face in the video, or may be the specific area occupied by the human face in the video frame including the human face.
- the preview result obtained after applying the interactive template may be obtained by applying the interactive template to at least some of the materials identified in the multimedia file, for example, any video frame (such as the first video frame) that includes a human face in the video Apply the interactive template and get a preview of the result.
- the electronic device When receiving a confirmation operation for the confirmation prompt, the electronic device applies an interactive template to the identified material; when receiving a denial operation for the confirmation prompt, it suspends editing of the multimedia file. In this way, whether to apply the interactive template is determined by means of human-computer interaction, which can reduce the waste of computing resources.
- the role of the confirmation prompt is not limited to this.
- a confirmation prompt corresponding to the identified material may be presented, and the confirmation prompt includes the type of the identified material (i.e., illegal content) and its location information in the multimedia file, so as to prompt the existence of illegal content in the multimedia file, where the illegal content is, for example, a watermark.
- the user can manually modify the content in the multimedia file according to the confirmation prompt, or the electronic device can also apply an interactive template to the identified material when receiving a confirmation operation for the confirmation prompt.
- the interactive template is used to block illegal content. Examples include mosaics used to cover offending content.
- in some embodiments, the interaction template includes an interaction effect and a duration of the interaction effect; the above-mentioned application of the interaction template to the identified material may be implemented as follows: the interaction effect in the interaction template is applied to the identified material, and the applied interaction effect is kept until the duration is reached.
- for example, the multimedia file is a video with a total length of 10 seconds, the type of material in the video is a face, the time position information of the face in the video includes the 3rd, 3.5th, 4th, 7th, 7.5th and … seconds, and the acquired interactive template includes the interactive effect of a face special effect and a duration of 1 second. In this way, the applied interactive effect can be reused, saving the processing resources that frequent application of the interactive effect would consume.
- for another example, the multimedia file is a picture, the type of the material in the picture is a face, and the obtained interactive template includes the interactive effect of a face special effect and a duration of 5 seconds.
- the face special effect in the template is applied to the picture and kept unchanged until the duration reaches 5 seconds, so as to generate a video in which each video frame is the picture with the face special effect added. In this way, another reuse of the interactive effect is achieved.
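A minimal sketch of this picture-to-video reuse, assuming a 25 fps output (the frame rate is not specified in the source); the effect is rendered onto the picture once and the resulting frame is reused for the whole duration:

```python
FPS = 25  # assumed output frame rate

def picture_to_effect_video(picture, effect, duration_s):
    # Render the effect once, then reuse the composed frame instead of
    # re-applying the effect for every output frame.
    composed_frame = (picture, effect)  # stand-in for the rendered result
    return [composed_frame] * int(duration_s * FPS)

frames = picture_to_effect_video("portrait.jpg", "face_special_effect", 5)
assert len(frames) == 125  # 5 seconds of identical frames at 25 fps
```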
- the method further includes: cutting the multimedia file according to the position information of the identified material in the multimedia file.
- the location information here may be time location information, or may be regional location information of the material in a video frame or a picture.
- when the location information is time location information, the multimedia file can be cropped according to the first timestamp (that is, the earliest timestamp) and the last timestamp (that is, the latest timestamp) in the time location information of the material.
- for example, the multimedia file is a video with a total length of 30 seconds, and the time location information of the identified material includes the 5th, 7th, and 10th seconds; the section from the 5th to the 10th second is then cropped out of the multimedia file.
- when the location information is area location information, the multimedia file can be cropped by area according to the area location information.
- for example, the multimedia file is a picture and the area location information of the identified material in the picture is the left half; the left half of the picture is then cropped out.
- an interactive template may be applied to the material identified in the cropped multimedia file.
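The two cropping rules can be sketched as follows, assuming time positions arrive as a list of timestamps in seconds and area positions as simple labels; both helpers are illustrative only:

```python
def time_crop_bounds(timestamps):
    """Keep the section from the earliest to the latest material timestamp."""
    return min(timestamps), max(timestamps)

start, end = time_crop_bounds([5, 7, 10])  # -> (5, 10), as in the 30-second example

def area_crop_bounds(image_width, region):
    """Map an area label to the pixel columns to keep (hypothetical labels)."""
    if region == "left-half":
        return 0, image_width // 2
    raise ValueError(f"unsupported region {region!r}")

cols = area_crop_bounds(1920, "left-half")  # -> (0, 960)
```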
- step 103 the multimedia file after applying the interactive template is presented.
- the multimedia file after the application of the interactive template can be presented in the graphical interface, for example, the picture after the application of the interactive template is presented, or the video or audio after the application of the interactive template is played.
- a manual editing option can also be presented, so that users can manually edit the multimedia file after applying the interactive template, such as manually cropping the multimedia file, or manually adding effects, such as text, stickers, special effects or music, etc. .
- by acquiring the material editing mode and the interactive template corresponding to the type of material in the multimedia file, the embodiment of the present application can lower the selection threshold for multimedia files, improve the success rate of multimedia editing, and improve the processing efficiency of electronic devices.
- FIG. 3B is a schematic flowchart of an artificial intelligence-based multimedia processing method provided by an embodiment of the present application.
- step 102 shown in FIG. 3A can be implemented through steps 201 to 202, and each step is explained below.
- step 201 the material is identified in the multimedia file according to the material editing mode.
- step 202 when the identified material satisfies the material interaction condition corresponding to the material editing mode, the first effect in the interaction template is applied to the identified material.
- the interactive template is used to form a material-based interactive effect
- the interactive effect includes a first effect and a second effect different from the first effect
- the first effect includes at least one of an augmented reality effect and a virtual reality effect; alternatively, an effect that is applied directly to the material can be used as the first effect, and an effect other than the first effect, regardless of its form of expression, can be used as the second effect.
- the number of the first effect and the second effect included in the interactive template is not limited.
- the identified material is compared against the material interaction condition corresponding to the material editing mode, where the material interaction condition can be preset. For example, if the multimedia file is a picture and the material interaction condition includes an area proportion threshold for a certain type of material, then when the area proportion of that type of material in the picture is greater than the area proportion threshold, it is determined that the material interaction condition is satisfied.
- a first effect is applied to the identified material.
- for example, the first effect is an animation special effect of a human face: the identified material (that is, a human face) is replaced with the animation special effect to obtain an animated human face, so as to form a virtual reality sensory effect;
- for another example, the first effect is a street-stall special effect: the special effect can be superimposed on the area of the identified material (e.g., the ground), so as to form an augmented reality sensory effect of setting up a stall in the real world.
- in step 203, when the identified material does not meet the material interaction condition, the set position corresponding to the second effect in the interactive template is obtained, and the second effect is applied at the set position in the multimedia file.
- when the identified material does not meet the material interaction condition, for example, when the area proportion of the material is less than or equal to the area proportion threshold, the identified material is not suitable for directly applying an effect; therefore, the set position corresponding to the second effect is obtained and the second effect in the interactive template is applied at that set position in the multimedia file.
- the setting position is not limited in the embodiment of the present application, and may be specifically set according to the actual application scenario.
- the setting position corresponding to the second effect can be included in the interactive template, and of course can also be stored in other places.
- for example, the multimedia file is a picture, the type of the material in the picture is a face, the second effect included in the acquired interactive template is the text "Face is about to appear", and the included set position is the center of the picture.
- for another example, the multimedia file is a video; when the material identified in a video frame satisfies the material interaction condition, the first effect is applied to that material, and when the material identified in a video frame does not meet the material interaction condition, the second effect is applied at the set position in that video frame.
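Steps 202 and 203 reduce to a single decision; here is a sketch under the assumption that the material interaction condition is an area-proportion threshold, as in the picture example above (the threshold value and template fields are assumptions):

```python
AREA_RATIO_THRESHOLD = 0.2  # assumed value of the material interaction condition

def choose_effect(material_area, frame_area, template):
    ratio = material_area / frame_area
    if ratio > AREA_RATIO_THRESHOLD:
        # condition met: apply the first effect directly to the material
        return ("first_effect", template["first"])
    # condition not met: apply the second effect at its set position instead
    return ("second_effect", template["second"], template["set_position"])

template = {"first": "face_animation",
            "second": "Face is about to appear",
            "set_position": "center"}
print(choose_effect(0.5, 1.0, template))   # large face -> first effect
print(choose_effect(0.05, 1.0, template))  # tiny face  -> text at picture center
```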
- in some embodiments, the above-mentioned identification of the material in the multimedia file according to the material editing mode may be implemented as follows: when the editing scene is a non-real-time scene, the material is identified in the multimedia file according to the material editing mode, the matching degree between the material and the multimedia file is obtained, and when the matching degree is greater than the matching degree threshold in the first material interaction condition corresponding to the material editing mode, it is determined that the identified material satisfies the first material interaction condition; when the editing scene is a real-time scene, the multimedia file is compressed, the material is identified in the compressed multimedia file according to the material editing mode, the matching degree between the material and the compressed multimedia file is obtained, and when the matching degree is greater than the matching degree threshold in the second material interaction condition corresponding to the material editing mode, it is determined that the identified material satisfies the second material interaction condition; the matching degree threshold in the first material interaction condition is greater than that in the second material interaction condition.
- editing scenarios can be divided into non-real-time scenarios with lower real-time requirements and real-time scenarios with higher real-time requirements.
- for example, a scene in which the multimedia file is a picture is determined as a non-real-time scene, and a scene in which the multimedia file is a video is determined as a real-time scene; for another example, a scene in which the multimedia file is pre-stored locally in the electronic device is determined as a non-real-time scene, and a scene in which the multimedia file is collected in real time or acquired from the outside in real time is determined as a real-time scene.
- the first material interaction condition applicable to the non-real-time scene and the second material interaction condition applicable to the real-time scene can be preset in the material editing mode.
- the material is identified in the multimedia file to be edited (ie, the original multimedia file) according to the obtained material editing mode, and the matching degree between the material and the multimedia file is obtained. Then, the obtained matching degree is compared with the matching degree threshold in the first material interaction condition corresponding to the material editing mode, and when the matching degree is greater than the matching degree threshold, it is determined that the identified material satisfies the first material interaction condition.
- the matching degree between the material and the multimedia file may be the area proportion of the material in the multimedia file
- the matching degree threshold may be the area proportion threshold.
- in a real-time scene, the multimedia file is compressed, the material is identified in the compressed multimedia file according to the material editing mode, and the matching degree between the material and the compressed multimedia file is obtained.
- the multimedia file is compressed by reducing its size to a set size.
- the set size can be chosen according to the processing time and recognition accuracy of the material editing mode; the set sizes corresponding to different material editing modes can vary.
- when the obtained matching degree is greater than the matching degree threshold in the second material interaction condition corresponding to the material editing mode, it is determined that the identified material satisfies the second material interaction condition, where the matching degree threshold in the first material interaction condition is greater than that in the second material interaction condition.
- by means of compression processing, the efficiency of identifying materials according to the material editing mode can be improved, meeting real-time requirements; at the same time, considering that compression may lower the accuracy of identifying materials, the matching degree threshold in the second material interaction condition is set smaller than that in the first material interaction condition, so as to match the characteristics of compression processing.
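A sketch of the scene-dependent thresholds; the numeric values and the target size are assumptions, and the only constraint taken from the text is that the non-real-time (first) threshold exceeds the real-time (second) one:

```python
THRESHOLD_FIRST = 0.6    # first material interaction condition (non-real-time)
THRESHOLD_SECOND = 0.4   # second condition: lower, because compression
                         # reduces recognition accuracy
SET_SIZE = (480, 270)    # assumed per-mode compression target size

def satisfies_condition(matching_degree, realtime_scene):
    threshold = THRESHOLD_SECOND if realtime_scene else THRESHOLD_FIRST
    return matching_degree > threshold

assert satisfies_condition(0.5, realtime_scene=True)       # passes second condition
assert not satisfies_condition(0.5, realtime_scene=False)  # fails first condition
```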
- the embodiment of the present application applies different effects according to whether the identified material meets the corresponding material interaction conditions, which improves the applicability to different identification situations.
- when the multimedia file is a video, the continuity and integrity of the applied effect can be guaranteed.
- FIG. 3C is a schematic flowchart of an artificial intelligence-based multimedia processing method provided by an embodiment of the present application.
- in step 301, a candidate material is identified in the multimedia file according to the material editing mode, and when the candidate material satisfies the material interaction condition corresponding to the material editing mode, the material editing mode is used as the material editing mode to be applied.
- when multiple material editing modes are acquired, they can be filtered. For example, for each material editing mode, material is identified in the multimedia file according to that mode; for ease of distinction, the material identified here is called a candidate material.
- when the candidate material obtained according to a certain material editing mode satisfies the material interaction condition corresponding to that mode, the mode is used as the material editing mode to be applied. In this way, intelligent selection of the material editing mode is realized, so that the material editing mode to be applied is consistent with the material in the multimedia file.
- in step 302, a preview identification process of multiple material editing modes may be presented, and in response to a selection operation for a material editing mode, the selected material editing mode is used as the material editing mode to be applied.
- that is, the embodiment of the present application also provides another way of screening multiple material editing modes: the preview identification process of the multiple material editing modes is presented in the graphical interface, and in response to the selection operation for a material editing mode, the selected material editing mode is used as the material editing mode to be applied.
- the preview recognition process of recognizing the sample multimedia file according to the material editing mode can be presented, and the preview recognition process of recognizing the multimedia file to be edited according to the material editing mode can also be presented.
- the sample multimedia file or the multimedia file is a video
- a preview recognition process for recognizing one or several video frames can be presented.
- the preview recognition process may include the results before and after recognition. Taking the case where the material editing mode includes the face recognition capability and the sample multimedia file is a sample picture, the shown preview recognition process may include the original sample picture as well as a sample picture with the recognized face position (e.g., the face position highlighted with dashed lines).
- other information related to the material editing mode may also be displayed, for example, the name of the material editing mode, such as the face recognition mode and the sky recognition mode, may be presented. In this way, the material editing mode is screened by means of human-computer interaction, so that the material editing mode to be applied can meet the actual needs of the user. According to different actual application scenarios, any one of steps 301 and 302 may be applied to filter the material editing mode.
- any one of the multiple material editing modes may also be used as the material editing mode to be applied.
- after the material editing mode to be applied is determined, other material editing modes may also be presented in the graphical interface for the user to switch.
- step 102 shown in FIG. 3A can be updated to step 303 , in which material is identified in the multimedia file according to the material editing mode to be applied, and an interactive template is applied to the identified material.
- in some embodiments, the above-mentioned identification of candidate materials in a multimedia file according to the material editing mode can be achieved as follows: when the multimedia file is a video, periodic frame extraction processing is performed on the multimedia file to obtain candidate video frames, and candidate material is identified in the candidate video frames according to the material editing mode;
- the above-mentioned identification of the material in the multimedia file according to the material editing mode to be applied can be realized as follows: according to the material editing mode to be applied, material is identified in each video frame of the multimedia file.
- the multimedia file is a video
- periodic frame extraction processing may be performed on the multimedia file, for example, the frame extraction frequency is once every 2 seconds, and finally multiple candidate video frames are obtained.
- candidate materials are identified in the candidate video frames according to the material editing mode, so that the processing pressure can be reduced and the real-time performance can be improved when the material editing mode is selected.
- when the storage space occupied by the obtained multiple candidate video frames is greater than a storage space threshold, the multiple candidate video frames can be compressed, and the candidate material can then be identified in the compressed candidate video frames according to the material editing mode, to further improve real-time performance.
- when the candidate material in any candidate video frame satisfies the material interaction condition corresponding to the material editing mode, that mode can be set as the material editing mode to be applied; alternatively, it can be set that the mode is used as the material editing mode to be applied only when the candidate materials in a set number or set ratio of candidate video frames meet the material interaction condition corresponding to the mode.
- each video frame of the multimedia file can also be compressed here, and then the material can be identified in each compressed video frame to meet real-time requirements.
- the processing pressure when screening the material editing mode can be reduced, so that the material editing mode to be applied can be quickly determined.
- the embodiment of the present application provides two screening methods, intelligent selection and human-computer interaction selection, which improves the flexibility and accuracy of screening.
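The intelligent-selection branch can be sketched as follows, assuming a 2-second frame-extraction period and per-mode interaction conditions expressed as predicates over frames; all names are hypothetical:

```python
def extract_candidate_frames(video_frames, fps, period_s=2.0):
    """Periodic frame extraction: keep one frame every period_s seconds."""
    step = max(1, int(fps * period_s))
    return video_frames[::step]

def screen_modes(candidate_frames, modes, min_hits=1):
    """Keep a mode once enough candidate frames satisfy its condition."""
    selected = []
    for mode in modes:
        hits = sum(1 for frame in candidate_frames if mode["condition"](frame))
        if hits >= min_hits:  # or compare hits / len(candidate_frames) to a ratio
            selected.append(mode["name"])
    return selected

# Frames as {material type: area proportion}; 6 seconds of video at 25 fps.
all_frames = [{"face": 0.6}] * 50 + [{"sky": 0.9}] * 50 + [{"face": 0.4}] * 50
candidates = extract_candidate_frames(all_frames, fps=25)  # one frame per 2 s
modes = [{"name": "face_recognition", "condition": lambda f: f.get("face", 0) > 0.2},
         {"name": "sky_recognition",  "condition": lambda f: f.get("sky", 0) > 0.2}]
print(screen_modes(candidates, modes))  # -> ['face_recognition', 'sky_recognition']
```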
- FIG. 3D is a schematic flowchart of an artificial intelligence-based multimedia processing method provided by an embodiment of the present application. Step 102 shown in FIG. 3A may be implemented through steps 401 to 404 .
- step 401 the material is identified in the multimedia file according to the material editing mode.
- step 402 when the identified material satisfies the material interaction condition corresponding to any interactive template, the interactive template corresponding to the satisfied material interaction condition is used as the interactive template to be applied.
- the multiple interaction templates may be filtered.
- the material interaction condition corresponding to each interactive template is obtained, and when the material identified according to the material editing mode satisfies the material interaction condition corresponding to any interactive template, that interactive template is used as the interactive template to be applied.
- the interactive template corresponding to a certain type of material includes a first interactive template and a second interactive template
- the first interactive template includes both the first effect and the second effect
- the second interactive template includes only the second effect
- the first interaction template and the second interaction template correspond to different material interaction conditions. If the material identified in the multimedia file satisfies the material interaction condition corresponding to the first interaction template, the first interaction template is used as the interaction template to be applied. On this basis, if there are multiple first interactive templates and they correspond to the same material interaction condition, multiple interactive templates to be applied are obtained; in this case, any one of the multiple first interactive templates can be applied, and the other first interactive templates are presented in the graphical interface for the user to switch. This approach also applies when there are multiple interactive templates to be applied of other kinds.
- the material used to compare the interactive conditions of the material in step 402 may be the material identified in the candidate video frames of the multimedia file (obtained by periodic frame extraction), In this way, the efficiency of screening interactive templates can be improved.
- step 403 a preview application process of a plurality of interactive templates is presented, and in response to a selection operation for the interactive template, the selected interactive template is used as the interactive template to be applied.
- a preview application process of multiple interactive templates may also be presented, and in response to the selection operation for the interactive template, the selected interactive template is used as the interactive template to be applied .
- the preview application process of applying the interactive template in the sample multimedia file can be presented, and the preview application process of applying the interactive template in the multimedia file to be edited can also be presented.
- the preview application process may include the results before and after the application.
- the shown preview application process may include the original sample picture and a sample picture in which the background is replaced with a starry sky picture.
- other information related to the interactive template may also be displayed, such as the name of the presented interactive template, such as a starry sky background template and a face special effect template.
- any one of steps 402 and 403 may be applied to filter the interactive templates.
- any one of the multiple interactive templates (for example, the first interactive template) can also be directly used as the interactive template to be applied. After the interaction template to be applied is determined, other interaction templates may be presented in the graphical interface for the user to switch.
- step 404 the interactive template to be applied is applied to the identified material.
- similar to the screening of material editing modes, the embodiment of the present application also provides two screening methods for interactive templates, intelligent selection and human-computer interaction selection, which improves the flexibility and accuracy of screening.
- FIG. 3E is a schematic flowchart of an artificial intelligence-based multimedia processing method provided by an embodiment of the present application.
- step 101 shown in FIG. 3A can be implemented through steps 501 to 503, and each step is explained below.
- step 501 in response to the editing operation for the multimedia file, when the multimedia file is a video, periodic frame extraction processing is performed on the multimedia file to obtain candidate video frames.
- the multimedia file to be edited is a video
- frame extraction processing is performed on the multimedia file according to the set frame extraction frequency to obtain multiple candidate video frames.
- step 502 material identification processing is performed on the candidate video frame to obtain the type of material in the multimedia file.
- material identification processing is performed on the candidate video frames, that is, multi-classification processing is performed to obtain the type of material in the multimedia file.
- in some embodiments, before step 502, the method further includes: when the storage space occupied by the multiple candidate video frames is greater than a storage space threshold, compressing the multiple candidate video frames. The above-mentioned material identification processing on the candidate video frames to obtain the type of material in the multimedia file can then be implemented as follows: material identification processing is performed on the multiple compressed candidate video frames to obtain the type of material in each candidate video frame, and the type of material with the largest proportion across the multiple candidate video frames is used as the type of material in the multimedia file for obtaining the corresponding material editing mode; the proportion includes either the area proportion or the quantity proportion of the material.
- After the candidate video frames are obtained by frame extraction, if the storage space occupied by the multiple candidate video frames is greater than the set storage space threshold, the multiple candidate video frames can be compressed to ensure real-time performance; the set size used for compression can be specifically set according to the actual application scenario.
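- A minimal sketch of the size-triggered compression step, assuming the candidate frames are NumPy arrays produced by the extraction above; the storage threshold and target width are illustrative values, not values fixed by this disclosure.

```python
import cv2
import numpy as np

STORAGE_THRESHOLD_BYTES = 32 * 1024 * 1024  # illustrative storage space threshold
TARGET_WIDTH = 480                          # illustrative "set size"


def maybe_compress(frames: list[np.ndarray]) -> list[np.ndarray]:
    total_bytes = sum(f.nbytes for f in frames)
    if total_bytes <= STORAGE_THRESHOLD_BYTES:
        return frames  # small enough, keep the original resolution
    compressed = []
    for f in frames:
        h, w = f.shape[:2]
        scale = TARGET_WIDTH / w
        # proportional downscaling, as described for the real-time scene
        compressed.append(cv2.resize(f, (TARGET_WIDTH, int(h * scale))))
    return compressed
```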
- Then, material identification processing is performed on the compressed candidate video frames respectively to obtain the type of material in each candidate video frame. If the multiple candidate video frames include different types, one type can be screened out and used as the type for acquiring the corresponding material editing mode.
- the embodiment of the present application provides two methods for screening types.
- the first method is to screen out the type of material with the largest area ratio among multiple candidate video frames.
- For example, the candidate video frames obtained by frame extraction include video frame 1, video frame 2, and video frame 3. The areas of the face and the sky account for 80% and 20% respectively in video frame 1, 70% and 30% respectively in video frame 2, and 75% and 25% respectively in video frame 3. It can then be calculated that the average area proportion of the face is 75% and that of the sky is 25%, so the face, which has the larger average area proportion, is used as the screened-out type.
- This method is also applicable to the case where the multimedia file is a picture.
- The second method is to screen out the type of material with the largest quantity proportion among the multiple candidate video frames.
- For example, the candidate video frames obtained by frame extraction include video frame 1, video frame 2, and video frame 3. Video frame 1 and video frame 2 include only human faces, while video frame 3 includes both a face and the sky. It can then be obtained that the quantity proportion of faces across the candidate video frames is 100%, while that of the sky is 1/3, so the face, which has the larger quantity proportion, is used as the screened-out type.
- The types of materials are effectively screened through the above methods.
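- The two screening rules could be sketched as follows; the per-frame reports (material type mapped to area proportion) are an assumed data layout, and the numbers reproduce the face/sky example above.

```python
from collections import defaultdict


def screen_type_by_area(frame_reports: list[dict[str, float]]) -> str:
    """frame_reports: per candidate frame, a mapping of material type -> area proportion."""
    totals = defaultdict(float)
    for report in frame_reports:
        for material_type, area in report.items():
            totals[material_type] += area
    # largest average area proportion wins (dividing by the frame count is a constant factor)
    return max(totals, key=totals.get)


def screen_type_by_quantity(frame_reports: list[dict[str, float]]) -> str:
    counts = defaultdict(int)
    for report in frame_reports:
        for material_type in report:
            counts[material_type] += 1  # the frame contains this type at all
    return max(counts, key=counts.get)


# Example from the text: faces cover 80%/70%/75%, sky covers 20%/30%/25%
reports = [{"face": 0.80, "sky": 0.20}, {"face": 0.70, "sky": 0.30}, {"face": 0.75, "sky": 0.25}]
assert screen_type_by_area(reports) == "face"  # average 75% vs 25%
```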
- In the embodiment of the present application, type screening may also be skipped, and the corresponding material editing mode and interactive template may be obtained directly according to the type of material in each candidate video frame; if more than one material editing mode or interactive template is acquired, screening is performed again.
- In some embodiments, before step 502, the method further includes: acquiring multiple sample multimedia files; performing material identification processing on the sample multimedia files through an artificial intelligence model to obtain the types of materials in the sample multimedia files; and updating the weight parameters of the artificial intelligence model according to the difference between the type obtained by the material identification processing and the actual type; wherein the updated artificial intelligence model is used to perform material identification processing on candidate video frames.
- the material identification processing may be implemented by an artificial intelligence model, and the artificial intelligence model is a multi-classification model.
- First, the artificial intelligence model is trained: for example, multiple sample multimedia files and the actual type of material in each sample multimedia file are obtained, and material identification processing is performed on the sample multimedia files through the artificial intelligence model to obtain the type of material in each sample multimedia file.
- If the sample multimedia file is a picture, the artificial intelligence model directly performs material identification processing on the sample multimedia file; if the sample multimedia file is a video, the artificial intelligence model performs material identification processing on video frames of the sample multimedia file (e.g., obtained by periodic frame extraction).
- Then, according to the loss function of the artificial intelligence model, the difference between the type obtained by the material identification processing and the actual type is determined; this difference is the loss value.
- Back-propagation is performed in the artificial intelligence model according to the determined differences, and during the back-propagation process, the weight parameters of the artificial intelligence model are updated along the gradient descent direction.
- After the update of the weight parameters is completed, the updated artificial intelligence model can be used in step 502 to perform material identification processing on the candidate video frames in the multimedia file.
- In addition, when the multimedia file to be edited is a picture, material identification processing can likewise be performed by the updated artificial intelligence model. The above model training improves the accuracy of the material identification processing.
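- A hedged sketch of the described training procedure in PyTorch: the difference between the identified type and the actual type is measured by a cross-entropy loss, and the weight parameters are updated along the gradient descent direction; the model architecture and data loader are placeholders, not specified by this disclosure.

```python
import torch
import torch.nn as nn


def train_material_classifier(model: nn.Module, loader, num_epochs: int = 5) -> nn.Module:
    """Update the model's weights from (sample, actual_type_index) pairs."""
    criterion = nn.CrossEntropyLoss()                # difference between predicted and actual type
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    model.train()
    for _ in range(num_epochs):
        for samples, actual_types in loader:
            logits = model(samples)                  # material identification on the sample files
            loss = criterion(logits, actual_types)   # the "difference", i.e. the loss value
            optimizer.zero_grad()
            loss.backward()                          # back-propagation through the model
            optimizer.step()                         # update weights along the gradient descent direction
    return model
```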
- In step 503, a material editing mode and an interactive template corresponding to the type of material in the multimedia file are acquired.
- As shown in FIG. 3E, when the multimedia file is a video, the embodiment of the present application performs periodic frame extraction processing on the multimedia file and performs material identification processing on the extracted video frames, so that the type of material in the multimedia file can be obtained quickly and accurately.
- The following describes an exemplary application of the embodiment of the present application in an actual application scenario. The artificial intelligence-based multimedia processing method provided by the embodiments of the present application may be implemented collaboratively by a terminal device and a server.
- a software client for multimedia editing is installed in the terminal device, and intelligent editing is realized by interacting with the server.
- Here, the AI/AR capability (corresponding to the material editing capability above) and the creative gameplay template built on top of the capability (corresponding to the interactive template above) can be componentized; according to the type of material in the multimedia file selected by the user in the client, the cloud intelligently matches an appropriate AI/AR capability, and the creative gameplay template corresponding to that AI/AR capability is then applied to the multimedia file to realize intelligent editing of the multimedia file.
- the embodiment of the present application provides a schematic diagram of creating a creative gameplay template as shown in FIG. 4 .
- In FIG. 4, for each AI/AR capability, at least one creative gameplay template is created according to its corresponding processing result (corresponding to the matching degree above) and matching factor threshold (corresponding to the matching degree threshold above). For example, for the face recognition capability, the corresponding processing result is the area proportion of the recognized face, and the matching factor threshold is a face area proportion threshold, such as 50%.
- the creative gameplay template supports componentized effects, so when creating a creative gameplay template, existing effects can be reused, thereby improving the degree of freedom and efficiency of creating a creative gameplay template.
- In addition, for the case where the matching degree is less than or equal to the matching factor threshold, or where the AI/AR capability processing is abnormal, an extra fallback effect can be pre-embedded when creating the creative gameplay template, to guarantee the continuity and integrity of the creative expression when the template is applied. For ease of description, the effect applied according to the processing result of the AI/AR capability is named the AI/AR effect (corresponding to the first effect above), and the fallback effect is named the non-AI/AR effect (corresponding to the second effect above).
- For example, AI effects can be magic pendants, face special effects, pictures for replacing backgrounds, or music beat-sync components, and AR effects can be street-stall effects or virtual animated character effects.
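- One way to picture the componentized creative gameplay template described above (purely illustrative; the field names are assumptions, not the patent's terminology):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CreativeGameplayTemplate:
    name: str
    matching_factor_threshold: float  # e.g. 0.5 for the face example above
    ai_ar_effect: Callable            # the "first effect", applied to recognized material
    fallback_effect: Callable         # the "second effect", pre-embedded for low matches


def choose_effect(template: CreativeGameplayTemplate, matching_degree: float) -> Callable:
    # apply the AI/AR effect only when the capability's processing result clears the threshold
    if matching_degree > template.matching_factor_threshold:
        return template.ai_ar_effect
    return template.fallback_effect
```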
- An embodiment of the present application provides a schematic diagram of selecting a multimedia file as shown in FIG. 5 .
- In FIG. 5, multiple multimedia files stored locally on the client (the client running on the terminal device) are shown, for example the multimedia file 51; the form of a multimedia file includes but is not limited to video and picture.
- the user may select the multimedia file for intelligent editing by triggering the editing option 52 in FIG. 5 , or may also trigger the shooting option 53 to perform real-time shooting to obtain the multimedia file to be edited.
- the embodiment of the present application provides a schematic diagram of a multimedia file after applying the creative gameplay template as shown in FIG. 6 .
- In FIG. 6, the applied AI/AR capability is the background recognition capability, and the corresponding creative gameplay template includes the AI/AR effect shown as the background 612. When this creative gameplay template is applied, the background (such as the sky) in the multimedia file 61, other than the human body 611, is replaced with the background 612 in the creative gameplay template.
- When the multimedia file 61 with the replaced background is presented, a manual editing option 62 can also be presented.
- The user can manually edit the multimedia file 61 by triggering the manual editing option 62, such as switching the applied creative gameplay template, adding or switching background music, making manual cuts, or adding extra effects such as text, stickers, or special effects.
- When the user is satisfied with the multimedia file 61, editing can be completed by triggering the finish option 63; the client can store the edited multimedia file 61 locally or send it to a social network for sharing, which is not limited in the embodiment of the present application.
- the embodiment of the present application also provides a schematic diagram of a multimedia file after applying the creative gameplay template as shown in FIG. 7 .
- In FIG. 7, the multimedia file 71 is obtained by real-time shooting and its background is a workstation; the applied AR capability is the AR capability for recognizing the workstation, and the applied creative gameplay template includes the effect 711, so that the augmented-reality sensory effect of setting up a street stall on the workstation is realized.
- In addition, manual editing options 72 for the multimedia file 71 are also presented, specifically including options for flipping, filters, and beautification, as well as options for adding magic effects and adding music.
- the embodiment of the present application also provides a schematic diagram of a confirmation prompt as shown in FIG. 8 .
- In FIG. 8, when a user publishes the multimedia file 81, for example when sharing it to a social network, the client recognizes, according to the AI/AR capability for detecting watermarks, that a watermark of "@account xx" exists in the multimedia file 81, and outputs a confirmation prompt 82 to remind the user.
- a creative gameplay template including the effect of masking the watermark can also be applied to the multimedia file 81 to mask the watermark in the multimedia file 81 .
- In FIG. 6, FIG. 7, and FIG. 8, only creative gameplay templates integrating a single AI/AR effect are shown, but in practical application scenarios a creative gameplay template may also include multiple AI/AR effects, which is not limited in the embodiment of the present application.
- This embodiment of the present application does not limit the types of AI/AR capabilities.
- As shown in FIG. 9, various AI/AR capabilities are presented, such as smart editing (capability 91 in FIG. 9), smart filters, stylized filters, and face fusion.
- this does not constitute a limitation on the embodiments of the present application.
- Depending on the actual application scenario, more AI/AR capabilities, such as gender swapping and facial attractiveness prediction, can be applied.
- For the underlying implementation of intelligent editing, the embodiment of the present application provides the schematic flowchart shown in FIG. 10; for ease of understanding, the description is given in three stages.
- 1) First stage: intelligent capability matching in the cloud.
- After the client obtains the multimedia file to be edited, if the multimedia file is a picture, it sends the picture directly to the server (that is, the cloud); if the multimedia file is a video, it performs frame extraction processing according to the set frame extraction frequency (that is, the content extraction in FIG. 10) and sends the obtained candidate video frames to the server for capability matching. To balance performance and quality, whether to perform compression processing can be decided according to the data volume of the obtained candidate video frames (corresponding to the occupied storage space above).
- After receiving the multiple candidate video frames sent by the client, the server performs concurrent matching through the multiple AI/AR capabilities deployed in the cloud, and delivers the appropriate AI/AR capability and the corresponding creative gameplay template to the client according to the matching result.
- For example, the type of material in the multiple candidate video frames can be identified by the artificial intelligence model: if a face is identified, the face recognition AI/AR capability and the corresponding creative gameplay template are delivered; if the sky is identified, the sky recognition (one-click sky replacement) AI/AR capability and the corresponding creative gameplay template are delivered. It is worth noting that the multiple AI/AR capabilities and the creative gameplay template corresponding to each AI/AR capability can also be deployed locally on the client; in this case, the server only needs to deliver the identifier (or number) of the AI/AR capability and the identifier (or number) of the creative gameplay template, which reduces the consumption of communication resources. In addition, when no suitable AI/AR capability is matched, editing can be ended directly.
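- The concurrent capability matching in the cloud could be sketched with a thread pool, assuming each AI/AR capability exposes a callable that reports whether it fits the candidate frames; this interface is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def match_capabilities(candidate_frames, capabilities: dict):
    """capabilities: name -> callable(frames) returning True when the capability fits."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, candidate_frames)
                   for name, fn in capabilities.items()}
        matched = [name for name, fut in futures.items() if fut.result()]
    # when capabilities/templates are already deployed on the client, delivering
    # only these identifiers saves communication resources
    return matched or None  # None means: nothing suitable matched, end editing
```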
- 2) Second stage: differentiated processing of application scenarios. Here, the scene in which the multimedia file is a picture is determined as a non-real-time scene, and the scene in which the multimedia file is a video is determined as a real-time scene.
- For non-real-time scenes, since there is no or only a low real-time requirement, the original picture can be processed according to the AI/AR capability to achieve higher processing accuracy.
- In addition, since the picture content is static, the applied creative gameplay template can be reused even if the business adds a time attribute or other attributes to the picture. For example, if a video needs to be generated from a picture and a creative gameplay template has already been applied to the picture, the creative gameplay template can be applied continuously until a set duration (e.g., 10 seconds) is reached.
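- A sketch of reusing an applied template to generate a video from a single edited picture; the 10-second duration matches the example above, while the frame rate and codec are assumptions.

```python
import cv2
import numpy as np


def picture_to_video(edited_picture: np.ndarray, out_path: str,
                     duration_s: float = 10.0, fps: int = 30) -> None:
    """Repeat the already-edited picture frame by frame until the set duration is reached."""
    h, w = edited_picture.shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for _ in range(int(duration_s * fps)):
        writer.write(edited_picture)  # static content: the template result is simply reused
    writer.release()
```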
- For real-time scenes, to meet the higher real-time requirement, the original video can be compressed, that is, the video frames are scaled down proportionally to a set size; this set size can be chosen by weighing the processing time of the AI/AR capability itself against the accuracy of the processing results.
- 3) Third stage: local secondary matching and template rendering. After the client obtains the AI/AR capability delivered by the server, it performs identification processing on the multimedia file according to the AI/AR capability to obtain a processing result; when the multimedia file is a video, the identification processing can be performed on the multiple candidate video frames. Then, the obtained processing result is compared with the matching factor threshold corresponding to the AI/AR capability, and the finally applied creative gameplay template is determined according to the comparison result.
- Since the video frames are compressed in the real-time scene, for the same AI/AR capability, the matching factor threshold set in the real-time scene can be smaller than the matching factor threshold set in the non-real-time scene.
- For example, the multimedia file is a picture, and the server delivers the sky recognition capability together with the AI/AR template and the non-AI/AR template corresponding to that capability, where the non-AI/AR template includes only non-AI/AR effects, while the AI/AR template can include only AI/AR effects, or both AI/AR effects and non-AI/AR effects. Taking a matching factor threshold of 70% for the sky recognition capability in the non-real-time scene as an example: after the client performs identification processing on the picture according to the sky recognition capability, if the area proportion of the sky in the picture is greater than the matching factor threshold, e.g., an area proportion of 80%, the AI/AR effect in the AI/AR template is applied; if the area proportion of the sky in the picture is less than or equal to the matching factor threshold, e.g., an area proportion of 40%, the non-AI/AR effect in the non-AI/AR template is applied.
- Based on the above example, take the case where the multimedia file is a video and the matching factor threshold corresponding to the sky recognition capability in the real-time scene is 60%. After the client performs identification processing on the multiple candidate video frames in the video according to the sky recognition capability, if the average area proportion of the sky in the candidate video frames is greater than the matching factor threshold, e.g., an average area proportion of 80%, the AI/AR template is applied; if the average area proportion of the sky in the candidate video frames is less than or equal to the matching factor threshold, e.g., an average area proportion of 40%, the non-AI/AR effect in the non-AI/AR template is applied.
- It is worth noting that the server may deliver multiple AI/AR templates corresponding to the sky recognition capability (multiple AI/AR templates may correspond to the same matching factor threshold). In this case, if it is determined that an AI/AR template is to be applied, the first delivered AI/AR template can be applied by default, or the multiple AI/AR templates can be presented for the user to choose from. On the basis of an AI/AR template that has been applied, other AI/AR templates can be presented for the user to switch to.
- If the applied AI/AR template includes both AI/AR effects and non-AI/AR effects, a real-time judgment can also be performed in the process of applying the template: for each video frame in the video, if the area proportion of the sky in the video frame is less than or equal to the matching factor threshold, the non-AI/AR effect in the AI/AR template is applied; if the area proportion of the sky in the video frame is greater than the matching factor threshold, the AI/AR effect in the AI/AR template is applied.
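- The local secondary matching and the per-frame real-time judgment could look roughly like this, reusing the hypothetical template structure from the earlier sketch; `sky_area_proportion` stands in for the AI/AR capability's processing result and is an assumed helper.

```python
def select_template(avg_sky_proportion: float, threshold: float,
                    ai_ar_template, non_ai_ar_template):
    """Batch decision over the candidate frames: pick which template to apply at all."""
    return ai_ar_template if avg_sky_proportion > threshold else non_ai_ar_template


def effect_for_frame(frame, template, threshold: float, sky_area_proportion):
    """Real-time, per-frame decision inside a template containing both effect kinds."""
    if sky_area_proportion(frame) > threshold:
        return template.ai_ar_effect    # e.g. replace the sky with a starry-sky picture
    return template.fallback_effect     # the pre-embedded non-AI/AR effect
```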
- After the finally applied effect is determined, the corresponding effect can be rendered through the template rendering engine. For example, if the finally applied effect is a starry sky picture used to replace the sky background in the video frames, the template rendering engine renders the effect to update the background in the video frames to the starry sky picture.
- Through the embodiment of the present application, the following technical effects can be achieved: 1) the capability itself is separated from the gameplay built on top of it, which removes the limitations on the design side in gameplay creation and improves the freedom and efficiency of producing creative gameplay templates; 2) through intelligent matching, a capability that fits the multimedia file is obtained, which lowers the threshold for selecting multimedia files and improves the success rate of editing; 3) both real-time and non-real-time scenes are covered, allowing users to experience more creative gameplay built on AI/AR capabilities; 4) capabilities and gameplay are componentized, which enables the free combination of effects, greatly shortens the cycle for bringing a creative gameplay template online, and decouples creative gameplay template iteration from client version iteration.
- The following continues to describe an exemplary structure in which the artificial intelligence-based multimedia processing apparatus 455 provided by the embodiments of the present application is implemented as software modules.
- In some embodiments, the software modules of the artificial intelligence-based multimedia processing apparatus 455 stored in the memory 450 may include: an acquisition module 4551, configured to, in response to an editing operation for a multimedia file, acquire a material editing mode and an interactive template corresponding to the type of material in the multimedia file; an application module 4552, configured to identify the material in the multimedia file according to the material editing mode and apply the interactive template to the identified material; and an application completion module 4553, configured to present the multimedia file after the interactive template is applied.
- the interactive template is used to form a material-based interactive effect;
- the interactive effect includes a first effect and a second effect different from the first effect, and the first effect includes at least one of an augmented reality effect and a virtual reality effect;
- the application module 4552 is further configured to apply the first effect in the interactive template to the identified material when the identified material satisfies the material interaction condition corresponding to the material editing mode;
- the artificial intelligence-based multimedia processing apparatus 455 further includes: a set-position application module, configured to, when the identified material does not satisfy the material interaction condition, acquire the set position corresponding to the second effect in the interactive template and apply the second effect at the set position in the multimedia file.
- the material interaction conditions include a first material interaction condition corresponding to a non-real-time scene, and a second material interaction condition corresponding to a real-time scene;
- the application module 4552 is further configured to: when the editing scene is a non-real-time scene, identify the material in the multimedia file according to the material editing mode, obtain the matching degree between the material and the multimedia file, and when the matching degree is greater than the matching degree threshold in the first material interaction condition, determine that the identified material satisfies the first material interaction condition; and when the editing scene is a real-time scene, compress the multimedia file, identify the material in the compressed multimedia file according to the material editing mode, obtain the matching degree between the material and the compressed multimedia file, and when the matching degree is greater than the matching degree threshold in the second material interaction condition, determine that the identified material satisfies the second material interaction condition; wherein the matching degree threshold in the first material interaction condition is greater than the matching degree threshold in the second material interaction condition.
- the artificial intelligence-based multimedia processing apparatus 455 further includes: a mode screening module, configured to perform either of the following when there are multiple material editing modes: for each material editing mode, identify candidate material in the multimedia file according to the material editing mode, and when the candidate material satisfies the material interaction condition corresponding to the material editing mode, use the material editing mode as the material editing mode to be applied; or present a preview identification process of the multiple material editing modes, and in response to a selection operation for a material editing mode, use the selected material editing mode as the material editing mode to be applied; wherein the material editing mode to be applied is used to identify the material in the multimedia file.
- the mode screening module is further configured to: when the multimedia file is a video, perform periodic frame extraction processing on the multimedia file to obtain candidate video frames; identify and obtain candidate materials from the candidate video frames according to the material editing mode;
- the application module 4552 is further configured to: identify the material in each video frame of the multimedia file according to the material editing mode to be applied.
- the artificial intelligence-based multimedia processing apparatus 455 further includes: a template screening module, configured to perform either of the following when there are multiple interactive templates: when the identified material satisfies the material interaction condition corresponding to any one of the interactive templates, use the interactive template corresponding to the satisfied material interaction condition as the interactive template to be applied; or present a preview application process of the multiple interactive templates, and in response to a selection operation for an interactive template, use the selected interactive template as the interactive template to be applied.
- the artificial intelligence-based multimedia processing device 455 further includes: a frame extraction module, configured to perform periodic frame extraction processing on the multimedia file when the multimedia file is a video to obtain candidate video frames; a material identification module, It is configured to perform material identification processing on candidate video frames to obtain the type of material in the multimedia file.
- the artificial intelligence-based multimedia processing apparatus 455 further includes: a compression module, configured to perform compression processing on the multiple candidate video frames when the storage space occupied by the multiple candidate video frames is greater than the storage space threshold;
- the material identification module is further configured to: perform material identification processing on the compressed candidate video frames respectively, obtain the type of material in each candidate video frame, and use the type of material with the largest proportion among the multiple candidate video frames as the type of material for acquiring the corresponding material editing mode; wherein the proportion includes either the area proportion or the quantity proportion of the material.
- the artificial intelligence-based multimedia processing apparatus 455 further includes: a sample acquisition module, configured to acquire multiple sample multimedia files; a sample identification module, configured to perform material identification processing on the sample multimedia files through the artificial intelligence model to obtain the types of materials in the sample multimedia files; and an update module, configured to update the weight parameters of the artificial intelligence model according to the difference between the type obtained by the material identification processing and the actual type; wherein the updated artificial intelligence model is used to perform material identification processing on the candidate video frames.
- the acquisition module 4551 is further configured to: among the candidate material editing modes respectively corresponding to multiple types, acquire the material editing mode corresponding to the type of material in the multimedia file, and acquire at least one interactive template corresponding to the material editing mode.
- the artificial intelligence-based multimedia processing apparatus 455 further includes: a prompt presentation module, configured to present a confirmation prompt corresponding to the identified material, where the confirmation prompt includes at least one of the type of the identified material, the position information of the material in the multimedia file, and a preview result obtained after applying the interactive template; the application module 4552 is further configured to: in response to a confirmation operation for the confirmation prompt, apply the interactive template to the identified material.
- the interactive template includes the interactive effect and the duration of the interactive effect; the application module 4552 is further configured to: apply the interactive effect in the interactive template to the identified material, and maintain the applied interactive effect until the duration is reached.
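- As a sketch of the duration behavior described for the application module (the timing model and helper names are assumptions):

```python
def apply_with_duration(frames, fps: float, hit_frame_indices: set,
                        effect, duration_s: float):
    """Apply `effect` at a material occurrence and keep it unchanged until `duration_s` elapses."""
    hold_until = -1.0
    out = []
    for i, frame in enumerate(frames):
        t = i / fps
        if i in hit_frame_indices and t >= hold_until:
            hold_until = t + duration_s  # start (or restart) holding the applied effect
        out.append(effect(frame) if t < hold_until else frame)
    return out
```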
- Embodiments of the present application provide a computer program product or computer program, where the computer program product or computer program includes computer instructions (executable instructions), and the computer instructions are stored in a computer-readable storage medium.
- the processor of the electronic device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the electronic device executes the above-mentioned artificial intelligence-based multimedia processing method in the embodiment of the present application.
- The embodiments of the present application provide a computer-readable storage medium storing executable instructions; when the executable instructions are executed by a processor, the processor is caused to execute the method provided by the embodiments of the present application, for example, the artificial intelligence-based multimedia processing method shown in FIG. 3A, FIG. 3B, FIG. 3C, FIG. 3D, or FIG. 3E.
- In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; it may also be any device including one of, or any combination of, the foregoing memories.
- In some embodiments, executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- As an example, executable instructions may, but need not, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (e.g., files that store one or more modules, subroutines, or code sections).
- As an example, executable instructions may be deployed to be executed on one electronic device, on multiple electronic devices located at one site, or on multiple electronic devices distributed across multiple sites and interconnected by a communication network.
- In summary, the embodiments of the present application can achieve the following technical effects: 1) For a multimedia file to be edited, acquiring a material editing mode and an interactive template corresponding to the type of material in the multimedia file improves the success rate of multimedia editing and the utilization of computing resources.
- 2) When the interactive template includes the first effect and the second effect, the corresponding effect is applied according to the identified material, which improves the applicability to different identification situations; when the multimedia file is a video, the continuity and integrity of the applied effects can be guaranteed.
- 3) When the editing scene is a real-time scene, the multimedia file is compressed, which improves the efficiency of subsequent processing and meets the real-time requirement; at the same time, the matching degree threshold in the real-time scene is set smaller than that in the non-real-time scene, to match the characteristics of compression processing.
- 4) When multiple material editing modes or interactive templates are acquired, two screening methods, intelligent selection and human-computer interaction selection, are provided, which improves the flexibility and accuracy of screening.
- 5) When the multimedia file is a video, periodic frame extraction processing is performed on the multimedia file and material identification processing is performed on the extracted video frames, so that the type of material included in the multimedia file can be obtained quickly, meeting the real-time requirement.
- 6) The material editing mode is separated from the interactive template, which removes the limitations in creating interactive templates and improves the freedom and efficiency of creating them; componentized effects can be freely combined when creating interactive templates, which greatly shortens the cycle for bringing an interactive template online and decouples interactive template iteration from client version iteration; that is, regardless of the client version, intelligent editing of multimedia files can be realized simply by deploying the interactive template locally on the client.
Abstract
An artificial intelligence-based multimedia processing method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: in response to an editing operation for a multimedia file, acquiring a material editing mode and an interactive template corresponding to the type of material in the multimedia file (101); identifying the material in the multimedia file according to the material editing mode, and applying the interactive template to the identified material (102); and presenting the multimedia file after the interactive template is applied (103).
Description
相关申请的交叉引用
本申请基于申请号为202010837810.8、申请日为2020年08月19日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
本申请涉及人工智能和多媒体技术,尤其涉及一种基于人工智能的多媒体处理方法、装置、电子设备及计算机可读存储介质。
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。计算机视觉(Computer Vision,CV)技术是人工智能的一个重要分支,主要通过摄影机和电脑代替人眼对目标进行识别、跟踪和测量等机器视觉,并进一步做图形处理,使电脑处理成为更适合人眼观察或传送给仪器检测的图像。
多媒体编辑是计算机视觉的一个重要应用,在相关技术提供的方案中,通常是由用户选择素材编辑模式,再上传多媒体文件,由电子设备根据素材编辑模式对多媒体文件进行编辑。但是,素材编辑模式往往对多媒体文件存在特定的要求,容易导致对用户上传的多媒体文件编辑失败,对于电子设备来说,容易对多媒体文件进行无效的处理,导致计算资源的浪费。
发明内容
本申请实施例的技术方案是这样实现的:
本申请实施例提供一种基于人工智能的多媒体处理方法,包括:
响应于针对多媒体文件的编辑操作,获取与所述多媒体文件中的素材的类型对应的素材编辑模式以及互动模板;
根据所述素材编辑模式在所述多媒体文件中识别所述素材,并对所识别的所述素材应用所述互动模板;
呈现应用所述互动模板后的所述多媒体文件。
本申请实施例提供一种基于人工智能的多媒体处理装置,包括:
获取模块,配置为响应于针对多媒体文件的编辑操作,获取与所述多媒体文件中的素材的类型对应的素材编辑模式以及互动模板;
应用模块,配置为根据所述素材编辑模式在所述多媒体文件中识别所述素材,并对所识别的所述素材应用所述互动模板;
应用完成模块,配置为呈现应用所述互动模板后的所述多媒体文件。
本申请实施例提供一种电子设备,包括:
存储器,用于存储可执行指令;
处理器,用于执行所述存储器中存储的可执行指令时,实现本申请实施例提供的基于人工智能的多媒体处理方法。
本申请实施例提供一种计算机可读存储介质,存储有可执行指令,用于引起处理器执行时,实现本申请实施例提供的基于人工智能的多媒体处理方法。
图1是本申请实施例提供的基于人工智能的多媒体处理系统的架构示意图;
图2是本申请实施例提供的终端设备的架构示意图;
图3A是本申请实施例提供的基于人工智能的多媒体处理方法的流程示意图;
图3B是本申请实施例提供的基于人工智能的多媒体处理方法的流程示意图;
图3C是本申请实施例提供的基于人工智能的多媒体处理方法的流程示意图;
图3D是本申请实施例提供的基于人工智能的多媒体处理方法的流程示意图;
图3E是本申请实施例提供的基于人工智能的多媒体处理方法的流程示意图;
图4是本申请实施例提供的创意玩法模板的示意图;
图5是本申请实施例提供的选择多媒体文件的示意图;
图6是本申请实施例提供的应用创意玩法模板后的多媒体文件的示意图;
图7是本申请实施例提供的应用创意玩法模板后的多媒体文件的示意图;
图8是本申请实施例提供的确认提示的示意图;
图9是本申请实施例提供的素材编辑能力的示意图;
图10是本申请实施例提供的基于人工智能的多媒体处理方法的流程示意图。
为了使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请作进一步地详细描述,所描述的实施例不应视为对本申请的限制,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本申请保护的范围。
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。
在以下的描述中,所涉及的术语“第一\第二\第三”仅仅是是区别类似的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本申请实施例能够以除了在这里图示或描述的以外的顺序实施。在以下的描述中,所涉及的术语“多个”是指至少两个。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本申请实施例的目的,不是旨在限制本申请。
对本申请实施例进行进一步详细说明之前,对本申请实施例中涉及的名词和术语进行说明,本申请实施例中涉及的名词和术语适用于如下的解释。
1)多媒体文件:指包含至少一种媒体形式的文件,例如,多媒体文件可以是图片、音频及视频中的任意一种。
2)素材:指多媒体文件中的内容(或称对象),例如素材的类型可以为人脸、猫、狗或天空等。
3)素材编辑模式:用于在多媒体文件中识别素材,素材编辑模式用于提供素材编辑能力,即用于识别素材的能力,例如人脸识别能力或天空识别能力等。
4)互动模板:包括互动效果,用于应用至多媒体文件中,以形成基于素材的互动效果。本申请实施例对互动效果的具体表示形式不做限定,例如可以是视频、图片、音 频、动画、特效或变速效果等。
5)人工智能模型:基于人工智能原理构建的模型,本申请实施例对人工智能模型的类型不做限定,例如人工智能模型可以是神经网络模型。
6)虚拟现实(Virtual Reality,VR):利用现实生活中的数据,通过计算机技术产生的电子信号,将其与各种输出设备结合,使其转化为能够让人们感受到的模拟环境。利用虚拟现实技术创建的效果即为虚拟现实效果。
7)增强现实(Augmented Reality,AR):将计算机生成的文字、图像、三维模型、音乐或视频等虚拟信息模拟仿真后,应用到真实世界中,虚拟信息和真实世界中的信息互为补充,从而实现对真实世界的增强。利用增强现实技术创建的效果即为增强现实效果。
8)数据库(Database):以一定方式储存在一起、能与多个用户共享、具有尽可能小的冗余度、与应用程序彼此独立的数据集合,用户可以对数据库中的数据执行新增、查询、更新及删除等操作。
本申请实施例提供一种基于人工智能的多媒体处理方法、装置、电子设备和计算机可读存储介质,能够对多媒体文件进行有效编辑,提升编辑成功率,同时提升电子设备所耗费的计算资源的利用率。下面说明本申请实施例提供的电子设备的示例性应用,本申请实施例提供的电子设备可以实施为终端设备,也可以实施为服务器。电子设备通过运行本申请实施例提供的多媒体处理方案,能够对多媒体文件进行有效编辑,提升计算资源的利用率,即提高电子设备自身的编辑性能,适用于多种编辑场景。
参见图1,图1是本申请实施例提供的基于人工智能的多媒体处理系统100的架构示意图,终端设备400通过网络300连接服务器200,服务器200连接数据库500,其中,网络300可以是广域网或者局域网,又或者是二者的组合。
在一些实施例中,以电子设备是终端设备为例,本申请实施例提供的基于人工智能的多媒体处理方法可以由终端设备实现。例如,终端设备400在接收到针对某个多媒体文件的编辑操作时,获取与多媒体文件中的素材的类型对应的素材编辑模式以及互动模板,根据素材编辑模式在多媒体文件中识别素材,并对所识别的素材应用互动模板。最终,终端设备400将应用互动模板后的多媒体文件呈现于图形界面410中。其中,多媒体处理过程中涉及到的各种结果(如多媒体文件、素材编辑模式、互动模板等)可以预先存储在终端设备400本地,也可以由终端设备400向外界(如服务器200、数据库500、区块链等)获取得到,后一种方式能够减少终端设备400的存储资源占用。
在一些实施例中,以电子设备是服务器为例,本申请实施例提供的基于人工智能的多媒体处理方法也可以由服务器和终端设备协同实现。例如,服务器200在接收到终端设备400发送的针对多媒体文件的编辑操作时,从数据库500中获取与多媒体文件中的素材的类型对应的素材编辑模式以及互动模板。然后,服务器200可以根据素材编辑模式在多媒体文件中识别素材,对所识别的素材应用互动模板,并将应用互动模板的多媒体文件发送至终端设备400,以使终端设备400将应用互动模板的多媒体文件呈现于图形界面410中。或者,服务器200也可以将素材编辑模式和互动模板发送至终端设备400,以使终端设备400在本地根据素材编辑模式识别多媒体文件中的素材,并对识别的素材应用互动模板,其中,可以预先将多个类型分别对应的素材编辑模式及互动模板存储在终端设备400本地,如此,服务器200可仅发送素材编辑模式的标识(或编号)和互动模板的标识(或编号),终端设备400便可在本地调用相应的素材编辑模式和互动模板进行编辑,能够减少通信资源的消耗。值得说明的是,其中,多媒体处理过程中涉及到的各种结果(如多媒体文件、素材编辑模式、互动模板等)可以预先存储在服务器200的分布式文件系统或数据库500中,也可以由服务器200向外界(如区块链等处)获取 得到,后一种方式能够减少服务器200的存储资源占用。
终端设备400用于在图形界面410中,显示多媒体处理过程中涉及到的各种结果。在图1中,示例性地示出了待编辑的多媒体文件411、针对多媒体文件411的编辑选项、以及应用互动模板后的多媒体文件412。其中,当针对多媒体文件411的编辑选项被触发时,确定接收到针对多媒体文件411的编辑操作;应用的互动模板包括用于替换的人脸,即多媒体文件412是通过对多媒体文件411进行人脸替换得到的。
在一些实施例中,终端设备400或服务器200可以通过运行计算机程序来实现本申请实施例提供的基于人工智能的多媒体处理方法,例如,计算机程序可以是操作系统中的原生程序或软件模块;可以是本地(Native)应用程序(APP,Application),即需要在操作系统中安装才能运行的程序;也可以是小程序,即只需要下载到浏览器环境中就可以运行的程序;还可以是能够嵌入至任意APP中的小程序。总而言之,上述计算机程序可以是任意形式的应用程序、模块或插件。
在一些实施例中,服务器200可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(Content Delivery Network,CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器,其中,云服务可以是多媒体处理服务,供终端设备400进行调用,以对终端设备400发送的多媒体文件进行编辑,最终将应用互动模板后的多媒体文件发送至终端设备400。终端设备400可以是智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱及智能手表等,但并不局限于此。终端设备以及服务器可以通过有线或无线通信方式进行直接或间接地连接,本申请实施例中不做限制。
在一些实施例中,数据库500和服务器200可以独立设置。在一些实施例中,数据库500和服务器200也可以集成在一起,即数据库500可以视为存在于服务器200内部,与服务器200一体化,服务器200可以提供数据库500的数据管理功能。
以本申请实施例提供的电子设备是终端设备为例说明,可以理解的,对于电子设备是服务器的情况,图2中示出的结构中的部分(例如用户接口、呈现模块和输入处理模块)可以缺省。参见图2,图2是本申请实施例提供的终端设备400的结构示意图,图2所示的终端设备400包括:至少一个处理器410、存储器450、至少一个网络接口420和用户接口430。终端400中的各个组件通过总线系统440耦合在一起。可理解,总线系统440用于实现这些组件之间的连接通信。总线系统440除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图2中将各种总线都标为总线系统440。
处理器410可以是一种集成电路芯片,具有信号的处理能力,例如通用处理器、数字信号处理器(DSP,Digital Signal Processor),或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等,其中,通用处理器可以是微处理器或者任何常规的处理器等。
用户接口430包括使得能够呈现媒体内容的一个或多个输出装置431,包括一个或多个扬声器和/或一个或多个视觉显示屏。用户接口430还包括一个或多个输入装置432,包括有助于用户输入的用户接口部件,比如键盘、鼠标、麦克风、触屏显示屏、摄像头、其他输入按钮和控件。
存储器450可以是可移除的,不可移除的或其组合。示例性的硬件设备包括固态存储器,硬盘驱动器,光盘驱动器等。存储器450可选地包括在物理位置上远离处理器410的一个或多个存储设备。
存储器450包括易失性存储器或非易失性存储器,也可包括易失性和非易失性存储 器两者。非易失性存储器可以是只读存储器(ROM,Read Only Memory),易失性存储器可以是随机存取存储器(RAM,Random Access Memory)。本申请实施例描述的存储器450旨在包括任意适合类型的存储器。
在一些实施例中,存储器450能够存储数据以支持各种操作,这些数据的示例包括程序、模块和数据结构或者其子集或超集,下面示例性说明。
操作系统451,包括用于处理各种基本系统服务和执行硬件相关任务的系统程序,例如框架层、核心库层、驱动层等,用于实现各种基础业务以及处理基于硬件的任务;
网络通信模块452,用于经由一个或多个(有线或无线)网络接口420到达其他电子设备,示例性的网络接口420包括:蓝牙、无线相容性认证(WiFi)、和通用串行总线(USB,Universal Serial Bus)等;
呈现模块453,用于经由一个或多个与用户接口430相关联的输出装置431(例如,显示屏、扬声器等)使得能够呈现信息(例如,用于操作外围设备和显示内容和信息的用户接口);
输入处理模块454,用于对一个或多个来自一个或多个输入装置432之一的一个或多个用户输入或互动进行检测以及翻译所检测的输入或互动。
在一些实施例中,本申请实施例提供的基于人工智能的多媒体处理装置可以采用软件方式实现,图2示出了存储在存储器450中的基于人工智能的多媒体处理装置455,其可以是程序和插件等形式的软件,包括以下软件模块:获取模块4551、应用模块4552及应用完成模块4553,这些模块是逻辑上的,因此根据所实现的功能可以进行任意的组合或进一步拆分。将在下文中说明各个模块的功能。
在另一些实施例中,本申请实施例提供的基于人工智能的多媒体处理装置可以采用硬件方式实现,作为示例,本申请实施例提供的基于人工智能的多媒体处理装置可以是采用硬件译码处理器形式的处理器,其被编程以执行本申请实施例提供的基于人工智能的多媒体处理方法,例如,硬件译码处理器形式的处理器可以采用一个或多个应用专用集成电路(ASIC,Application Specific Integrated Circuit)、DSP、可编程逻辑器件(PLD,Programmable Logic Device)、复杂可编程逻辑器件(CPLD,Complex Programmable Logic Device)、现场可编程门阵列(FPGA,Field-Programmable Gate Array)或其他电子元件。
将结合本申请实施例提供的电子设备的示例性应用和实施,说明本申请实施例提供的基于人工智能的多媒体处理方法。
参见图3A,图3A是本申请实施例提供的基于人工智能的多媒体处理方法的流程示意图,将结合图3A示出的步骤进行说明。
在步骤101中,响应于针对多媒体文件的编辑操作,获取与多媒体文件中的素材的类型对应的素材编辑模式以及互动模板。
这里,多媒体文件可以是预先存储于电子设备本地的,也可以是电子设备从外界(如互联网)获取到的,还可以是电子设备实时采集的,例如通过电子设备的摄像头和/或麦克风实时采集到的。当电子设备检测到(或称接收到)针对多媒体文件的编辑操作时,获取与多媒体文件中的素材的类型对应的素材编辑模式以及互动模板。本申请实施例对编辑操作的类型不做限定,例如可以是接触式的操作(如点击操作或长按操作等),也可以是非接触式的操作(如手势操作或语音输入操作等)。例如,电子设备可以在图形界面呈现针对多媒体文件的编辑选项,并将针对该编辑选项的触发操作(如点击操作或长按操作等)作为针对多媒体文件的编辑操作。
电子设备响应于针对多媒体文件的编辑操作,获取与多媒体文件中的素材的类型对应的素材编辑模式以及互动模板,其中,多媒体文件中的素材的类型可以是直接获取到的,例如由用户输入,也可以是对多媒体文件进行素材识别处理得到的。举例来说, 预先设定人脸类型对应的素材编辑模式包括人脸识别能力(如通过人脸识别模型来实现人脸识别能力),对应的互动模板包括人脸特效,例如对人脸中的五官进行装饰的特效;天空类型对应的素材编辑模式包括天空识别能力,对应的互动模板包括用于替换天空的图片。如此,能够将素材编辑模式及互动模板进行组件化,降低两者之间的耦合度,对于素材编辑模式及互动模板的创建过程来说,也可以提升其自由度。
值得说明的是,多媒体文件可能包括多种类型的素材,例如在一张图片中同时包括人脸和天空,故在步骤101中可能获取到多个素材编辑模式及多个互动模板。
在一些实施例中,可以通过这样的方式来实现上述的获取与多媒体文件中的素材的类型对应的素材编辑模式以及互动模板:在与多个类型分别对应的候选素材编辑模式中,获取与多媒体文件中的素材的类型对应的素材编辑模式,并获取与素材编辑模式对应的至少一个互动模板。
在本申请实施例中,针对素材的多个类型,可以分别设定对应的素材编辑模式(为了便于区分,命名为候选素材编辑模式),并设定与每一个候选素材编辑模式对应的至少一个互动模板。例如,设定人脸类型对应包括人脸识别能力的候选素材编辑模式,该候选素材编辑模式对应互动模板A和互动模板B,互动模板A和B中包括不同样式的人脸特效;设定天空类型对应包括天空识别能力的候选素材编辑模式,该候选素材编辑模式对应互动模板C和互动模板D,其中,互动模板C包括用于替换天空的星空图片,互动模板D包括用于替换天空的瀑布图片。如此,在得到多媒体文件中的素材的类型后,在多个候选素材编辑模式中获取与该类型对应的素材编辑模式,并进一步获取与该素材编辑模式对应的至少一个互动模板。通过上述方式,能够提升获取操作的有序化,便于获取到准确的素材编辑模式及互动模板。
在步骤102中,根据素材编辑模式在多媒体文件中识别素材,并对所识别的素材应用互动模板。
这里,根据获取到的素材编辑模式,在多媒体文件中识别素材,并对所识别的素材应用获取到的互动模板。例如,在所识别的素材的基础上,添加互动模板中的特效;又例如,将所识别的素材替换为互动模板中的图片,互动模板的应用方式并不限于此,可根据实际应用场景进行设定。
在一些实施例中,根据素材编辑模式在多媒体文件中识别素材之后,还包括:呈现与所识别的素材对应的确认提示;其中,确认提示包括所识别的素材的类型、在多媒体文件中的位置信息、以及应用互动模板后得到的预览结果中的至少一种;可以通过这样的方式来实现上述的对所识别的素材应用互动模板:响应于针对确认提示的确认操作,对所识别的素材应用互动模板。
在识别出多媒体文件中的素材后,可以在电子设备的图形界面中呈现与所识别的素材对应的确认提示。在本申请实施例中,确认提示可以包括所识别的素材的类型、素材在多媒体文件中的位置信息、以及应用互动模板后得到的预览结果中的至少一种,根据实际应用场景的不同,确认提示可以包括更多或更少的内容。
其中,素材在多媒体文件中的位置信息可以是时间位置信息,也可以是素材在一个视频帧或一张图片中的区域位置信息,以多媒体文件为视频、素材的类型为人脸的情况进行举例,则位置信息可以是视频中包括人脸的视频帧所在的时间戳,也可以是在包括人脸的视频帧中,人脸所占的具体区域。另外,应用互动模板后得到的预览结果,可以是对多媒体文件中所识别的至少部分素材应用互动模板得到的,例如,对视频中包括人脸的任意一个视频帧(如第一个视频帧)应用互动模板,得到预览结果。
用户可以通过确认提示,快速、准确地判断是否应用互动模板。电子设备在接收到针对确认提示的确认操作时,对所识别的素材应用互动模板;在接收到针对确认提示 的否认操作时,中止对多媒体文件的编辑。如此,通过人机交互的手段确定是否应用互动模板,能够减少计算资源的浪费。
当然,确认提示的作用并不限于此,例如,当获取的素材编辑模式包括违规内容识别能力时,可以呈现与所识别的素材对应的确认提示,该确认提示包括所识别的素材的类型(即违规内容)以及在多媒体文件中的位置信息,以用于提示多媒体文件中存在违规内容,其中,违规内容如水印。用户可以根据确认提示,手动地修改多媒体文件中的内容,或者,电子设备也可以在接收到针对确认提示的确认操作时,对所识别的素材应用互动模板,该互动模板用于屏蔽违规内容,例如包括用于覆盖违规内容的马赛克。
在一些实施例中,互动模板包括互动效果及互动效果的持续时长;可以通过这样的方式来实现上述的对所识别的素材应用互动模板:对所识别的素材应用互动模板中的互动效果,并保持应用的互动效果直至达到持续时长。
例如,多媒体文件为总长10秒的视频,视频中的素材的类型为人脸,人脸在视频中的时间位置信息包括第3秒、第3.5秒、第4秒、第7秒、第7.5秒及第8秒,获取到的互动模板包括人脸特效的互动效果和1秒的持续时长,则在视频中的第一个出现人脸的视频帧(第3秒的视频帧)中添加互动模板中的人脸特效,并保持添加的人脸特效不变直至时长达到1秒,在达到1秒后,再在下一个出现人脸的视频帧(第7秒的视频帧)中添加人脸特效,并保持添加的人脸特效不变直至时长达到1秒,以此类推。如此,能够对应用的互动效果进行复用,节省频繁应用互动效果所带来的处理资源消耗。
又例如,多媒体文件为一张图片,图片中的素材的类型为人脸,获取到的互动模板包括人脸特效的互动效果和5秒的持续时长,则在图片包括的人脸的基础上添加互动模板中的人脸特效,并保持添加的人脸特效不变直至时长达到5秒,以生成视频,该视频中的每一个视频帧均为添加有人脸特效的图片。如此,实现了互动效果的另一种复用。
值得说明的是,由于互动模板包括的互动效果本身支持组件化,故在创建互动模板时,能够复用已有的互动效果,提升创建互动模板的自由度和效率。
在一些实施例中,根据素材编辑模式在多媒体文件中识别素材之后,还包括:根据所识别的素材在多媒体文件中的位置信息,对多媒体文件进行裁剪处理。
同样地,这里的位置信息可以是时间位置信息,也可以是素材在一个视频帧或一张图片中的区域位置信息。若位置信息为时间位置信息,则可以根据素材的时间位置信息中第一个时间戳(即最早的时间戳)和最后一个时间戳(即最晚的时间戳),对多媒体文件进行裁剪处理,例如多媒体文件为总长30秒的视频,所识别的素材在多媒体文件中的时间位置信息包括第5秒、第7秒和第10秒,则对多媒体文件进行针对时间的裁剪处理,即是裁剪出多媒体文件中第5秒至第10秒的部分。若位置信息为区域位置信息,则可以根据区域位置信息对多媒体文件进行针对区域的裁剪处理,例如多媒体文件为一张图片,所识别的素材在该图片中的区域位置信息为左半部分,则裁剪出该图片的左半部分。
在裁剪完成后,可以对裁剪后的多媒体文件中所识别的素材应用互动模板。通过上述方式,能够根据所识别的素材,裁剪出多媒体文件中最重要的部分,提升多媒体编辑的智能程度,适用于制作卡点视频等场景。
在步骤103中,呈现应用互动模板后的多媒体文件。
这里,可以在图形界面中呈现应用互动模板后的多媒体文件,例如,呈现应用互动模板后的图片,或者播放应用互动模板后的视频或音频。在此基础上,还可呈现手动编辑的选项,以便用户对应用互动模板后的多媒体文件进行手动编辑,例如对多媒体文件进行手动裁剪,又例如手动添加效果,如文字、贴纸、特效或音乐等。
如图3A所示,本申请实施例通过获取与多媒体文件中的素材的类型对应的素材编 辑模式以及互动模板,能够降低多媒体文件的选择门槛,提升多媒体编辑的成功率,同时提升电子设备在处理过程中所耗费的计算资源的利用率。
在一些实施例中,参见图3B,图3B是本申请实施例提供的基于人工智能的多媒体处理方法的流程示意图,图3A示出的步骤102可以通过步骤201至步骤202实现,将结合各步骤进行说明。
在步骤201中,根据素材编辑模式在多媒体文件中识别素材。
在步骤202中,当所识别的素材满足素材编辑模式对应的素材互动条件时,对所识别的素材应用互动模板中的第一效果。
在本申请实施例中,互动模板用于形成基于素材的互动效果,互动效果包括第一效果和区别于第一效果的第二效果,第一效果包括增强现实效果和虚拟现实效果中的至少一个,或者,可以将直接应用于素材的效果作为第一效果,将第一效果之外的效果作为第二效果,无论其表现形式如何。在本申请实施例中,对互动模板中包括的第一效果及第二效果的数量不做限定。
根据获取到的素材编辑模式,在多媒体文件中识别素材之后,将所识别的素材与素材编辑模式对应的素材互动条件进行对比,其中,素材编辑模式对应的素材互动条件可以预先设定,例如,多媒体文件为图片,素材互动条件包括某类型的素材的面积占比阈值,则当该类型的素材在图片中的面积占比大于面积占比阈值时,确定满足素材互动条件。
当所识别的素材满足素材互动条件时,对所识别的素材应用第一效果。例如,当第一效果为人脸的动漫化特效时,将所识别的素材(即人脸)通过动漫化特效进行转换,得到动漫化人脸,以形成虚拟现实的感官效果;又例如,当第一效果为地摊特效时,可以在所识别的素材(例如地面)的区域,叠加呈现该地摊特效,以形成在真实世界中摆地摊的增强现实的感官效果。
在图3B中,在步骤201之后,还可以在步骤203中,当所识别的素材未满足素材互动条件时,获取互动模板中的第二效果对应的设定位置,并在多媒体文件中的设定位置应用第二效果。
当所识别的素材未满足素材互动条件,例如素材的面积占比小于或等于面积占比阈值时,所识别的素材并不适于直接应用效果,故获取第二效果对应的设定位置,并在多媒体文件中的设定位置应用互动模板中的第二效果。本申请实施例对设定位置不做限定,可根据实际应用场景进行具体设定。另外,第二效果对应的设定位置可包含于互动模板中,当然也可存储于其他地方。
举例来说,多媒体文件为图片,图片中的素材的类型为人脸,获取到的互动模板包括的第二效果为“人脸即将出现”的文本,包括的设定位置为图片正中央,则当所识别的素材未满足素材互动条件时,在图片的正中央显示“人脸即将出现”的文本。
值得说明的是,当多媒体文件为视频时,可以以视频帧为单位,判断所识别的素材是否满足素材互动条件。当某个视频帧中所识别的素材满足素材互动条件时,对该视频帧中所识别的素材应用第一效果;当某个视频帧中所识别的素材未满足素材互动条件时,在该视频帧中的设定位置应用第二效果。
在一些实施例中,可以通过这样的方式来实现上述的根据素材编辑模式在多媒体文件中识别素材:当编辑场景为非实时场景时,根据素材编辑模式在多媒体文件中识别素材,得到素材与多媒体文件的匹配程度,并当匹配程度大于素材编辑模式对应的第一素材互动条件中的匹配程度阈值时,确定所识别的素材满足第一素材互动条件;当编辑场景为实时场景时,对多媒体文件进行压缩处理,根据素材编辑模式在压缩后的多媒体文件中识别素材,得到素材与压缩后的多媒体文件的匹配程度,并当匹配程度大于素材 编辑模式对应的第二素材互动条件中的匹配程度阈值时,确定所识别的素材满足第二素材互动条件;其中,第一素材互动条件中的匹配程度阈值大于第二素材互动条件中的匹配程度阈值。
在本申请实施例中,可将编辑场景区分为对实时性要求较低的非实时场景、以及对实时性要求较高的实时场景,例如,将多媒体文件为图片的场景确定为非实时场景,将多媒体文件为视频的场景确定为实时场景;又例如,将多媒体文件预先存储在电子设备本地的场景确定为非实时场景,将多媒体文件是实时采集或者实时从外界获取到的场景确定为实时场景。另外,针对每一个素材编辑模型,可以预先设定素材编辑模式适用于非实时场景的第一素材互动条件、以及适用于实时场景的第二素材互动条件,
当编辑场景为非实时场景时,根据获取到的素材编辑模式在待编辑的多媒体文件(即原始的多媒体文件)中识别素材,得到素材与多媒体文件的匹配程度。然后,将得到的匹配程度与素材编辑模式对应的第一素材互动条件中的匹配程度阈值进行对比,当匹配程度大于匹配程度阈值时,确定所识别的素材满足第一素材互动条件。例如,素材与多媒体文件的匹配程度,可以是素材在多媒体文件中的面积占比,匹配程度阈值可以是面积占比阈值。
当编辑场景为实时场景时,由于对编辑的实时性要求较高,故对多媒体文件进行压缩处理,根据素材编辑模式在压缩后的多媒体文件中识别素材,得到素材与压缩后的多媒体文件的匹配程度。其中,对多媒体文件进行压缩处理,可以是将多媒体文件的尺寸缩小至设定尺寸,设定尺寸可根据素材编辑模式的处理耗时和识别准确度进行权衡设置,不同素材编辑模式对应的设定尺寸可以不同。当得到的匹配程度大于素材编辑模式对应的第二素材互动条件中的匹配程度阈值时,确定所识别的素材满足第二素材互动条件,其中,第一素材互动条件中的匹配程度阈值大于第二素材互动条件中的匹配程度阈值。通过压缩处理的方式,能够提升根据素材编辑模式识别素材的效率,满足实时性要求;同时,考虑到进行压缩处理后,识别素材时可能会出现准确度变低的情况,故设定第二素材互动条件中的匹配程度阈值小于第一素材互动条件中的匹配程度阈值,以符合压缩处理的特点。
如图3B所示,本申请实施例根据所识别的素材是否满足相应的素材互动条件,来应用不同的效果,提升了对不同识别情况的适用性。在多媒体文件为视频时,能够保证应用效果的连贯性和完整性。
在一些实施例中,参见图3C,图3C是本申请实施例提供的基于人工智能的多媒体处理方法的流程示意图,基于图3A,在步骤101之后,还可以在步骤301中,针对每个素材编辑模式,根据素材编辑模式在多媒体文件中识别得到候选素材,并当候选素材满足素材编辑模式对应的素材互动条件时,将素材编辑模式作为待应用的素材编辑模式。
当获取到的素材编辑模式的数量为多个时,可以对多个素材编辑模式进行筛选。例如,针对每个素材编辑模式,根据素材编辑模式在多媒体文件中识别,为了便于区分,将这里所识别的素材命名为候选素材。当根据某个素材编辑模式得到的候选素材满足该素材编辑模式对应的素材互动条件时,将该素材编辑模式作为待应用的素材编辑模式。如此,能够实现素材编辑模式的智能选取,使得待应用的素材编辑模式能够与多媒体文件中的素材相符合。
在图3C中,在步骤101之后,还可以在步骤302中,呈现多个素材编辑模式的预览识别过程,并响应于针对素材编辑模式的选择操作,将被选择的素材编辑模式作为待应用的素材编辑模式。
本申请实施例还提供了对多个素材编辑模式进行筛选的另一种方式,即在图形界 面中呈现多个素材编辑模式的预览识别过程,并响应于针对素材编辑模式的选择操作,将被选择(被选中)的素材编辑模式作为待应用的素材编辑模式。其中,可以呈现根据素材编辑模式对样本多媒体文件进行识别的预览识别过程,也可以呈现根据素材编辑模式对待编辑的多媒体文件进行识别的预览识别过程,对于样本多媒体文件或多媒体文件为视频的情况,可以呈现对其中的某个或某几个视频帧进行识别的预览识别过程。
预览识别过程可以包括识别前后的结果,以素材编辑模式包括人脸识别能力、且样本多媒体文件为样本图片的情况进行举例,则示出的预览识别过程可以包括原始的样本图片,以及包括有识别得到的人脸位置(例如以虚线的形式来突出人脸位置)的样本图片。除了呈现预览识别过程之外,在本申请实施例中,还可以示出与素材编辑模式相关的其他信息,例如呈现素材编辑模式的名称,如人脸识别模式及天空识别模式等。如此,通过人机交互的方式来筛选素材编辑模式,使得待应用的素材编辑模式能够符合用户的实际需求。根据实际应用场景的不同,可以应用步骤301及步骤302中的任意一个步骤,以筛选素材编辑模式。
除了上述两种方式外,还可以将多个素材编辑模式中的任意一个素材编辑模式(例如第一个素材编辑模式),作为待应用的素材编辑模式。在确定了待应用的素材编辑模式之后,还可以在图形界面中呈现其他的素材编辑模式,以供用户切换。
在图3C中,图3A示出的步骤102可更新为步骤303,在步骤303中,根据待应用的素材编辑模式在多媒体文件中识别素材,并对所识别的素材应用互动模板。
在一些实施例中,可以通过这样的方式来实现上述的根据素材编辑模式在多媒体文件中识别得到候选素材:当多媒体文件为视频时,对多媒体文件进行周期性的抽帧处理,得到候选视频帧;根据素材编辑模式在候选视频帧中识别得到候选素材;可以通过这样的方式来实现上述的根据待应用的素材编辑模式在多媒体文件中识别素材:根据待应用的素材编辑模式在多媒体文件的每个视频帧中识别素材。
这里,当多媒体文件为视频时,可以对多媒体文件进行周期性的抽帧处理,例如抽帧频率为2秒1次,最终得到多个候选视频帧。然后,根据素材编辑模式在候选视频帧中识别得到候选素材,如此,可以在筛选素材编辑模式时减轻处理压力,提升实时性。在此基础上,在当得到的多个候选视频帧所占的存储空间大于存储空间阈值时,可以对多个候选视频帧进行压缩处理,再根据素材编辑模式在压缩后的候选视频帧中进行识别,从而进一步提升实时性。
值得说明的是,在得到候选素材后,可以设定当所有候选视频帧中的候选素材均满足素材编辑模式对应的素材互动条件时,将该素材编辑模式作为待应用的素材编辑模式;也可以设定当设定数量或设定比例的候选视频帧中的候选素材满足素材编辑模式对应的素材互动条件时,将该素材编辑模式作为待应用的素材编辑模式。
筛选出待应用的素材编辑模式后,由于多媒体文件为视频,故根据待应用的素材编辑模式,在多媒体文件的每个视频帧中识别素材。同样地,这里也可以对多媒体文件的每个视频帧进行压缩处理,再在压缩后的每个视频帧中识别素材,以满足实时性要求。通过上述方式,在多媒体文件为视频时,能够降低筛选素材编辑模式时的处理压力,从而能够快速地确定出待应用的素材编辑模式。
如图3C所示,在获取到的素材编辑模式的数量为多个的情况下,本申请实施例提供了智能选取和人机交互选取的两种筛选方式,提升了筛选的灵活性和准确性。
在一些实施例中,参见图3D,图3D是本申请实施例提供的基于人工智能的多媒体处理方法的流程示意图,图3A示出的步骤102可通过步骤401至步骤404实现。
在步骤401中,根据素材编辑模式在多媒体文件中识别素材。
在步骤402中,当所识别的素材满足任一互动模板对应的素材互动条件时,将满 足的素材互动条件对应的互动模板作为待应用的互动模板。
这里,当获取到的互动模板的数量为多个时,可以对多个互动模板进行筛选。例如,获取每一个互动模板对应的素材互动条件,当根据素材编辑模式识别出的素材满足任一互动模板对应的素材互动条件时,将满足的素材互动条件对应的互动模板作为待应用的互动模板。
举例来说,某一个素材的类型对应的互动模板包括第一互动模板和第二互动模板,第一互动模板同时包括有第一效果和第二效果,第二互动模板仅包括有第二效果,且第一互动模板和第二互动模板对应不同的素材互动条件。若多媒体文件中所识别的素材满足第一互动模板对应的素材互动条件,则将第一互动模板作为待应用的互动模板。在此基础上,若第一互动模板的数量为多个,且多个第一互动模板对应相同的素材互动条件,则可以得到多个待应用的互动模板,针对该情况,可以应用多个第一互动模板中的任意一个,并在图形界面中呈现其他的第一互动模板,以供用户切换。该方式同样适用于其他的待应用的互动模板的数量为多个的情况。
值得说明的是,当多媒体文件为视频时,步骤402中用于与素材互动条件进行对比的素材,可以是多媒体文件的候选视频帧(通过周期性的抽帧处理得到)中所识别的素材,如此,能够提升筛选互动模板的效率。
在步骤403中,呈现多个互动模板的预览应用过程,并响应于针对互动模板的选择操作,将被选择的互动模板作为待应用的互动模板。
除了步骤402示出的方式外,在本申请实施例中,还可以呈现多个互动模板的预览应用过程,并响应于针对互动模板的选择操作,将被选择的互动模板作为待应用的互动模板。其中,可以呈现在样本多媒体文件中应用互动模板的预览应用过程,也可以呈现在待编辑的多媒体文件中应用互动模板的预览应用过程,对于样本多媒体文件或多媒体文件为视频的情况,可以呈现对其中的某个或某几个视频帧的预览应用过程。预览应用过程可以包括应用前后的结果,以互动模板包括星空图片、且样本多媒体文件为包括人脸和背景的样本图片的情况进行举例,则示出的预览应用过程可以包括原始的样本图片,以及将背景替换为星空图片的样本图片。除了呈现预览应用过程之外,在本申请实施例中,还可以示出与互动模板相关的其他信息,例如呈现互动模板的名称,如星空背景模板及人脸特效模板等。根据实际应用场景的不同,可以应用步骤402及步骤403中的任意一个步骤,以筛选互动模板。
除了上述两种方式外,还可以直接将多个互动模板中的任意一个互动模板(例如第一个互动模板),作为待应用的互动模板。在确定了待应用的互动模板之后,可以在图形界面中呈现其他的互动模板,以供用户切换。
在步骤404中,将待应用的互动模板应用至所识别的素材。
如图3D所示,在获取到的互动模板的数量为多个的情况下,本申请实施例同样提供了智能选取和人机交互选取的两种筛选方式,提升了筛选的灵活性和准确性。
在一些实施例中,参见图3E,图3E是本申请实施例提供的基于人工智能的多媒体处理方法的流程示意图,图3A示出的步骤101可通过步骤501至步骤503实现,将结合各个步骤进行说明。
在步骤501中,响应于针对多媒体文件的编辑操作,当多媒体文件为视频时,对多媒体文件进行周期性的抽帧处理,得到候选视频帧。
这里,当待编辑的多媒体文件为视频时,为了保证多媒体编辑的实时性,根据设定的抽帧频率对多媒体文件进行抽帧处理,得到多个候选视频帧。
在步骤502中,对候选视频帧进行素材识别处理,得到多媒体文件中的素材的类型。
这里,对候选视频帧进行素材识别处理,即进行多分类处理,得到多媒体文件中的素材的类型。
在一些实施例中,步骤502之前,还包括:当多个候选视频帧所占的存储空间大于存储空间阈值时,对多个候选视频帧进行压缩处理;可以通过这样的方式实现上述的对候选视频帧进行素材识别处理,得到多媒体文件中的素材的类型:对压缩后的多个候选视频帧分别进行素材识别处理,得到每个候选视频帧中的素材的类型,并将多个候选视频帧中占比最大的素材的类型,作为多媒体文件中的用于获取对应的素材编辑模式的素材的类型;其中,占比包括素材的面积占比及数量占比中的任意一种。
在抽帧得到多个候选视频帧后,若多个候选视频帧所占的存储空间大于设定的存储空间阈值,则为了保证实时性,可以对多个候选视频帧进行压缩处理,压缩处理时的设定尺寸可以根据实际应用场景进行具体设定。然后,对压缩后的多个候选视频帧分别进行素材识别处理,得到每个候选视频帧中的素材的类型,若多个候选视频帧中包括不同的类型,则可以筛选出一个类型,以作为获取对应的素材编辑模式的类型。
本申请实施例提供了筛选类型的两种方式,第一种方式是,筛选出多个候选视频帧中面积占比最大的素材的类型,例如抽帧处理得到的候选视频帧包括视频帧1、视频帧2和视频帧3,视频帧1中人脸和天空的面积占比各是80%和20%,视频帧2中人脸和天空的面积占比各是70%和30%,视频帧3中人脸和天空的面积占比各是75%和25%,则可以计算出人脸的平均面积占比为75%,天空的平均面积占比为25%,则将平均面积占比更大的人脸,作为筛选出的类型。该方式同样适用于多媒体文件为图片的情况。
第二种方式是,筛选出多个候选视频帧中数量占比最大的素材的类型,例如抽帧处理得到的候选视频帧包括视频帧1、视频帧2和视频帧3,视频帧1和视频帧2中均仅包括人脸,视频帧3中同时包括人脸和天空,则可得到人脸在多个候选视频帧中的数量占比为100%,天空的数量占比为1/3,故将数量占比更大的人脸作为筛选出的类型。通过上述方式对素材的类型进行了有效筛选。
在本申请实施例中,也可以不进行类型筛选,直接根据每个候选视频帧中的素材的类型获取对应的素材编辑模式及互动模板。若获取到的素材编辑模式或者互动模板的数量为多个,则再进行筛选。
在一些实施例中,步骤502之前,还包括:获取多个样本多媒体文件;通过人工智能模型对样本多媒体文件进行素材识别处理,得到样本多媒体文件中的素材的类型;根据素材识别处理得到的类型与实际类型之间的差异,更新人工智能模型的权重参数;其中,更新后的人工智能模型用于对候选视频帧进行素材识别处理。
在本申请实施例中,可以通过人工智能模型来实现素材识别处理,该人工智能模型为多分类模型。首先,对人工智能模型进行训练,例如,获取多个样本多媒体文件、以及每个样本多媒体文件中的素材的实际类型,并通过人工智能模型对样本多媒体文件进行素材识别处理,得到样本多媒体文件中的素材的类型。其中,若样本多媒体文件为图片,则通过人工智能模型直接对样本多媒体文件进行素材识别处理;若样本多媒体文件为视频,则通过人工智能模型对样本多媒体文件中的视频帧(如通过周期性的抽帧处理得到)进行素材识别处理。
然后,根据人工智能模型的损失函数,确定素材识别处理得到的类型与实际类型之间的差异,该差异即为损失值。根据确定出的差异在人工智能模型中进行反向传播,并在反向传播的过程中,沿梯度下降方向更新人工智能模型的权重参数。在完成对人工智能模型的权重参数的更新后,即可将更新后的人工智能模型用于步骤502,以对多媒体文件中的候选视频帧进行素材识别处理。此外,当待编辑的多媒体文件为图片时,同样可以根据更新后的人工智能模型进行素材识别处理。通过上述的模型训练的方式,能 够提升素材识别处理的精度。
在步骤503中,获取与多媒体文件中的素材的类型对应的素材编辑模式以及互动模板。
如图3E所示,本申请实施例在多媒体文件为视频时,对多媒体文件进行周期性的抽帧处理,并对抽出的视频帧进行素材识别处理,能够快速、准确地得到多媒体文件中的素材的类型。
下面,将说明本申请实施例在实际的应用场景中的示例性应用。本申请实施例提供的基于人工智能的多媒体处理方法可以由终端设备和服务器协同实现,例如,终端设备中安装有用于进行多媒体编辑的软件客户端,通过与服务器交互从而实现智能编辑。这里,可以将AI/AR能力(对应上文的素材编辑能力)及能力之上的创意玩法模板(对应上文的互动模板)进行组件化,根据用户在客户端中所选取的多媒体文件中的素材的类型,云端智能匹配合适的AI/AR能力,再将该AI/AR能力对应的创意玩法模板应用于多媒体文件中,实现对多媒体文件的智能编辑。
本申请实施例提供了如图4所示的创建创意玩法模板的示意图,在图4中,针对每个AI/AR能力,根据其对应的处理结果(对应上文的匹配程度)和匹配因子阈值(对应上文的匹配程度阈值),创建至少一个创意玩法模板,例如,针对人脸识别能力来说,其对应的处理结果为识别出的人脸的面积占比,匹配因子阈值为人脸的面积占比阈值,如为50%。在本申请实施例中,创意玩法模板支持组件化的效果,因此在创建创意玩法模板时,可以复用已有的效果,从而提升创建创意玩法模板的自由度和效率。另外,针对匹配程度小于或等于匹配因子阈值、或者AI/AR能力处理异常的情况,在创建创意玩法模板时,可以在其中多预埋一路异常处理效果,保证应用创意玩法模板时,创意表达的连贯性和完整性。为了便于说明,将根据AI/AR能力的处理结果进行应用的效果命名为AI/AR效果(对应上文的第一效果),将异常处理效果命名为非AI/AR效果(对应上文的第二效果),举例来说,AI效果可以为魔法挂件、人脸特效、用于替换背景的图片或音乐卡点组件等,AR效果可以为摆地摊效果或虚拟动画人物效果等。
本申请实施例提供了如图5所示的选择多媒体文件的示意图,在图5中,示出了存储于客户端(终端设备运行的客户端)本地的多个多媒体文件,例如示出的多媒体文件51,多媒体文件的形式包括但不限于视频和图片。用户可通过触发图5中的编辑选项52,来选择多媒体文件进行智能编辑,或者,也可通过触发拍摄选项53,从而进行实时拍摄得到待编辑的多媒体文件。
本申请实施例提供了如图6所示的应用创意玩法模板后的多媒体文件的示意图,在图6中,应用的AI/AR能力是背景识别能力,对应的创意玩法模板包括如背景612所示的AI/AR效果,在应用该创意玩法模板时,将多媒体文件61中除人体611之外的背景(如天空),替换为创意玩法模板中的背景612。在呈现替换背景后的多媒体文件61时,还可呈现手动编辑选项62,用户可通过触发手动编辑选项62,对多媒体文件61进行手动编辑,例如切换应用的创意玩法模板、添加或切换背景音乐、进行手动剪辑、添加额外的效果(如文字、贴纸或特效)等。用户在对多媒体文件61满意时,可通过触发完成选项63,来完成编辑,客户端可以将完成编辑的多媒体文件61存储至本地,或者发送至社交网络以进行分享,本申请实施例对此不做限定。
本申请实施例还提供了如图7所示的应用创意玩法模板后的多媒体文件的示意图,在图7中,多媒体文件71为实时拍摄得到的,其中的背景为工位,应用的AR能力是对工位进行识别的AR能力,应用的创意玩法模板包括效果711,如此,实现了在工位上摆地摊的增强现实的感官效果。此外,还呈现了对多媒体文件71的手动编辑选项72,具体包括翻转、滤镜及美颜等选项,还提供了添加魔法效果和添加音乐的选项。
本申请实施例还提供了如图8所示的确认提示的示意图,在图8中,用户在发布多媒体文件81,例如将多媒体文件81分享至社交网络时,客户端根据用于检测水印的AI/AR能力,识别出多媒体文件81中存在“@账号xx”的水印,并输出确认提示82以提醒用户。在此基础上,还可以将包括屏蔽水印的效果的创意玩法模板应用至多媒体文件81,以屏蔽多媒体文件81中的水印。在图6、图7及图8中,仅示出了集成单个AI/AR效果的创意玩法模板,但在实际应用场景中,一个创意玩法模板中也可包括多个AI/AR效果,本申请实施例对此不做限定。
本申请实施例对AI/AR能力的类型不做限定,如图9所示,示出了多种AI/AR能力,例如智能剪辑(图9中的能力91)、智能滤镜、风格化滤镜及人脸融合等。当然,这并不构成对本申请实施例的限定,根据实际应用场景的不同,可以应用更多的AI/AR能力,如性别转换及人脸颜值预测等。
对于智能编辑的底层实现,本申请实施例提供了如图10所示的流程示意图,为了便于理解,以三个阶段进行说明。
1)第一阶段:云端能力智能匹配。
客户端在获取到待编辑的多媒体文件后,若多媒体文件为图片,则直接将其发送至服务器(即云端);若多媒体文件为视频,则按照设定的抽帧频率进行抽帧处理(即图10中的内容抽取),将得到的多个候选视频帧发送至服务器进行能力匹配,其中,为了性能和质量的权衡,可以根据得到的多个候选视频帧的数据量(对应上文的所占的存储空间)大小,判断是否进行压缩处理。
服务器在接收到客户端发送的多个候选视频帧后,通过部署在云端的多个AI/AR能力进行并发匹配,根据匹配结果下发合适的AI/AR能力、以及对应的创意玩法模板至客户端。例如,这里可以通过人工智能模型识别出多个候选视频帧中素材的类型,若识别出人脸,则下发人脸识别的AI/AR能力以及对应的创意玩法模板;若识别出天空,则下发天空识别(一键换天)的AI/AR能力以及对应的创意玩法模板。值得说明的是,多个AI/AR能力以及每个AI/AR能力对应的创意玩法模板,可以同步部署在客户端本地,在该情况下,服务器下发AI/AR能力的标识(或编号)及创意玩法模板的标识(或编号)即可,如此,能够减少通信资源的消耗。另外,在未匹配到合适的AI/AR能力时,可以直接结束编辑。
2)第二阶段:应用场景差异化处理。这里,将多媒体文件为图片的场景确定为非实时场景,将多媒体文件为视频的场景确定为实时场景。
针对非实时场景,由于实时性没有要求或者要求较低,故可以根据AI/AR能力对原始的图片进行处理,以达到更高的能力处理准确度。此外,由于图片内容是静态不变的,即使业务上在图片中添加了时间属性或其他属性,也可以对应用的创意玩法模板进行复用。例如,若需要根据图片生成视频、且已在图片中应用了创意玩法模板,则持续应用该创意玩法模板,直至达到设定的持续时长(如10秒)即可。
针对实时场景,为了达到较高的实时性要求,可以对原始的视频进行压缩处理,即是将视频画面的同比例缩小至设定尺寸,该设定尺寸可以综合AI/AR能力自身的处理耗时和处理结果的准确度进行设置。
3)第三阶段:本地二次匹配及模板渲染。客户端在获取到服务器下发的AI/AR能力后,根据该AI/AR能力对多媒体文件进行识别处理,得到处理结果,其中,当多媒体文件为视频时,这里可以对多个候选视频帧进行识别处理。然后,将得到的处理结果与AI/AR能力对应的匹配因子阈值进行对比,并根据对比结果确定最终应用的创意玩法模板。其中,由于在实时场景下对视频帧进行了压缩处理,故对于同一AI/AR能力来说,在实时场景下设定的匹配因子阈值可以小于在非实时场景下设定的匹配因子阈值。
举例来说,多媒体文件为图片,服务器下发了天空识别的能力、以及与该能力对应的AI/AR模板和非AI/AR模板,其中,非AI/AR模板仅包括非AI/AR效果,AI/AR模板可以仅包括AI/AR效果,也可同时包括AI/AR效果和非AI/AR效果。以天空识别的能力对应的匹配因子阈值在非实时场景下为70%举例,客户端根据天空识别的能力对图片进行识别处理后,若得到图片中天空的面积占比大于匹配因子阈值,如面积占比为80%,则应用AI/AR模板中的AI/AR效果;若得到图片中天空的面积占比小于或等于匹配因子阈值,如面积占比为40%,则应用非AI/AR模板中的非AI/AR效果。
以上述例子为基础,并以多媒体文件为视频、且天空识别的能力对应的匹配因子阈值在实时场景下为60%进行举例。客户端根据天空识别的能力对视频中的多个候选视频帧进行识别处理后,若得到多个候选视频帧中天空的平均面积占比大于匹配因子阈值,如平均面积占比为80%,则应用AI/AR模板;若得到多个候选视频帧中天空的平均面积占比小于或等于匹配因子阈值,如平均面积占比为40%,则应用非AI/AR模板中的非AI/AR效果。
值得说明的是,服务器下发的、与天空识别的能力对应的AI/AR模板的数量可能为多个(多个AI/AR模板可以对应同一个匹配因子阈值),在该情况下,若确定出应用AI/AR模板,则可以默认应用下发的第一个AI/AR模板,或者,呈现多个AI/AR模板以供用户选择应用。在已应用某个AI/AR模板的基础上,可以呈现其他的AI/AR模板,以供用户切换。
若应用的AI/AR模板同时包括AI/AR效果和非AI/AR效果,则在应用该AI/AR模板的过程中,还可以进行实时判定,即针对视频中的每个视频帧,若视频帧中天空的面积占比小于或等于匹配因子阈值,则应用该AI/AR模板中的非AI/AR效果;若视频帧中天空的面积占比大于匹配因子阈值,则应用该AI/AR模板中的AI/AR效果。
在确定了最终应用的效果之后,可以通过模板渲染引擎,来渲染相应的效果。例如,最终应用的效果为用于替换视频帧中天空背景的星空图片,则通过模板渲染引擎渲染该效果,以将视频帧中的背景更新为星空图片。
通过本申请实施例,能够实现以下技术效果:1)将能力本身与能力之上的玩法进行分离,解决了设计侧在玩法创作上的局限性,提升创意玩法模板产出的自由度和效率;2)通过智能匹配,得到符合多媒体文件的能力,降低了选择多媒体文件的门槛,提升了编辑的成功率;3)能够覆盖实时场景和非实时场景,让用户能够体验更多的AI/AR能力之上的创意玩法;4)能力和玩法组件化,实现了效果的自由组合,极大地缩短创意玩法模板上线的周期,同时做到了创意玩法模板迭代和客户端版本迭代的分离。
下面继续说明本申请实施例提供的基于人工智能的多媒体处理装置455实施为软件模块的示例性结构,在一些实施例中,如图2所示,存储在存储器450的基于人工智能的多媒体处理装置455中的软件模块可以包括:获取模块4551,配置为响应于针对多媒体文件的编辑操作,获取与多媒体文件中的素材的类型对应的素材编辑模式以及互动模板;应用模块4552,配置为根据素材编辑模式在多媒体文件中识别素材,并对所识别的素材应用互动模板;应用完成模块4553,配置为呈现应用互动模板后的多媒体文件。
在一些实施例中,互动模板用于形成基于素材的互动效果;互动效果包括第一效果和区别于第一效果的第二效果,第一效果包括增强现实效果和虚拟现实效果中的至少一个;应用模块4552,还配置为当所识别的素材满足素材编辑模式对应的素材互动条件时,对所识别的素材应用互动模板中的第一效果;基于人工智能的多媒体处理装置455还包括:设定位置应用模块,配置为当所识别的素材未满足素材互动条件时,获取互动模板中的第二效果对应的设定位置,并在多媒体文件中的设定位置应用第二效果。
在一些实施例中,素材互动条件包括对应非实时场景的第一素材互动条件、以及 对应实时场景的第二素材互动条件;应用模块4552,还配置为:当编辑场景为非实时场景时,根据素材编辑模式在多媒体文件中识别素材,得到素材与多媒体文件的匹配程度,并当匹配程度大于第一素材互动条件中的匹配程度阈值时,确定所识别的素材满足第一素材互动条件;当编辑场景为实时场景时,对多媒体文件进行压缩处理,根据素材编辑模式在压缩后的多媒体文件中识别素材,得到素材与压缩后的多媒体文件的匹配程度,并当匹配程度大于第二素材互动条件中的匹配程度阈值时,确定所识别的素材满足第二素材互动条件;其中,第一素材互动条件中的匹配程度阈值大于第二素材互动条件中的匹配程度阈值。
在一些实施例中,基于人工智能的多媒体处理装置455还包括:模式筛选模块,配置为当素材编辑模式的数量为多个时,执行以下任意一种处理:针对每个素材编辑模式,根据素材编辑模式在多媒体文件中识别得到候选素材,并当候选素材满足素材编辑模式对应的素材互动条件时,将素材编辑模式作为待应用的素材编辑模式;呈现多个素材编辑模式的预览识别过程,并响应于针对素材编辑模式的选择操作,将被选择的素材编辑模式作为待应用的素材编辑模式;其中,待应用的素材编辑模式用于在多媒体文件中识别素材。
在一些实施例中,模式筛选模块还配置为:当多媒体文件为视频时,对多媒体文件进行周期性的抽帧处理,得到候选视频帧;根据素材编辑模式在候选视频帧中识别得到候选素材;应用模块4552,还配置为:根据待应用的素材编辑模式在多媒体文件的每个视频帧中识别素材。
在一些实施例中,基于人工智能的多媒体处理装置455还包括:模板筛选模块,配置为当互动模板的数量为多个时,执行以下任意一种处理:当所识别的素材满足任一互动模板对应的素材互动条件时,将满足的素材互动条件对应的互动模板作为待应用的互动模板;呈现多个互动模板的预览应用过程,并响应于针对互动模板的选择操作,将被选择的互动模板作为待应用的互动模板。
在一些实施例中,基于人工智能的多媒体处理装置455还包括:抽帧模块,配置为当多媒体文件为视频时,对多媒体文件进行周期性的抽帧处理,得到候选视频帧;素材识别模块,配置为对候选视频帧进行素材识别处理,得到多媒体文件中的素材的类型。
在一些实施例中,基于人工智能的多媒体处理装置455还包括:压缩模块,配置为当多个候选视频帧所占的存储空间大于存储空间阈值时,对多个候选视频帧进行压缩处理;素材识别模块,还配置为:对压缩后的多个候选视频帧分别进行素材识别处理,得到每个候选视频帧中的素材的类型,并将多个候选视频帧中占比最大的素材的类型,作为用于获取对应的素材编辑模式的素材的类型;其中,占比包括素材的面积占比及数量占比中的任意一种。
在一些实施例中,基于人工智能的多媒体处理装置455还包括:样本获取模块,配置为获取多个样本多媒体文件;样本识别模块,配置为通过人工智能模型对样本多媒体文件进行素材识别处理,得到样本多媒体文件中的素材的类型;更新模块,配置为根据素材识别处理得到的类型与实际类型之间的差异,更新人工智能模型的权重参数;其中,更新后的人工智能模型用于对候选视频帧进行素材识别处理。
In some embodiments, the acquisition module 4551 is further configured to: from the candidate material editing modes respectively corresponding to multiple types, acquire the material editing mode corresponding to the type of material in the multimedia file, and acquire at least one interactive template corresponding to that material editing mode.
In some embodiments, the apparatus 455 further includes a prompt presentation module, configured to present a confirmation prompt corresponding to the identified material, where the confirmation prompt includes at least one of the type of the identified material, its position information in the multimedia file, and a preview result obtained after applying the interactive template. The application module 4552 is further configured to apply the interactive template to the identified material in response to a confirmation operation on the confirmation prompt.
In some embodiments, the interactive template includes an interactive effect and a duration of the interactive effect; the application module 4552 is further configured to apply the interactive effect of the interactive template to the identified material and keep the applied effect until the duration is reached.
The embodiments of this application provide a computer program product or computer program that includes computer instructions (executable instructions) stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, causing the electronic device to perform the artificial-intelligence-based multimedia processing method described above.
The embodiments of this application provide a computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to perform the method provided by the embodiments of this application, for example, the artificial-intelligence-based multimedia processing method shown in FIG. 3A, FIG. 3B, FIG. 3C, FIG. 3D, or FIG. 3E.
In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM, or any device including one of, or any combination of, the above memories.
In some embodiments, the executable instructions may take the form of a program, software, a software module, a script, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, the executable instructions may, but need not, correspond to a file in a file system; they may be stored as part of a file that holds other programs or data, for example in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files storing one or more modules, subroutines, or code sections).
By way of example, the executable instructions may be deployed to be executed on one electronic device, on multiple electronic devices located at one site, or on multiple electronic devices distributed across multiple sites and interconnected by a communication network.
In summary, the embodiments of this application achieve the following technical effects:
1) For a multimedia file to be edited, acquiring the material editing mode and interactive template corresponding to the type of material in the file improves the success rate of multimedia editing and the utilization of computing resources.
2) When the interactive template includes a first effect and a second effect, the effect applied depends on the identified material, which improves applicability to different identification outcomes; when the multimedia file is a video, the coherence and completeness of the applied effect are guaranteed.
3) When the editing scenario is a real-time scenario, the multimedia file is compressed, which improves the efficiency of subsequent processing and meets the real-time requirement; at the same time, the matching-degree threshold in the real-time scenario is set smaller than that in the non-real-time scenario, to match the characteristics of the compression.
4) When multiple material editing modes or interactive templates are acquired, two screening approaches are provided, intelligent selection and human-computer interaction selection, improving the flexibility and accuracy of screening.
5) When the multimedia file is a video, periodic frame extraction and material identification on the extracted frames make it possible to quickly obtain the types of material included in the file, meeting the real-time requirement.
6) Separating the material editing mode from the interactive template removes the limitations in creating interactive templates and improves the freedom and efficiency of their creation; componentized effects can be combined freely when creating a template, greatly shortening the cycle for releasing interactive templates, while decoupling template iteration from client version iteration: whatever the client version, intelligent editing of multimedia files can be achieved simply by deploying the interactive template locally on the client.
The above are merely embodiments of this application and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and scope of this application falls within its scope of protection.
Claims (20)
- An artificial-intelligence-based multimedia processing method, executed by an electronic device, the method comprising: in response to an editing operation on a multimedia file, acquiring a material editing mode and an interactive template corresponding to a type of material in the multimedia file; identifying the material in the multimedia file according to the material editing mode, and applying the interactive template to the identified material; and presenting the multimedia file after the interactive template has been applied.
- The multimedia processing method according to claim 1, wherein the interactive template is used to form an interactive effect based on the material; the interactive effect comprises a first effect and a second effect distinct from the first effect, the first effect comprising at least one of an augmented reality effect and a virtual reality effect; the applying the interactive template to the identified material comprises: when the identified material satisfies a material interaction condition corresponding to the material editing mode, applying the first effect of the interactive template to the identified material; and the multimedia processing method further comprises: when the identified material does not satisfy the material interaction condition, acquiring a set position corresponding to the second effect in the interactive template, and applying the second effect at the set position in the multimedia file.
- The multimedia processing method according to claim 2, wherein the material interaction condition comprises a first material interaction condition corresponding to a non-real-time scenario and a second material interaction condition corresponding to a real-time scenario; the identifying the material in the multimedia file according to the material editing mode comprises: when the editing scenario is the non-real-time scenario, identifying the material in the multimedia file according to the material editing mode to obtain a matching degree between the material and the multimedia file, and determining that the identified material satisfies the first material interaction condition when the matching degree is greater than a matching-degree threshold in the first material interaction condition; and, when the editing scenario is the real-time scenario, compressing the multimedia file, identifying the material in the compressed multimedia file according to the material editing mode to obtain a matching degree between the material and the compressed multimedia file, and determining that the identified material satisfies the second material interaction condition when the matching degree is greater than a matching-degree threshold in the second material interaction condition; wherein the matching-degree threshold in the first material interaction condition is greater than the matching-degree threshold in the second material interaction condition.
- The multimedia processing method according to claim 1, further comprising: when there are a plurality of material editing modes, performing either of the following: for each material editing mode, identifying candidate material in the multimedia file according to the material editing mode, and, when the candidate material satisfies a material interaction condition corresponding to the material editing mode, taking the material editing mode as a material editing mode to be applied; or presenting a preview identification process of the plurality of material editing modes, and, in response to a selection operation on a material editing mode, taking the selected material editing mode as the material editing mode to be applied; wherein the material editing mode to be applied is used to identify the material in the multimedia file.
- The multimedia processing method according to claim 4, wherein the identifying candidate material in the multimedia file according to the material editing mode comprises: when the multimedia file is a video, performing periodic frame extraction on the multimedia file to obtain candidate video frames, and identifying candidate material in the candidate video frames according to the material editing mode; and the identifying the material in the multimedia file according to the material editing mode comprises: identifying the material in every video frame of the multimedia file according to the material editing mode to be applied.
- The multimedia processing method according to claim 5, wherein, before the identifying candidate material in the candidate video frames according to the material editing mode, the method further comprises: compressing the plurality of candidate video frames when the storage space occupied by the plurality of candidate video frames is greater than a storage-space threshold; and the identifying candidate material in the candidate video frames according to the material editing mode comprises: identifying candidate material in the compressed candidate video frames according to the material editing mode.
- The multimedia processing method according to claim 4, wherein the presenting a preview identification process of the plurality of material editing modes comprises: for each material editing mode, performing at least one of the following: presenting a preview identification process of the multimedia file according to the material editing mode; or presenting a preview identification process of a sample multimedia file according to the material editing mode.
- The multimedia processing method according to claim 1, further comprising: when there are a plurality of interactive templates, performing either of the following: when the identified material satisfies a material interaction condition corresponding to any interactive template, taking the interactive template corresponding to the satisfied material interaction condition as an interactive template to be applied; or presenting a preview application process of the plurality of interactive templates, and, in response to a selection operation on an interactive template, taking the selected interactive template as the interactive template to be applied.
- The multimedia processing method according to claim 8, wherein the presenting a preview application process of the plurality of interactive templates comprises: for each interactive template, performing at least one of the following: presenting a preview application process of the multimedia file according to the interactive template; or presenting a preview application process of a sample multimedia file according to the interactive template.
- The multimedia processing method according to any one of claims 1 to 9, further comprising: when the multimedia file is a video, performing periodic frame extraction on the multimedia file to obtain candidate video frames; and performing material identification on the candidate video frames to obtain the type of material in the multimedia file.
- The multimedia processing method according to claim 10, wherein, before the performing material identification on the candidate video frames, the method further comprises: compressing the plurality of candidate video frames when the storage space occupied by the plurality of candidate video frames is greater than a storage-space threshold; and the performing material identification on the candidate video frames to obtain the type of material in the multimedia file comprises: performing material identification on each of the compressed candidate video frames to obtain the type of material in each candidate video frame, and taking the material type with the largest ratio across the plurality of candidate video frames as the material type used for acquiring the corresponding material editing mode; wherein the ratio is either an area ratio or a quantity ratio of the material.
- The multimedia processing method according to claim 10, further comprising: acquiring a plurality of sample multimedia files; performing material identification on the sample multimedia files through an artificial intelligence model to obtain the types of material in the sample multimedia files; and updating weight parameters of the artificial intelligence model according to the difference between the types obtained by the material identification and the actual types; wherein the updated artificial intelligence model is used to perform the material identification on the candidate video frames.
- The multimedia processing method according to any one of claims 1 to 9, wherein the acquiring a material editing mode and an interactive template corresponding to the type of material in the multimedia file comprises: from candidate material editing modes respectively corresponding to a plurality of types, acquiring the material editing mode corresponding to the type of material in the multimedia file, and acquiring at least one interactive template corresponding to the material editing mode.
- The multimedia processing method according to any one of claims 1 to 9, wherein, after the identifying the material in the multimedia file according to the material editing mode, the method further comprises: presenting a confirmation prompt corresponding to the identified material, wherein the confirmation prompt comprises at least one of the type of the identified material, position information of the material in the multimedia file, and a preview result obtained after applying the interactive template; and the applying the interactive template to the identified material comprises: applying the interactive template to the identified material in response to a confirmation operation on the confirmation prompt.
- The multimedia processing method according to any one of claims 1 to 9, wherein the interactive template comprises an interactive effect and a duration of the interactive effect; and the applying the interactive template to the identified material comprises: applying the interactive effect of the interactive template to the identified material, and keeping the applied interactive effect until the duration is reached.
- The multimedia processing method according to any one of claims 1 to 9, wherein, after the identifying the material in the multimedia file according to the material editing mode, the method further comprises: cropping the multimedia file according to position information of the identified material in the multimedia file; and the applying the interactive template to the identified material comprises: applying the interactive template to the identified material in the cropped multimedia file.
- The multimedia processing method according to claim 16, wherein the cropping the multimedia file according to position information of the identified material in the multimedia file comprises: when the position information is temporal position information, performing time-based cropping on the multimedia file according to the first timestamp and the last timestamp in the temporal position information; and, when the position information is regional position information, performing region-based cropping on the multimedia file according to the regional position information.
- An artificial-intelligence-based multimedia processing apparatus, comprising: an acquisition module, configured to acquire, in response to an editing operation on a multimedia file, a material editing mode and an interactive template corresponding to a type of material in the multimedia file; an application module, configured to identify the material in the multimedia file according to the material editing mode and apply the interactive template to the identified material; and an application completion module, configured to present the multimedia file after the interactive template has been applied.
- An electronic device, comprising: a memory for storing executable instructions; and a processor which, when executing the executable instructions stored in the memory, implements the artificial-intelligence-based multimedia processing method according to any one of claims 1 to 17.
- A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the artificial-intelligence-based multimedia processing method according to any one of claims 1 to 17.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/977,590 US20230057566A1 (en) | 2020-08-19 | 2022-10-31 | Multimedia processing method and apparatus based on artificial intelligence, and electronic device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010837810.8A CN111914523B (zh) | 2020-08-19 | 2020-08-19 | Multimedia processing method and apparatus based on artificial intelligence, and electronic device
CN202010837810.8 | 2020-08-19 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/977,590 Continuation US20230057566A1 (en) | 2020-08-19 | 2022-10-31 | Multimedia processing method and apparatus based on artificial intelligence, and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022037260A1 true WO2022037260A1 (zh) | 2022-02-24 |
Family ID: 73279395
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/102803 WO2022037260A1 (zh) | 2020-08-19 | 2021-06-28 | Multimedia processing method and apparatus based on artificial intelligence, and electronic device |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230057566A1 (zh) |
CN (1) | CN111914523B (zh) |
WO (1) | WO2022037260A1 (zh) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111914523B (zh) * | 2020-08-19 | 2021-12-14 | 腾讯科技(深圳)有限公司 | Multimedia processing method and apparatus based on artificial intelligence, and electronic device |
CN112468741A (zh) * | 2020-11-13 | 2021-03-09 | 咪咕文化科技有限公司 | Video generation method, electronic device, and storage medium |
CN112584061B (zh) * | 2020-12-24 | 2023-08-01 | 咪咕文化科技有限公司 | Method for generating a universal multimedia template, electronic device, and storage medium |
CN115269889B (zh) * | 2021-04-30 | 2024-07-02 | 北京字跳网络技术有限公司 | Clip template search method and apparatus |
CN113794930B (zh) * | 2021-09-10 | 2023-11-24 | 中国联合网络通信集团有限公司 | Video generation method, apparatus, device, and storage medium |
CN115988276B (zh) * | 2021-10-15 | 2024-06-25 | 腾讯科技(深圳)有限公司 | Multimedia information editing template processing method and apparatus |
CN114782284B (zh) * | 2022-06-17 | 2022-09-23 | 广州三七极耀网络科技有限公司 | Motion data correction method, apparatus, device, and storage medium |
CN118509641A (zh) * | 2023-02-16 | 2024-08-16 | 北京字跳网络技术有限公司 | Video editing method, apparatus, device, and medium |
CN117036203B (zh) * | 2023-10-08 | 2024-01-26 | 杭州黑岩网络科技有限公司 | Intelligent drawing method and system |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105323634B (zh) * | 2014-06-27 | 2019-01-04 | Tcl集团股份有限公司 | Method and system for generating video thumbnails |
CN106028217B (zh) * | 2016-06-20 | 2020-01-21 | 咻羞科技(深圳)有限公司 | Smart device interaction system and method based on audio recognition technology |
CN106210808B (zh) * | 2016-08-08 | 2019-04-16 | 腾讯科技(深圳)有限公司 | Media information delivery method, terminal, server, and system |
US10261747B2 (en) * | 2016-09-09 | 2019-04-16 | The Boeing Company | Synchronized side-by-side display of live video and corresponding virtual environment images |
CN109309802A (zh) * | 2017-07-27 | 2019-02-05 | 中兴通讯股份有限公司 | Video interaction management method, server, and computer-readable storage medium |
CN108010037B (zh) * | 2017-11-29 | 2019-09-13 | 腾讯科技(深圳)有限公司 | Image processing method, apparatus, and storage medium |
CN112911182B (zh) * | 2018-06-28 | 2022-08-23 | 腾讯科技(深圳)有限公司 | Game interaction method, apparatus, terminal, and storage medium |
CN109191370A (zh) * | 2018-08-06 | 2019-01-11 | 光锐恒宇(北京)科技有限公司 | Image processing method and apparatus, smart terminal, and computer-readable storage medium |
CN109492577B (zh) * | 2018-11-08 | 2020-09-18 | 北京奇艺世纪科技有限公司 | Gesture recognition method and apparatus, and electronic device |
CN109474850B (zh) * | 2018-11-29 | 2021-07-20 | 北京字节跳动网络技术有限公司 | Method, apparatus, terminal device, and storage medium for adding motion-pixel video special effects |
CN110784752B (zh) * | 2019-09-27 | 2022-01-11 | 腾讯科技(深圳)有限公司 | Video interaction method and apparatus, computer device, and storage medium |
CN110856039A (zh) * | 2019-12-02 | 2020-02-28 | 新华智云科技有限公司 | Video processing method and apparatus, and storage medium |
CN111178343A (zh) * | 2020-04-13 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Multimedia resource detection method, apparatus, device, and medium based on artificial intelligence |
CN111556363B (zh) * | 2020-05-21 | 2021-09-28 | 腾讯科技(深圳)有限公司 | Video special-effect processing method, apparatus, device, and computer-readable storage medium |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103679800A (zh) * | 2013-11-21 | 2014-03-26 | 北京航空航天大学 | Video-image virtual scene generation system and framework construction method |
CN104571516A (zh) * | 2014-12-31 | 2015-04-29 | 武汉百景互动科技有限责任公司 | Interactive advertising system |
CN105931096A (zh) * | 2016-04-14 | 2016-09-07 | 杭州艺豆网络科技有限公司 | Method for producing a product detail page |
CN111914523A (zh) * | 2020-08-19 | 2020-11-10 | 腾讯科技(深圳)有限公司 | Multimedia processing method and apparatus based on artificial intelligence, and electronic device |
Non-Patent Citations (1)
Title |
---|
WANG KUN: "The Design and Implementation of Generic Object Detection Module in Magic Camera", THESIS, (I) ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE, no. 8, 15 August 2018 (2018-08-15), XP055901536 * |
Also Published As
Publication number | Publication date |
---|---|
US20230057566A1 (en) | 2023-02-23 |
CN111914523A (zh) | 2020-11-10 |
CN111914523B (zh) | 2021-12-14 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21857356; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 290623)
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21857356; Country of ref document: EP; Kind code of ref document: A1