WO2020038167A1 - Video image recognition method and apparatus, terminal and storage medium - Google Patents

Video image recognition method and apparatus, terminal and storage medium

Info

Publication number
WO2020038167A1
Authority
WO
WIPO (PCT)
Prior art keywords
image recognition
image
target
recognition result
identified
Prior art date
Application number
PCT/CN2019/096578
Other languages
French (fr)
Chinese (zh)
Inventor
宋方
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2020038167A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content

Definitions

  • the embodiments of the present application relate to the technical field of terminals, and in particular, to a video image recognition method, device, terminal, and storage medium.
  • In the related art, if a user wants to know about a person or object in a video, the user typically triggers the terminal to take a screenshot of the current playback interface and save it, then triggers the terminal to exit the playback application and launch a search application. The user uploads the screenshot to the search application and clicks the search control. The terminal then obtains information about the person or object from the network and displays it to the user.
  • the embodiments of the present application provide a video image recognition method, device, terminal, and storage medium.
  • the technical solution is as follows:
  • an embodiment of the present application provides a video image recognition method, where the method includes:
  • when in a video playback scene, an image recognition function control is displayed in a sidebar;
  • the image recognition result is displayed.
  • an embodiment of the present application provides a video image recognition apparatus, where the apparatus includes:
  • a control display module configured to display an image recognition function control in a sidebar when in a video playback scene;
  • an image acquisition module configured to, when a first trigger signal corresponding to the image recognition function control is received, perform screenshot processing on the current playback screen to obtain a target image;
  • an image recognition module configured to perform image recognition on the target image to obtain an image recognition result of the target image;
  • a result display module configured to display the image recognition result.
  • an embodiment of the present application provides a terminal.
  • the terminal includes a processor and a memory.
  • the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the video image recognition method described in the foregoing aspect.
  • an embodiment of the present application provides a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, and the computer program is loaded and executed by a processor to implement the video image recognition method described in the foregoing aspect.
  • FIG. 1 is a schematic structural diagram of a terminal provided by an exemplary embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a terminal according to another exemplary embodiment of the present application.
  • FIGS. 3A to 3F are schematic diagrams of the appearance of terminals with different touch display screens provided by exemplary embodiments of the present application.
  • FIG. 4 is a flowchart of a video image recognition method according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an interface for displaying a video image recognition control provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an interface for determining a target to-be-recognized object according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an interface for determining a target to-be-recognized object according to another embodiment of the present application.
  • FIG. 8 is a schematic diagram of an interface for determining a target to-be-recognized object according to another embodiment of the present application.
  • FIG. 9 is a schematic interface diagram of a video image recognition method provided by an embodiment of the present application.
  • FIG. 10 is a schematic interface diagram of a video image recognition method provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of an interface where video image recognition fails according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of an interface where video image recognition fails according to an embodiment of the present application.
  • FIG. 13 is a block diagram of a video image recognition apparatus according to an embodiment of the present application.
  • FIG. 1 and FIG. 2 are structural block diagrams of a terminal 100 according to an exemplary embodiment of the present application.
  • the terminal 100 may be a mobile phone, a tablet computer, a notebook computer, an e-book, or the like.
  • the terminal 100 in the present application may include one or more of the following components: a processor 110, a memory 120, and a touch display screen 130.
  • the processor 110 may include one or more processing cores.
  • the processor 110 uses various interfaces and lines to connect the parts of the entire terminal 100, and performs the various functions of the terminal 100 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling the data stored in the memory 120.
  • the processor 110 may use at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA).
  • the processor 110 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), and a modem.
  • the CPU mainly handles the operating system, user interface, and application programs; the GPU is responsible for rendering and drawing the content to be displayed by the touch display screen 130; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 110 and may instead be implemented as a separate chip.
  • the memory 120 may include random access memory (RAM), and may also include read-only memory (ROM).
  • the memory 120 includes a non-transitory computer-readable storage medium.
  • the memory 120 may be used to store instructions, programs, codes, code sets, or instruction sets.
  • the memory 120 may include a storage program area and a storage data area. The storage program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, an image playback function, etc.), instructions for implementing the following method embodiments, and the like; the storage data area may store data created through use of the terminal 100 (such as audio data and a phone book).
  • the memory 120 stores a Linux kernel layer 220, a system runtime layer 240, an application framework layer 260, and an application layer 280.
  • the Linux kernel layer 220 provides low-level drivers for various hardware of the terminal 100, such as a display driver, an audio driver, a camera driver, a Bluetooth driver, a Wi-Fi driver, and power management.
  • the system runtime layer 240 provides the main feature support for the Android system through a number of C/C++ libraries. For example, the SQLite library provides database support, the OpenGL ES library provides 3D graphics support, and the WebKit library provides browser kernel support.
  • An Android runtime library 242 (Android Runtime) is also provided in the system runtime layer 240. It mainly provides core libraries that allow developers to write Android applications in the Java language.
  • the application framework layer 260 provides various APIs that may be used when building applications, such as activity management, window management, view management, notification management, content providers, package management, call management, resource management, and positioning management. Developers can also use these APIs to build their own applications.
  • At least one application program runs in the application layer 280. These application programs may be programs native to the operating system, such as a contacts program, an SMS program, a clock program, or a camera application, or they may be applications developed by third-party developers, such as instant messaging programs or photo beautification programs.
  • the iOS system includes a core operating system layer 320 (Core OS layer), a core service layer 340 (Core Services layer), a media layer 360 (Media layer), and a touchable layer 380 (Cocoa Touch layer).
  • the core operating system layer 320 includes an operating system kernel, drivers, and a low-level program framework. These low-level program frameworks provide functions closer to the hardware for use by the program framework located in the core service layer 340.
  • the core service layer 340 provides system services and / or program frameworks required by applications, such as a Foundation framework, an account framework, an advertising framework, a data storage framework, a network connection framework, a geographic location framework, a motion framework, and so on.
  • the media layer 360 provides audio-visual-related interfaces for applications, such as interfaces related to graphics and images, interfaces related to audio technology, interfaces related to video technology, and wireless playback (AirPlay) interfaces for audio and video transmission technologies.
  • the touchable layer 380 provides various commonly used interface-related frameworks for application development and is responsible for the user's touch interaction operations on the terminal 100. Examples include a local notification service, a remote push service, an advertising framework, a game tool framework, a message user interface (UI) framework, the UIKit framework, a map framework, and so on.
  • frameworks related to most applications include, but are not limited to, the Foundation framework in the core service layer 340 and the UIKit framework in the touchable layer 380.
  • the Foundation framework provides many basic object classes and data types, and provides the most basic system services for all applications, independent of the UI.
  • the classes provided by the UIKit framework are basic UI class libraries for creating touch-based user interfaces.
  • iOS applications can provide UIs based on the UIKit framework, which supplies the application's infrastructure for building user interfaces, drawing, handling user interaction events, responding to gestures, and more.
  • the touch display screen 130 is used to receive touch operations that a user performs on or near it with any suitable object, such as a finger or a stylus, and to display the user interface of each application program.
  • the touch display screen 130 is generally disposed on the front panel of the terminal 100.
  • the touch display screen 130 may be designed as a full screen, a curved screen or a special-shaped screen.
  • the touch display screen 130 can also be designed as a combination of a full screen and a curved screen, or a combination of a special-shaped screen and a curved screen, which is not limited in this embodiment. Specifically:
  • the full screen may refer to a screen design in which the screen ratio of the touch display screen 130 on the front panel of the terminal 100 exceeds a threshold (such as 80%, 90%, or 95%).
  • One way to calculate the screen ratio is (area of the touch display screen 130 / area of the front panel of the terminal 100) * 100%; another is (area of the actual display region in the touch display screen 130 / area of the front panel of the terminal 100) * 100%; a third is (diagonal of the touch display screen 130 / diagonal of the front panel of the terminal 100) * 100%.
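The screen ratio calculations described above can be sketched as follows. This is a minimal illustration only: the function names, the choice of units, and the default 80% threshold are assumptions made for the example and are not specified by the application.

```python
import math


def screen_ratio_by_area(screen_area: float, panel_area: float) -> float:
    """Screen ratio in percent: (screen area / front-panel area) * 100.

    Pass either the full screen area or the actual display area as
    screen_area to cover the first two calculation methods.
    """
    return 100.0 * screen_area / panel_area


def screen_ratio_by_diagonal(screen_w: float, screen_h: float,
                             panel_w: float, panel_h: float) -> float:
    """Screen ratio in percent based on the two diagonals."""
    screen_diag = math.hypot(screen_w, screen_h)
    panel_diag = math.hypot(panel_w, panel_h)
    return 100.0 * screen_diag / panel_diag


def is_full_screen(ratio_percent: float, threshold_percent: float = 80.0) -> bool:
    """A design counts as a full screen when the ratio exceeds the threshold."""
    return ratio_percent > threshold_percent
```

Because the three methods can give different values for the same device, a caller would need to fix one method before comparing the result against the threshold.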
  • the full screen may also be a screen design in which at least one front panel component is integrated inside or below the touch display screen 130.
  • the at least one front panel component includes a camera, a fingerprint sensor, a proximity light sensor, a distance sensor, and the like.
  • other components on the front panel of a conventional terminal are integrated in all or part of the area of the touch display screen 130. For example, after the photosensitive element of the camera is divided into multiple photosensitive pixels, each photosensitive pixel is integrated into the black area of a display pixel in the touch display screen 130. Because at least one front panel component is integrated inside the touch display screen 130, the full screen has a higher screen ratio.
  • the front panel components on the front panel of the traditional terminal can also be set on the side or back of the terminal 100.
  • for example, an ultrasonic fingerprint sensor is disposed under the touch display screen 130, a bone-conduction handset is disposed inside the terminal 100, and the camera is disposed on the side of the terminal and is pluggable.
  • edge touch sensors 120 are provided on a single side, on two sides (such as the left and right sides), or on four sides (such as the upper, lower, left, and right sides) of the middle frame of the terminal 100. The edge touch sensors 120 are used to detect at least one of a touch operation, a click operation, a press operation, and a slide operation performed by the user on the middle frame.
  • the edge touch sensor 120 may be any one of a touch sensor, a thermal sensor, and a pressure sensor. The user can apply an operation on the edge touch sensor 120 to control an application program in the terminal 100.
  • the curved screen refers to a screen design in which the cross-section of the touch display screen 130 has a curved shape and the projection is a plane in a direction parallel to the cross-section.
  • the curved shape may be U-shaped.
  • the curved screen refers to a screen design manner in which at least one side is a curved shape.
  • the curved screen may mean that at least one side of the touch display screen 130 extends to cover the middle frame of the terminal 100. Because the side of the touch display screen 130 extends to the middle frame of the terminal 100, the middle frame, which otherwise has no display or touch function, also becomes a displayable area and/or an operable area, so that the curved screen has a higher screen ratio.
  • the curved screen refers to a screen design in which the left and right sides 42 are curved; or a screen design in which the upper and lower sides are curved; or a screen design in which all four sides (upper, lower, left, and right) are curved.
  • the curved screen is made of a touch screen material with a certain flexibility.
  • the special-shaped screen is a touch display screen with an irregular appearance.
  • the irregular shape is not a rectangle or a rounded rectangle.
  • the special-shaped screen refers to a screen design provided with protrusions, notches, and / or holes on the rectangular or rounded rectangular touch display screen 130.
  • the protrusion, the notch and / or the hole can be located at the edge of the touch display screen 130, the center of the screen, or both.
  • when the protrusion, notch, and/or hole is set on an edge, it can be set at the middle position or at both ends of the edge; when it is set in the center of the screen, it can be set in one or more of the upper region, upper left region, left region, lower left region, lower region, lower right region, right region, and upper right region of the screen.
  • the protrusions, notches, and holes can be concentrated or dispersed, and can be distributed symmetrically or asymmetrically.
  • the number of the protrusions, notches and / or holes is not limited.
  • the special-shaped screen covers the upper and/or lower forehead area of the touch display screen as a displayable area and/or an operable area, so that the touch display screen occupies more of the front panel of the terminal and has a larger screen ratio.
  • the notches and/or holes are used to accommodate at least one front panel component, which includes at least one of a camera, a fingerprint sensor, a proximity light sensor, a distance sensor, a handset, an ambient light sensor, and a physical button.
  • the notch may be provided on one or more edges, and the notch may be a semicircular notch, a right-angled rectangular notch, a rounded rectangular notch, or an irregularly shaped notch.
  • the special-shaped screen may be a screen design provided with a semi-circular notch 43 at the center of the upper edge of the touch display screen 130, and the space vacated by the semi-circular notch 43 is used to accommodate at least one front panel component.
  • the special-shaped screen may be a screen design provided with a semi-circular notch 44 at the center of the lower edge of the touch display screen 130, and the space vacated by the semi-circular notch 44 is used to accommodate at least one of a physical button, a fingerprint sensor, and a microphone.
  • as shown schematically in FIG. 3E, the special-shaped screen may be a screen design provided with a semi-elliptical notch 45 at the center of the lower edge of the touch display screen 130, with a semi-elliptical notch also formed on the front panel of the terminal 100. Together, the notches enclose an elliptical area, which is used to accommodate a physical button or a fingerprint recognition module.
  • the special-shaped screen may be a screen design provided with at least one small hole 46 in the upper half of the touch display screen 130, and the space vacated by the small hole 46 is used to accommodate at least one front panel component among a camera, a distance sensor, a handset, and an ambient light sensor.
  • the structure of the terminal 100 shown in the above drawings does not constitute a limitation on the terminal 100.
  • the terminal may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
  • the terminal 100 further includes components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (WiFi) module, a power source, and a Bluetooth module, and details are not described herein again.
  • embodiments of the present application provide a method, a device, a terminal, and a storage medium for video image recognition.
  • the terminal displays a video recognition control in a video playback scene. If the user wants to know about a person or item in the current playback screen, the user clicks the video recognition control directly; the terminal then performs image recognition on the current playback screen and displays the image recognition result to the user. This process saves the user from switching back and forth between two applications, reduces the operations needed to learn about a person or item in the current playback screen, and improves efficiency.
  • the execution subject of each step may be a terminal described in the foregoing embodiment.
  • the terminal has a video playing function.
  • the terminal also has an image recognition function.
  • an application for implementing a video playback function is installed and run in the terminal, and the execution subject of each step may be the application, and the application may be a system application or a third-party application.
  • the following description takes the case where the execution subject of each step is a terminal as an example, but this is not a limitation.
  • FIG. 4 shows a flowchart of a video image recognition method provided by an embodiment of the present application.
  • the method may include the following steps:
  • Step 401: When in a video playback scene, display an image recognition function control in a sidebar.
  • a video playing scene refers to a scene where a terminal is playing a video.
  • in one possible implementation manner, the terminal plays a video through a playback application; in another possible implementation manner, the terminal plays a video in a webpage through a browser.
  • the sidebar is used to display application icons and / or function controls in the terminal, so that the terminal can conveniently open other applications or execute functions corresponding to the function controls while the terminal is running the application in the foreground.
  • the application icons and / or function controls displayed in the sidebar can be set by the terminal by default or can be customized by the user.
  • the image recognition function control is included in the sidebar.
  • the image recognition function control is used to trigger image recognition of the picture in the currently playing video.
  • the image recognition function control may be displayed when the video starts to play, or may be displayed in response to an operation signal triggered by the user.
  • the embodiment of the present application does not limit the display timing of the image recognition function control.
  • step 401 may include the following two sub-steps:
  • Step 401a: when in a video playback scene, receive a call-out instruction corresponding to the sidebar;
  • Step 401b: display the sidebar according to the call-out instruction.
  • the call-out instruction is used to call out the sidebar.
  • in one possible implementation, a buoy is displayed on the display interface of the terminal, and when a trigger signal acting on the buoy is received, the terminal receives the call-out instruction.
  • the buoy can always be displayed on the upper layer of the display interface, or can be displayed on the upper layer of the display interface when the application is started and run, or can be displayed on the upper layer of the display interface in response to an operation signal triggered by the user.
  • the embodiment of the present application does not limit the display timing of the buoy.
  • the shape of the buoy may be a circle, an oval, a rectangle, or the like, and the shape of the buoy is not limited in the embodiment of the present application.
  • the area of the buoy can be set by the terminal by default, or can be set by the user, which is not limited in the embodiment of the present application.
  • in order to reduce the occlusion of the display interface as much as possible, the buoy can be set to a transparency greater than 0.
  • the trigger signal acting on the buoy may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal, which is not limited in the embodiment of the present application.
  • the embodiment of the present application takes the case where the trigger signal acting on the buoy is a slide signal as an example for description.
  • when the terminal is in the landscape display state, the buoy blocks the display interface and reduces the user's immersion.
  • in another possible implementation, the terminal receives the call-out instruction when it receives a trigger signal on a side of the display.
  • the trigger signal on the side of the display is a sliding signal from the outside to the inside of that side.
  • referring to FIG. 5, a schematic diagram of an interface for displaying a video image recognition control according to an embodiment of the present application is shown.
  • When in the video playback scene, the user performs a sliding operation from the outside to the inside on the left side of the terminal. After receiving the sliding operation signal, the terminal displays a sidebar 51, and the sidebar 51 includes an image recognition control 52.
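The outside-to-inside sliding signal could be detected along the following lines. This is only a sketch: the function name `is_call_out_swipe`, the pixel coordinates, and the `edge_margin` and `min_travel` thresholds are hypothetical values chosen for illustration, since the application does not specify how the signal is detected.

```python
def is_call_out_swipe(start_x: float, end_x: float, screen_width: float,
                      edge_margin: float = 20.0, min_travel: float = 50.0) -> bool:
    """Return True for a swipe that starts at a side edge and moves inward.

    edge_margin and min_travel are assumed thresholds, in pixels.
    """
    # Swipe that begins near the left edge and travels to the right.
    from_left = start_x <= edge_margin and (end_x - start_x) >= min_travel
    # Swipe that begins near the right edge and travels to the left.
    from_right = (start_x >= screen_width - edge_margin
                  and (start_x - end_x) >= min_travel)
    return from_left or from_right
```

A swipe that starts in the middle of the screen, or one that travels too short a distance, is ignored, so ordinary playback gestures would not call out the sidebar.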
  • Step 402: when a first trigger signal corresponding to the image recognition function control is received, perform screenshot processing on the current playback screen to obtain a target image.
  • the first trigger signal is triggered by the user, and may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal.
  • the embodiment of the present application takes the case where the first trigger signal is a click signal as an example for description.
  • the target image is an image that needs to be identified.
  • the terminal determines the playback picture displayed when the first trigger signal is received as the target image.
  • the target image needs to be displayed to the user, so that the user determines whether it is an image that needs to be identified.
  • the target image is acquired by a screenshot processing method. Screenshot processing refers to capturing the current playback frame and determining the captured playback frame as the target image.
  • the terminal performs screenshot processing on a complete current playback picture to obtain a target image.
  • the terminal performs screenshot processing on part of the current playback screen to obtain a target image.
  • The part of the screen to capture can be selected by the user.
  • for example, video playback is paused and the user is prompted to select the target image; the user performs a drag operation on the current playback screen, and the terminal then captures, as the target image, the rectangular area whose diagonal is the straight line from the start point to the end point of the drag operation signal.
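Deriving the capture rectangle from the drag operation can be sketched as follows. The frame is modelled here as a plain list of pixel rows, and the names `drag_to_rect` and `crop` are hypothetical helpers introduced for the example rather than part of the application.

```python
def drag_to_rect(start, end):
    """Return (left, top, right, bottom) for the rectangle whose diagonal
    runs from the drag start point to the drag end point."""
    (x0, y0), (x1, y1) = start, end
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    return left, top, right, bottom


def crop(frame, rect):
    """Cut the rectangle out of a frame given as a list of pixel rows."""
    left, top, right, bottom = rect
    return [row[left:right] for row in frame[top:bottom]]
```

Sorting the coordinates means the user can drag in any direction, for example from the lower right to the upper left, and still obtain the same rectangle.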
  • After the terminal acquires the target image, it can also display the target image. Optionally, the terminal displays the target image in a floating window. Because the floating window is small, the target image needs to be scaled down when it is displayed in the floating window.
  • Step 403: Obtain an image recognition result of the target image.
  • the image recognition result is obtained by performing image recognition on the target image.
  • the image recognition result may include at least one record, and each record is used to represent a recognition result of an element in the target image, which may be a person identification or an item identification.
  • the person identification uniquely identifies the person and may be the person's name.
  • the terminal recognizes the person in the current playback screen and obtains the person identification.
  • the item identification uniquely identifies the item and may be the item's name.
  • the terminal recognizes the item in the current playback screen and obtains the item identification.
  • the image recognition result also includes the similarity corresponding to each record.
  • the similarity refers to the similarity between the record and the corresponding element in the target image, and is used to measure the accuracy of the image recognition result. The higher the similarity, the more accurate the image recognition result; the lower the similarity, the less accurate it is.
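One way to represent such a result is a list of records that can be ordered by similarity. The sketch below is an assumption-laden illustration: the `Record` type, its field names, and the use of a similarity value between 0 and 1 are not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    kind: str          # "person" or "item" (assumed labels)
    identifier: str    # e.g. the person's or item's name
    similarity: float  # assumed to lie between 0.0 and 1.0


def rank_by_similarity(records: List[Record]) -> List[Record]:
    """Order records from most to least similar."""
    return sorted(records, key=lambda r: r.similarity, reverse=True)


def best_match(records: List[Record]) -> Optional[Record]:
    """Return the most similar record, or None when recognition found nothing."""
    ranked = rank_by_similarity(records)
    return ranked[0] if ranked else None
```

Ranking the records lets the display step show the most likely match first, which fits the role of similarity as a measure of accuracy.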
  • the terminal recognizes a target image to obtain an image recognition result.
  • the server performs image recognition on the target image to obtain an image recognition result, and then the terminal obtains the image recognition result from the server.
  • the terminal sends a recognition request to the server; the recognition request carries the identifier of the terminal and the target image. The server recognizes the target image according to the recognition request, obtains an image recognition result, and returns it to the terminal.
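A request carrying the terminal identifier and the target image might be assembled as shown below. The JSON layout, the field names `terminal_id` and `target_image`, and the Base64 encoding are all assumptions for illustration; the application does not define a wire format.

```python
import base64
import json


def build_recognition_request(terminal_id: str, image_bytes: bytes) -> str:
    """Serialize a request with the terminal identifier and the target image.

    The image is Base64-encoded so that it can travel inside a JSON string.
    """
    payload = {
        "terminal_id": terminal_id,  # hypothetical field name
        "target_image": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)


def read_recognition_request(body: str):
    """Server-side counterpart: recover the identifier and the image bytes."""
    payload = json.loads(body)
    return payload["terminal_id"], base64.b64decode(payload["target_image"])
```

Including the terminal identifier in the request gives the server what it needs to return the image recognition result to the right terminal.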
  • the embodiment of the present application does not limit the algorithm used for image recognition. It may be an image recognition algorithm based on model matching, an image recognition algorithm based on neural networks, an image recognition algorithm based on wavelet moments, an image recognition algorithm based on fractal features, and so on.
  • the terminal may further display query information, where the query information is used to query whether it is necessary to obtain an image recognition result of the target image.
  • Step 404: Display the image recognition result.
  • After the terminal obtains the image recognition result, it displays the result for the user to view. Optionally, the image recognition result is also displayed in the floating window mentioned in step 402.
  • in the technical solution provided in the embodiments of the present application, a video recognition control is displayed in a video playback scene. If the user wants to know about a person or item in the current playback screen, the user clicks the video recognition control directly, and the terminal then performs image recognition on the current playback screen and displays the image recognition result to the user.
  • This process saves the user from switching back and forth between two applications and reduces the operations needed to learn about a person or item in the current playback screen, so the operation is more convenient and recognition is more efficient.
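Steps 402 to 404 can be summarized as a single handler that runs when the first trigger signal arrives. In this sketch the three callables are injected placeholders, since the application leaves the screenshot mechanism, the recognition algorithm, and the display method open; the name `handle_first_trigger` is likewise hypothetical.

```python
from typing import Any, Callable


def handle_first_trigger(capture_frame: Callable[[], Any],
                         recognize: Callable[[Any], Any],
                         display: Callable[[Any], None]) -> Any:
    """Run the capture, recognition, and display steps in order."""
    target_image = capture_frame()    # step 402: screenshot the current frame
    result = recognize(target_image)  # step 403: locally or through a server
    display(result)                   # step 404: show the result to the user
    return result
```

Because recognition is passed in as a callable, the same handler covers both the case where the terminal recognizes the image itself and the case where it asks a server.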
  • an image may include multiple objects, such as people, items, animals, flowers, and trees.
  • if the terminal still performs image recognition on the entire image, recognition efficiency may be low.
  • the user selects an object to be recognized among the multiple objects, and the terminal then obtains only the image recognition result of that object rather than of the entire image, which can improve recognition efficiency.
  • the target image includes multiple objects to be identified, and step 403 includes the following two sub-steps:
  • Step 501 Determine a target to-be-recognized object included in the target image.
  • the target to-be-recognized object refers to an object that the user desires to recognize, which can be selected by the user.
  • the number of target to-be-recognized objects may be one or multiple.
  • the number of target to-be-recognized objects may be less than the number of objects contained in the target image, or may be equal to the number of objects contained in the target image.
  • step 501 includes the following sub-steps:
  • Step 501a displaying a person identification control and / or an item identification control
  • the person recognition control is used to trigger the recognition of the area containing the person image in the target image
  • the item recognition control is used to trigger the recognition of the area containing the object image in the target image.
  • the terminal displays the above-mentioned person recognition control and / or item recognition control while displaying the target image.
  • the above-mentioned person recognition control and / or item recognition control are also displayed in the floating window.
  • step 501b when a second trigger signal corresponding to the person recognition control is received, it is determined that the target object to be identified is an area including a person image in the target image;
  • the area containing the person image in the target image may be a rectangular area containing a face image. Further, the area containing the person image in the target image is the smallest rectangular area containing the face image.
  • step 501c when a third trigger signal corresponding to the item identification control is received, it is determined that the target object to be identified is an area in the target image that includes the item image.
  • the area containing the image of the article in the target image may be the area containing the entire article or a rectangular area containing the key features of the article.
  • the key characteristics of the item can be determined based on the actual item. For example, when the item is a flower, its key feature is a petal. Further, the area containing the image of the article in the target image may be the smallest rectangular area containing the entire article, or the smallest rectangular area containing key features of the article.
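  • The "smallest rectangular area" described above can be sketched as follows. This is a minimal illustration only: it assumes some detector (not specified in the source) has already reported the pixel coordinates belonging to a face or an item, and the names `smallest_enclosing_rect` and `face_pixels` are hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int]            # (x, y) pixel coordinate
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def smallest_enclosing_rect(pixels: Iterable[Point]) -> Rect:
    """Return the smallest axis-aligned rectangle that contains every pixel."""
    pts: List[Point] = list(pixels)
    if not pts:
        raise ValueError("at least one pixel is required")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical pixel coordinates reported for one face in the target image.
face_pixels = [(40, 22), (55, 30), (47, 61), (38, 44)]
face_rect = smallest_enclosing_rect(face_pixels)
```

  • Only the rectangle, rather than the whole screenshot, would then be passed on for recognition.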
  • FIG. 6 a schematic diagram of an interface for determining an object to be identified is provided according to an embodiment of the present application.
  • the terminal displays a target image 62, a person recognition control 63, and an article recognition control 64 on the floating window 61.
  • the terminal determines that the target to-be-recognized object is a region containing the person image in the target image;
  • the terminal determines that the target to-be-recognized object is an area that includes the object image in the target image.
  • step 501 includes the following sub-steps:
  • Step 501d displaying a target image
  • Each object to be identified in the target image is labeled with a different serial number.
  • the terminal also displays the above-mentioned different serial numbers below the target image.
  • Step 501e Receive a selection signal corresponding to the target sequence number
  • the selection signal corresponding to the target number may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal, which is not limited in this embodiment of the present application.
  • the selection signal corresponding to the target sequence number is a click signal as an example for description.
  • the target sequence number is the selected sequence number. If the user wants to learn about an object, the user can select the serial number corresponding to that object. If the terminal also displays the serial numbers below the target image, the user may select the target serial number either in the target image or among the serial numbers displayed below the target image.
  • step 501f the object to be identified corresponding to the target number is determined as the target object to be identified.
  • the terminal determines the object corresponding to the selected serial number as the target object to be identified.
  • the floating window further includes a completion control, and when the terminal receives a confirmation instruction corresponding to the completion control, the object corresponding to the selected serial number is determined as the object to be identified.
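  • Steps 501d to 501f, together with the optional completion control, can be sketched roughly as below. The class and method names are hypothetical, and the choice to let a second selection of the same serial number deselect it is an assumption that the source does not state.

```python
from typing import Dict, List, Set


class SerialNumberSelector:
    """Tracks which numbered objects in the target image the user has selected."""

    def __init__(self, objects_by_number: Dict[int, str]) -> None:
        self._objects = dict(objects_by_number)
        self._selected: Set[int] = set()

    def on_select(self, number: int) -> None:
        # Step 501e: a selection signal arrives for one serial number.
        if number not in self._objects:
            raise KeyError(f"unknown serial number: {number}")
        # Assumption: selecting the same number twice toggles it off.
        self._selected ^= {number}

    def on_complete(self) -> List[str]:
        # Step 501f: the completion control confirms the current selection.
        return [self._objects[n] for n in sorted(self._selected)]


selector = SerialNumberSelector({1: "person", 2: "hat", 3: "tree"})
selector.on_select(2)
selector.on_select(3)
selector.on_select(3)  # deselects the tree again
targets = selector.on_complete()
```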
  • FIG. 7 a schematic diagram of an interface for determining an object to be identified is provided according to an embodiment of the present application.
  • the terminal displays a target image 62 and a completion control 71 on the floating window 61.
  • Each to-be-recognized object in the target image 62 is identified with a different serial number.
  • When the user clicks a serial number and then the completion control 71, the terminal determines the to-be-recognized object corresponding to that serial number as the target to-be-recognized object.
  • step 501 includes the following sub-steps:
  • Step 501g displaying a target image
  • Step 501h receiving a third trigger signal acting on the target image
  • the third trigger signal may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal, which is not limited in the embodiment of the present application.
  • the object to be identified in the target area corresponding to the third trigger signal is determined as the target object to be identified.
  • the target area corresponding to the third trigger signal refers to an area that is centered on the trigger position of the third trigger signal and whose size is a preset area.
  • the trigger position of the third trigger signal refers to a contact position between the user's finger and the display screen.
  • the preset area can be set according to actual experience, which is not limited in the embodiment of the present application.
  • the target area corresponding to the third trigger signal is a rectangular area with the motion track of the third trigger signal as a diagonal.
  • the floating window further includes a completion control, and when the terminal receives a confirmation instruction corresponding to the completion control, the object in the target area corresponding to the third trigger signal is determined as the target object to be identified.
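  • The two ways of deriving the target area described above can be sketched as follows. The source fixes only the center and the size of the first kind of area, so treating it as a square is an assumption, as are all function names; clamping to the image bounds is likewise an added convenience rather than something the source describes.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def area_around_tap(x: float, y: float, preset_area: float) -> Rect:
    """Square centered on the trigger position whose area equals preset_area."""
    half = math.sqrt(preset_area) / 2.0
    return (x - half, y - half, x + half, y + half)


def area_from_drag(start: Tuple[float, float], end: Tuple[float, float]) -> Rect:
    """Rectangle whose diagonal runs from the start to the end of the motion track."""
    (x0, y0), (x1, y1) = start, end
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def clamp_to_image(rect: Rect, width: float, height: float) -> Rect:
    """Keep the target area inside the target image."""
    left, top, right, bottom = rect
    return (max(0.0, left), max(0.0, top), min(width, right), min(height, bottom))


tap_area = area_around_tap(100.0, 50.0, preset_area=400.0)
drag_area = area_from_drag((30.0, 80.0), (10.0, 20.0))
```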
  • FIG. 8 a schematic diagram of an interface for determining an object to be identified is provided according to another embodiment of the present application.
  • the terminal displays a target image 62 on the floating window 61.
  • the terminal determines the object in the area 81, which is centered on the position and has a preset area, as the target object to be identified.
  • Step 302 Perform image recognition on the target to-be-recognized object to obtain an image recognition result.
  • step 302 may be implemented as: performing image recognition on the target object to be recognized through a machine learning model to obtain an image recognition result.
  • a machine learning model is obtained by training a neural network using multiple sets of training sample data.
  • Each set of training sample data in the plurality of sets of training sample data includes a sample image and a recognition result corresponding to the sample image.
  • the recognition result corresponding to the sample image can be obtained manually, that is, the relevant technician determines the recognition result corresponding to the sample image and records it.
  • the neural network may be a Convolutional Neural Network (CNN), an Artificial Neural Network (ANN), a Deep Neural Network (DNN), and the like, which is not limited in the embodiments of the present application.
  • the machine learning algorithm used in training the machine learning model can be a back-propagation (BP) algorithm, a faster Regions with Convolutional Neural Network (faster RCNN) algorithm, etc., which is not limited in the embodiments of the present application.
  • the machine learning model includes: an input layer, at least one hidden layer, and an output layer.
  • the input data of the input layer is the target image or the target object to be identified in the target image
  • the output result of the output layer is the image recognition result of the target image.
  • the determination process is as follows: the target image, or the object to be identified in the target image, is input to the input layer of the machine learning model; the hidden layer performs feature extraction on the input data and combines and abstracts the extracted features; finally, the output layer outputs the image recognition result of the target image.
  • the specific structure of the hidden layer is not limited. Generally speaking, the more layers of a neural network, the better the effect but the longer the calculation time. In practical applications, a neural network with an appropriate number of layers can be designed in accordance with the accuracy requirements.
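  • To make the input layer, hidden layer, and output layer concrete, here is a toy forward pass. It is a sketch only: the two-element feature vector, the hand-picked weights, and the labels are all hypothetical, whereas a real model would take image pixels and use learned parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dense(inputs: Vector, weights: Matrix, biases: Vector) -> Vector:
    """Fully connected layer; weights[j] holds the weights of output unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values: Vector) -> Vector:
    return [max(0.0, v) for v in values]


def softmax(values: Vector) -> Vector:
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(features: Vector) -> str:
    # Hidden layer: extracts and combines features (hypothetical weights).
    hidden = relu(dense(features, [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]))
    # Output layer: turns hidden features into one score per label.
    scores = softmax(dense(hidden, [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]))
    labels = ["person", "item"]
    return labels[scores.index(max(scores))]


first = recognize([0.9, 0.1])
second = recognize([0.2, 0.7])
```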
  • the training process of the machine learning model is as follows: an initial machine learning model is obtained; the sample images in the training sample data are input to the initial machine learning model, which outputs the actual recognition results corresponding to the sample images; the actual recognition results are compared with the recognition results recorded for the sample images to obtain a calculated loss; and the calculated loss is compared with a preset threshold. If the calculated loss is greater than the preset threshold, the parameters of the initial machine learning model are updated and the process restarts from the step of inputting the sample images into the initial machine learning model. If the calculated loss is not greater than the preset threshold, the machine learning model is generated.
  • the preset threshold may be determined actually according to the recognition accuracy, which is not limited in the embodiment of the present application.
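  • The stop-when-loss-is-below-threshold loop described above can be illustrated with a deliberately tiny stand-in. As an assumption for brevity, the "model" here is a one-variable linear fit trained by plain gradient descent on a mean squared loss, not an image recognition network; the function name, learning rate, and sample data are hypothetical.

```python
from typing import List, Tuple


def train_until_threshold(
    samples: List[Tuple[float, float]],
    threshold: float,
    learning_rate: float = 0.05,
    max_rounds: int = 10_000,
) -> Tuple[float, float, float]:
    """Fit y = w * x + b, stopping once the mean squared loss <= threshold."""
    w, b = 0.0, 0.0  # parameters of the initial model
    n = len(samples)
    for _ in range(max_rounds):
        # Compare the model's actual outputs with the recorded results.
        errors = [(w * x + b) - y for x, y in samples]
        loss = sum(e * e for e in errors) / n
        if loss <= threshold:
            return w, b, loss  # loss not greater than threshold: model is done
        # Loss greater than threshold: update parameters and repeat.
        grad_w = 2.0 * sum(e * x for e, (x, _) in zip(errors, samples)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    raise RuntimeError("loss did not fall below the threshold")


samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # y = 2x + 1
w, b, loss = train_until_threshold(samples, threshold=1e-4)
```

  • A lower threshold yields a closer fit at the cost of more update rounds, which mirrors the accuracy trade-off noted above.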
  • the technical solution provided in the embodiments of the present application allows the user to first select the person or item to be identified in the image, so that subsequent image recognition need not be performed on the entire image. Recognizing only the selected person or item can improve the efficiency of image recognition.
  • the terminal may also obtain and display related information corresponding to the image recognition result, so that the user can know more abundant and comprehensive information about the person or article in the playback screen.
  • the video image recognition method may further include the following steps:
  • Step 601 Obtain related information corresponding to the image recognition result.
  • the related information corresponding to the image recognition result includes one or more of the following: encyclopedia information, social account information, news information, and work information of the person corresponding to the person identification.
  • Encyclopedia information refers to the detailed information of the person, which usually includes name, age, occupation, birthday, and so on.
  • the social account information includes a web page link to the social account used by the character. When the web page link is clicked, the terminal displays the main page of that social account so that the user can establish a social relationship with it through the user's own social account. The social relationship can be a following relationship, a listening relationship, a friend relationship, etc.
  • News information refers to news information related to the person.
  • the work information includes a detailed introduction to the work in which the character has appeared, and a link to visit.
  • the related information corresponding to the image recognition result includes one or more of the following combinations: encyclopedia information and purchase information of the item corresponding to the item identification.
  • Encyclopedia information refers to the detailed information of the item, which can include the name, material, weight, etc. of the item.
  • the purchase information includes a purchase link for the item. When the purchase link is clicked, the terminal displays a purchase page for the item so that the user can purchase the item.
  • the terminal acquires the related information of the image recognition result locally.
  • the terminal obtains related information of the image recognition result from the server. Specifically, the terminal sends an acquisition request to the server, and the acquisition request carries the identification of the terminal and the image recognition result. The server obtains related information corresponding to the image recognition result according to the acquisition request, and returns the related information to the terminal.
  • the second possible implementation manner is used as an example for explanation.
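  • The request and response exchanged in the second implementation manner might look like the sketch below. The JSON encoding, the field names `terminal_id`, `recognition_result`, and `related_info`, and the in-memory lookup table are all assumptions made for illustration; the source specifies only that the request carries the terminal identification and the image recognition result.

```python
import json
from typing import Dict, List

# Hypothetical server-side table mapping recognition results to related information.
RELATED_INFO: Dict[str, List[str]] = {
    "Person A": ["encyclopedia entry", "social account link", "work information"],
    "baseball cap": ["encyclopedia entry", "purchase link"],
}


def build_acquisition_request(terminal_id: str, recognition_result: str) -> str:
    """Terminal side: the request carries the terminal id and the recognition result."""
    return json.dumps({"terminal_id": terminal_id,
                       "recognition_result": recognition_result})


def handle_acquisition_request(raw_request: str) -> str:
    """Server side: look up related information and return it to the terminal."""
    request = json.loads(raw_request)
    info = RELATED_INFO.get(request["recognition_result"], [])
    return json.dumps({"terminal_id": request["terminal_id"], "related_info": info})


raw = build_acquisition_request("terminal-001", "Person A")
response = json.loads(handle_acquisition_request(raw))
```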
  • Step 602 Display related information corresponding to the image recognition result.
  • the terminal directly jumps to display related information corresponding to the image recognition result.
  • the terminal displays a jump control while displaying the image recognition result, and when the terminal receives a trigger signal corresponding to the jump control, displays related information corresponding to the image recognition result.
  • the terminal displays the jump control corresponding to each record.
  • the terminal displays related information of the record corresponding to the target jump control.
  • the terminal when the terminal displays related information corresponding to the image recognition result, it also displays a favorite control.
  • When the terminal receives the trigger signal corresponding to the favorite control, the terminal saves the related information corresponding to the image recognition result.
  • the favorite control will change to the favorited state.
  • the terminal directly stores the foregoing related information in the first storage path, and subsequent users can directly view the related information when there is no network connection, thereby reducing traffic consumption.
  • the terminal stores the access address corresponding to the related information in the second storage path, and subsequent users can obtain and view related information again through the access address, thereby reducing the storage space occupation of the terminal.
  • the first storage path and the second storage path may be set by a user, or may be set by a terminal by default, which is not limited in the embodiment of the present application.
  • each item corresponds to a favorite control, so that the user can selectively save the related information that he or she needs.
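  • The two ways of saving a favorite (a full copy versus only an access address) can be contrasted with a small sketch. The in-memory dictionaries stand in for the first and second storage paths, and the class, method names, and example address are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Favorites:
    """In-memory stand-ins for the first and second storage paths."""
    full_copies: Dict[str, str] = field(default_factory=dict)       # first storage path
    access_addresses: Dict[str, str] = field(default_factory=dict)  # second storage path

    def save_content(self, record: str, content: str) -> None:
        # Stores the related information itself: viewable without a network.
        self.full_copies[record] = content

    def save_address(self, record: str, address: str) -> None:
        # Stores only the access address: smaller, but needs a network later.
        self.access_addresses[record] = address

    def is_favorited(self, record: str) -> bool:
        return record in self.full_copies or record in self.access_addresses

    def available_offline(self, record: str) -> bool:
        return record in self.full_copies


favorites = Favorites()
favorites.save_content("Person A", "Name, occupation, works ...")
favorites.save_address("baseball cap", "https://example.com/items/cap")
```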
  • FIG. 9 illustrates a schematic diagram of an interface for displaying related information provided by an embodiment of the present application.
  • the terminal displays the target image 62, the person recognition control 63, and the object recognition control 64 in the floating window 61.
  • the terminal displays, in the floating window 61, the first record 91 "Person A" in the image recognition result, the jump control 92 corresponding to the first record 91 "Person A", the second record 93 "Person B", and the jump control 94 corresponding to the second record 93 "Person B"; when the user clicks the jump control 92 corresponding to the first record 91 "Person A", the floating window 61 displays the related information 95 corresponding to the first record 91 "Person A" and the favorite control 96.
  • FIG. 10 a schematic diagram of an interface for displaying related information provided by an embodiment of the present application is shown.
  • the terminal displays a target image 62 in the floating window 61.
  • the terminal determines an object in the area 1001, which is centered on the position and has a preset area, as the object to be identified, and then obtains the image recognition result of the object.
  • the image recognition results include a record "authentic baseball caps, tide brand hats, sun hats, men and women.”
  • the terminal directly displays, in the floating window 61, a plurality of pieces of related information 1002 for the image recognition result and the favorite control 1003 corresponding to each piece of related information.
  • When the terminal attempts to obtain the image recognition result, the image recognition result may fail to be obtained.
  • the terminal displays first prompt information, and the first prompt information is used to prompt that relevant information cannot be obtained.
  • FIG. 11 a schematic diagram of an interface of the first prompt information provided by an embodiment of the present application is shown.
  • When the terminal fails to obtain the image recognition result, the terminal displays the target image 62 and the first prompt information 1101 “No relevant information found” in the floating window 61.
  • the terminal displays second prompt information at this time, and the second prompt information is used to prompt the user to establish a network connection, so that the terminal can retrieve the relevant information.
  • the terminal also displays a network setting control. When the terminal receives a trigger signal corresponding to the network setting control, it jumps to the network setting interface so that the user can complete the network setting.
  • FIG. 12 a schematic diagram of an interface of the second prompt information provided by one embodiment of the present application is shown.
  • the target image 62, the second prompt information 1201 “Please try again after connecting to the network”, and the network setting control 1202 are displayed in the floating window 61.
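  • The choice between the two prompts can be sketched as a small decision function. The prompt texts come from FIG. 11 and FIG. 12; the function name and the order of the checks (network first, then result) are assumptions, since the source does not state which condition is tested first.

```python
from typing import Optional

FIRST_PROMPT = "No relevant information found"
SECOND_PROMPT = "Please try again after connecting to the network"


def choose_prompt(network_connected: bool,
                  recognition_result: Optional[str]) -> Optional[str]:
    """Return the prompt to display, or None when a result can be shown."""
    if not network_connected:
        # Shown together with the network setting control.
        return SECOND_PROMPT
    if not recognition_result:
        return FIRST_PROMPT
    return None


offline_prompt = choose_prompt(False, None)
empty_prompt = choose_prompt(True, None)
no_prompt = choose_prompt(True, "Person A")
```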
  • FIG. 13 is a block diagram of a video image recognition apparatus provided by an embodiment of the present application.
  • the device has a function for implementing the above method example, and the function may be implemented by hardware, or may be implemented by hardware executing corresponding software.
  • the device may include:
  • the control display module 1301 is configured to display an image recognition function control in a sidebar when in a video playback scene.
  • the image acquisition module 1302 is configured to perform a screenshot process on a current playback screen when a first trigger signal corresponding to the image recognition function control is received, to obtain a target image.
  • An image recognition module 1303 is configured to obtain an image recognition result of the target image, where the image recognition result is obtained by performing image recognition on the target image.
  • a result display module 1304 is configured to display the image recognition result.
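  • How the four modules cooperate can be sketched as a single class. This is an illustrative arrangement only: the class name, the injected `capture_frame` and `recognize` callables, and the `displayed` list that stands in for the screen are all hypothetical.

```python
from typing import Callable, List


class VideoImageRecognizer:
    """Wires together the roles of modules 1301 to 1304."""

    def __init__(self,
                 capture_frame: Callable[[], bytes],
                 recognize: Callable[[bytes], str]) -> None:
        self._capture_frame = capture_frame
        self._recognize = recognize
        self.displayed: List[str] = []  # stand-in for what the screen shows

    def show_control(self) -> None:
        # Control display module 1301: show the control in the sidebar.
        self.displayed.append("[image recognition control]")

    def on_first_trigger(self) -> str:
        # Image acquisition module 1302: screenshot the current playback screen.
        target_image = self._capture_frame()
        # Image recognition module 1303: obtain the recognition result.
        result = self._recognize(target_image)
        # Result display module 1304: show the result to the user.
        self.displayed.append(result)
        return result


recognizer = VideoImageRecognizer(
    capture_frame=lambda: b"fake-frame-bytes",
    recognize=lambda image: "Person A" if image else "",
)
recognizer.show_control()
result = recognizer.on_first_trigger()
```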
  • In the technical solution provided in the embodiments of the present application, a video recognition control is displayed in the video playback scene. If the user wishes to learn about a character or item in the current playback screen, the user directly clicks the video recognition control; the terminal then performs image recognition on the current playback screen and displays the image recognition result to the user.
  • This process spares the user from switching back and forth between two applications and reduces the operations needed to learn about a character or item in the current playback screen, so the operation is more convenient and recognition is more efficient.
  • the target image includes multiple objects to be identified, and the image recognition module 1303 is configured to:
  • the image recognition module 1303 is configured to:
  • determining that the target object to be identified is a region in the target image that includes a person image
  • the target object to be identified is an area in the target image that includes an item image.
  • the image recognition module 1303 is configured to:
  • the object to be identified corresponding to the target sequence number is determined as the object to be identified.
  • the image recognition module 1303 is configured to:
  • An object to be identified in a target area corresponding to the third trigger signal is determined as the target object to be identified.
  • the image recognition module 1303 is configured to perform image recognition on the target to-be-recognized object through a machine learning model to obtain the image recognition result.
  • the machine learning model is obtained by training a neural network with multiple sets of training sample data, and each set of training sample data in the plurality of sets includes a sample image and a recognition result corresponding to the sample image.
  • the device further includes: an information acquisition module and an information display module (not shown in the figure)
  • An information acquisition module is configured to acquire related information corresponding to the image recognition result.
  • An information display module is configured to display related information corresponding to the image recognition result.
  • the related information corresponding to the image recognition result includes one or more of the following: encyclopedia information, social account information, news information, and work information of the person corresponding to the person identification;
  • the related information corresponding to the image recognition result includes one or more of the following combinations: encyclopedia information and purchase information of the item corresponding to the item identification.
  • the information display module is configured to:
  • control display module 1301 is configured to:
  • a computer-readable storage medium stores a computer program, and the computer program is loaded and executed by a processor of a terminal to implement the steps in the foregoing method embodiments.
  • a computer program product is also provided, and when the computer program product is executed, it is used to implement the functions of each step in the foregoing method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Provided in the embodiments of the present application are a video image recognition method and apparatus, a terminal and a storage medium. The method comprises: in a video playing scene, displaying an image recognition function control in a sidebar; when a first trigger signal corresponding to the image recognition function control is received, carrying out screenshot processing on the currently played picture to obtain a target image; acquiring an image recognition result of the target image; and displaying the image recognition result. In the embodiments of the present application, a video image recognition control is displayed in the video playing scene, if a user wants to know about a person or item in the currently played picture, the user directly clicks on the video recognition control, and then, the terminal carries out image recognition on the currently played picture and displays the image recognition result to the user. By means of this process, the user does not need to switch back and forth between two application programs, the user operation required for knowing about a person or item in the currently played picture is simplified, the operation is more convenient, and the image recognition efficiency is higher.

Description

Video image recognition method, device, terminal and storage medium
This application claims priority to Chinese patent application No. 201810963246.7, filed on August 22, 2018 and entitled "Video image recognition method, device, terminal and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
The embodiments of the present application relate to the technical field of terminals, and in particular, to a video image recognition method, device, terminal, and storage medium.
Background
When watching a video, a user may want to learn about the characters or items in the video. For example, when watching a movie through a playback application on a terminal, the user may want to know information about the actor who plays a character in the movie.
In the related art, if the user wants to learn about a character or item in a video, the user usually first triggers the terminal to take a screenshot of the current playback interface and save it, and then triggers the terminal to exit the playback application and launch a search application. The user uploads the screenshot to the search application and clicks the search control, whereupon the terminal obtains information about the character or item from the network and displays it to the user.
Summary of the invention
The embodiments of the present application provide a video image recognition method, device, terminal, and storage medium. The technical solution is as follows:
In one aspect, an embodiment of the present application provides a video image recognition method, where the method includes:
displaying an image recognition function control in a sidebar when in a video playback scene;
when a first trigger signal corresponding to the image recognition function control is received, performing screenshot processing on the current playback screen to obtain a target image;
performing image recognition on the target image to obtain an image recognition result of the target image;
displaying the image recognition result.
In another aspect, an embodiment of the present application provides a video image recognition apparatus, where the apparatus includes:
a control display module, configured to display an image recognition function control in a sidebar when in a video playback scene;
an image acquisition module, configured to perform screenshot processing on the current playback screen when a first trigger signal corresponding to the image recognition function control is received, to obtain a target image;
an image recognition module, configured to perform image recognition on the target image to obtain an image recognition result of the target image;
a result display module, configured to display the image recognition result.
In yet another aspect, an embodiment of the present application provides a terminal. The terminal includes a processor and a memory, the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the video image recognition method described in the foregoing aspect.
In still another aspect, an embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the computer program is loaded and executed by a processor to implement the video image recognition method described in the foregoing aspect.
The technical solutions provided in the embodiments of the present application can bring the following beneficial effects:
A video image recognition control is displayed in the video playback scene. If the user wishes to learn about a character or item in the current playback screen, the user directly clicks the video recognition control; the terminal then performs image recognition on the current playback screen and displays the image recognition result to the user. This process spares the user from switching back and forth between two applications and reduces the operations needed to learn about a character or item in the current playback screen, so the operation is more convenient and image recognition is more efficient.
Brief description of the drawings
FIG. 1 is a schematic structural diagram of a terminal provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic structural diagram of a terminal provided by another exemplary embodiment of the present application;
FIG. 3A to FIG. 3F are schematic diagrams of the appearance of terminals with different touch display screens provided by exemplary embodiments of the present application;
FIG. 4 is a flowchart of a video image recognition method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an interface for displaying a video image recognition control provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an interface for determining a target to-be-recognized object provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of an interface for determining a target to-be-recognized object provided by another embodiment of the present application;
FIG. 8 is a schematic diagram of an interface for determining a target to-be-recognized object provided by another embodiment of the present application;
FIG. 9 is a schematic diagram of an interface of a video image recognition method provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of an interface of a video image recognition method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of an interface shown when video image recognition fails, provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of an interface shown when video image recognition fails, provided by an embodiment of the present application;
FIG. 13 is a block diagram of a video image recognition apparatus provided by an embodiment of the present application.
具体实施方式detailed description
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Refer to FIG. 1 and FIG. 2, which show structural block diagrams of a terminal 100 according to an exemplary embodiment of the present application. The terminal 100 may be a mobile phone, a tablet computer, a notebook computer, an e-book reader, or the like. The terminal 100 in the present application may include one or more of the following components: a processor 110, a memory 120, and a touch display screen 130.
The processor 110 may include one or more processing cores. The processor 110 connects the various parts of the terminal 100 through various interfaces and lines, and performs the functions of the terminal 100 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling the data stored in the memory 120. Optionally, the processor 110 may be implemented in at least one hardware form among digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 110 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, and application programs; the GPU is responsible for rendering and drawing the content to be displayed on the touch display screen 130; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 110 and may instead be implemented as a separate chip.
The memory 120 may include random access memory (RAM) and may also include read-only memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable storage medium. The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the method embodiments described below, and the like. The data storage area may store data created through use of the terminal 100 (such as audio data and a phone book).
Taking the Android operating system as an example, the programs and data stored in the memory 120 are shown in FIG. 1. The memory 120 stores a Linux kernel layer 220, a system runtime layer 240, an application framework layer 260, and an application layer 280. The Linux kernel layer 220 provides low-level drivers for the hardware of the terminal 100, such as a display driver, an audio driver, a camera driver, a Bluetooth driver, a Wi-Fi driver, and power management. The system runtime layer 240 provides the main feature support for the Android system through a number of C/C++ libraries. For example, the SQLite library provides database support, the OpenGL/ES library provides 3D graphics support, and the Webkit library provides browser kernel support. The system runtime layer 240 also contains an Android runtime library 242 (Android Runtime), which mainly provides core libraries that allow developers to write Android applications in the Java language. The application framework layer 260 provides the APIs that may be used when building applications, such as activity management, window management, view management, notification management, content providers, package management, call management, resource management, and location management; developers can use these APIs to build their own applications. At least one application runs in the application layer 280. These applications may be programs that ship with the operating system, such as a contacts program, an SMS program, a clock program, or a camera application, or they may be applications developed by third-party developers, such as an instant messaging program or a photo beautification program.
Taking the iOS operating system as an example, the programs and data stored in the memory 120 are shown in FIG. 2. The iOS system includes a Core OS layer 320, a Core Services layer 340, a Media layer 360, and a Cocoa Touch layer 380. The Core OS layer 320 includes the operating system kernel, drivers, and low-level program frameworks; these low-level frameworks provide functions closer to the hardware for use by the program frameworks in the Core Services layer 340. The Core Services layer 340 provides the system services and/or program frameworks that applications need, such as the Foundation framework, an account framework, an advertising framework, a data storage framework, a network connection framework, a geographic location framework, and a motion framework. The Media layer 360 provides audio-visual interfaces for applications, such as interfaces related to graphics and images, interfaces related to audio technology, interfaces related to video technology, and the AirPlay interface for wireless audio and video transmission. The Cocoa Touch layer 380 provides various commonly used interface-related frameworks for application development and is responsible for the user's touch interaction with the terminal 100. Examples include a local notification service, a remote push service, an advertising framework, a game tool framework, a message user interface (UI) framework, the UIKit framework, and a map framework.
Among the frameworks shown in FIG. 2, those relevant to most applications include, but are not limited to, the Foundation framework in the Core Services layer 340 and the UIKit framework in the Cocoa Touch layer 380. The Foundation framework provides many basic object classes and data types and supplies the most basic system services to all applications, independent of the UI. The classes provided by the UIKit framework form the basic UI class library for creating touch-based user interfaces. iOS applications can build their UI on the UIKit framework, so it provides the application's infrastructure for building user interfaces, drawing, handling user interaction events, responding to gestures, and so on.
The touch display screen 130 is used to receive touch operations performed on or near it by a user with a finger, a stylus, or any other suitable object, and to display the user interface of each application. The touch display screen 130 is usually disposed on the front panel of the terminal 100. The touch display screen 130 may be designed as a full screen, a curved screen, or a special-shaped screen. The touch display screen 130 may also be designed as a combination of a full screen and a curved screen, or a combination of a special-shaped screen and a curved screen, which is not limited in this embodiment. Specifically:
Full screen
A full screen may refer to a screen design in which the screen-to-body ratio of the touch display screen 130 on the front panel of the terminal 100 exceeds a threshold (such as 80%, 90%, or 95%). One way to calculate the screen-to-body ratio is: (area of the touch display screen 130 / area of the front panel of the terminal 100) * 100%. Another way is: (area of the actual display region of the touch display screen 130 / area of the front panel of the terminal 100) * 100%. Yet another way is: (diagonal of the touch display screen 130 / diagonal of the front panel of the terminal 100) * 100%. In the illustrative example shown in FIG. 3A, almost the entire front panel of the terminal 100 is occupied by the touch display screen 130: on the front panel 40 of the terminal 100, every region other than the edge formed by the middle frame 41 belongs to the touch display screen 130. The four corners of the touch display screen 130 may be right-angled or rounded.
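The three ways of computing the screen-to-body ratio described above can be sketched as follows. This is a minimal illustration, not part of the application: the function names and the default 80% threshold are assumptions chosen for the example, and all lengths and areas are assumed to be in consistent units.

```python
import math


def area_screen_ratio(screen_area, front_panel_area):
    """Ratio (in percent) of a screen area to the front panel area.

    Pass the full touch screen area for the first calculation method, or the
    area of the actual display region for the second method.
    """
    return 100.0 * screen_area / front_panel_area


def diagonal_screen_ratio(screen_w, screen_h, panel_w, panel_h):
    """Ratio (in percent) of the screen diagonal to the front panel diagonal."""
    return 100.0 * math.hypot(screen_w, screen_h) / math.hypot(panel_w, panel_h)


def is_full_screen(ratio_percent, threshold_percent=80.0):
    """A design counts as a full screen when the ratio exceeds the threshold."""
    return ratio_percent > threshold_percent
```

Note that the three methods can give different values for the same device, so a threshold is only meaningful together with the method used to compute the ratio.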
A full screen may also be a screen design in which at least one front panel component is integrated inside or beneath the touch display screen 130. Optionally, the at least one front panel component includes a camera, a fingerprint sensor, a proximity light sensor, a distance sensor, and the like. In some embodiments, other components that sit on the front panel of a conventional terminal are integrated into all or part of the area of the touch display screen 130. For example, the photosensitive element of the camera is split into multiple photosensitive pixels, and each photosensitive pixel is integrated into the black area of a display pixel in the touch display screen 130. Because at least one front panel component is integrated inside the touch display screen 130, the full screen has a higher screen-to-body ratio.
Of course, in other embodiments, the front panel components of a conventional terminal may also be placed elsewhere, for example on the side or back of the terminal 100: an ultrasonic fingerprint sensor may be placed beneath the touch display screen 130, a bone-conduction earpiece may be placed inside the terminal 100, and the camera may be designed as a pluggable structure located on the side of the terminal.
In some optional embodiments, when the terminal 100 uses a full screen, an edge touch sensor 120 is provided on a single side of the middle frame of the terminal 100, on two sides (such as the left and right sides), or on four sides (such as the top, bottom, left, and right sides). The edge touch sensor 120 is used to detect at least one of a touch operation, a click operation, a press operation, a slide operation, and the like performed by the user on the middle frame. The edge touch sensor 120 may be any one of a touch sensor, a thermal sensor, a pressure sensor, and the like. The user can perform operations on the edge touch sensor 120 to control applications in the terminal 100.
Curved screen
A curved screen is a screen design in which the cross-section of the touch display screen 130 is curved and the projection of the screen in a direction parallel to the cross-section is a plane; the curved shape may be a U shape. Optionally, a curved screen is a screen design in which at least one side is curved. Optionally, a curved screen is one in which at least one side of the touch display screen 130 extends over the middle frame of the terminal 100. Because the side of the touch display screen 130 extends over the middle frame of the terminal 100, the middle frame, which originally had neither a display function nor a touch function, becomes a displayable and/or operable area, so the curved screen has a higher screen-to-body ratio. Optionally, in the example shown in FIG. 3B, the curved screen is a screen design in which the left and right sides 42 are curved; or a screen design in which the top and bottom sides are curved; or a screen design in which the top, bottom, left, and right sides are all curved. In an optional embodiment, the curved screen is made of a touch screen material with a certain degree of flexibility.
Special-shaped screen
A special-shaped screen is a touch display screen with an irregular outline, that is, a shape that is neither a rectangle nor a rounded rectangle. Optionally, a special-shaped screen is a screen design in which protrusions, notches, and/or holes are provided on a rectangular or rounded-rectangular touch display screen 130. Optionally, the protrusions, notches, and/or holes may be located at the edge of the touch display screen 130, in the center of the screen, or both. When a protrusion, notch, and/or hole is located on an edge, it may be placed in the middle or at either end of that edge. When it is located in the center of the screen, it may be placed in one or more of the following regions of the screen: the upper region, the upper-left region, the left region, the lower-left region, the lower region, the lower-right region, the right region, and the upper-right region. When placed in multiple regions, the protrusions, notches, and holes may be clustered or scattered, and may be distributed symmetrically or asymmetrically. Optionally, the number of protrusions, notches, and/or holes is not limited.
Because the special-shaped screen turns the upper and/or lower forehead (bezel) areas of the touch display screen into displayable and/or operable areas, the touch display screen occupies more space on the front panel of the terminal, so the special-shaped screen also has a larger screen-to-body ratio. In some embodiments, the notch and/or hole accommodates at least one front panel component, which includes at least one of a camera, a fingerprint sensor, a proximity light sensor, a distance sensor, an earpiece, an ambient light sensor, and a physical button.
For example, the notch may be provided on one or more edges, and may be a semicircular notch, a right-angled rectangular notch, a rounded rectangular notch, or an irregularly shaped notch. In the illustrative example shown in FIG. 3C, the special-shaped screen may be a screen design with a semicircular notch 43 at the center of the upper edge of the touch display screen 130; the space vacated by the semicircular notch 43 accommodates at least one front panel component among a camera, a distance sensor (also called a proximity sensor), an earpiece, and an ambient light sensor. In the illustrative example shown in FIG. 3D, the special-shaped screen may be a screen design with a semicircular notch 44 at the center of the lower edge of the touch display screen 130; the space vacated by the semicircular notch 44 accommodates at least one component among a physical button, a fingerprint sensor, and a microphone. In the illustrative example shown in FIG. 3E, the special-shaped screen may be a screen design with a semi-elliptical notch 45 at the center of the lower edge of the touch display screen 130, while another semi-elliptical notch is formed on the front panel of the terminal 100; the two semi-elliptical notches enclose an elliptical region that accommodates a physical button or a fingerprint recognition module. In the illustrative example shown in FIG. 3F, the special-shaped screen may be a screen design with at least one small hole 46 in the upper half of the touch display screen 130; the space vacated by the small hole 46 accommodates at least one front panel component among a camera, a distance sensor, an earpiece, and an ambient light sensor.
In addition, those skilled in the art can understand that the structure of the terminal 100 shown in the above drawings does not limit the terminal 100. The terminal may include more or fewer components than shown, combine certain components, or use a different arrangement of components. For example, the terminal 100 further includes components such as a radio frequency circuit, an input unit, sensors, an audio circuit, a wireless fidelity (WiFi) module, a power supply, and a Bluetooth module, which are not described in detail here.
In the related art, a user who wants to learn about a person or an item in a video has to switch between two applications. The operations this requires are cumbersome and inefficient.
In view of this, the embodiments of the present application provide a video image recognition method, apparatus, terminal, and storage medium. In the technical solution provided by the embodiments of the present application, the terminal displays a video image recognition control in a video playback scenario. If the user wants to learn about a person or an item in the current playback picture, the user simply taps the video image recognition control; the terminal then performs image recognition on the current playback picture and shows the image recognition result to the user. This process spares the user from switching back and forth between two applications, reduces the operations needed to learn about a person or an item in the current playback picture, and improves efficiency.
In the embodiments of the present application, each step may be performed by the terminal described in the foregoing embodiment. The terminal has a video playback function. Optionally, the terminal also has an image recognition function. In some embodiments of the present application, an application that implements the video playback function is installed and running on the terminal, and each step may be performed by that application, which may be a system application or a third-party application. For ease of description, the following method embodiments are described using only the example in which each step is performed by the terminal, but this does not constitute a limitation.
Refer to FIG. 4, which shows a flowchart of a video image recognition method according to an embodiment of the present application. The method may include the following steps:
Step 401: In a video playback scenario, display an image recognition function control in a sidebar.
A video playback scenario is a scenario in which the terminal is playing a video. In one possible implementation, the terminal plays the video through a playback application; in another possible implementation, the terminal plays a video in a web page through a browser.
The sidebar is used to display application icons and/or function controls of the terminal, so that while an application is running in the foreground the terminal can conveniently open other applications or perform the functions corresponding to the function controls. The application icons and/or function controls displayed in the sidebar may be set by the terminal by default or customized by the user. In the embodiments of the present application, the sidebar includes the image recognition function control.
The image recognition function control is used to trigger image recognition on a picture of the video currently being played. The image recognition function control may be displayed when the video starts playing, or it may be displayed in response to an operation signal triggered by the user; the embodiments of the present application do not limit when the image recognition function control is displayed.
When the image recognition function control is displayed in response to an operation signal triggered by the user, step 401 may include the following two sub-steps:
Step 401a: In the video playback scenario, receive a call-out instruction corresponding to the sidebar;
Step 401b: Display the sidebar according to the call-out instruction.
The call-out instruction is used to call out the sidebar. Optionally, a floating icon is displayed on the display interface of the terminal, and the terminal receives the call-out instruction when it receives a trigger signal acting on the floating icon.
The floating icon may always be displayed on top of the display interface, may be displayed on top of the display interface when an application is launched and running, or may be displayed on top of the display interface in response to an operation signal triggered by the user; the embodiments of the present application do not limit when the floating icon is displayed. The floating icon may be circular, elliptical, rectangular, or the like; the embodiments of the present application do not limit its shape. The area of the floating icon may be set by the terminal by default or customized by the user, which is not limited in the embodiments of the present application. In addition, to minimize occlusion of the display interface, the floating icon may be given a transparency greater than 0.
The trigger signal acting on the floating icon may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal, which is not limited in the embodiments of the present application. The embodiments of the present application are described using only the example in which the trigger signal acting on the floating icon is a slide signal.
In addition, when the terminal is in landscape display mode, the floating icon occludes the display interface and reduces the user's sense of immersion. To avoid this, in some embodiments of the present application the terminal receives the call-out instruction when it receives a trigger signal on a side edge of the display. For example, the trigger signal on the side edge of the display is a slide signal from the outside of the side edge toward the inside.
Refer also to FIG. 5, which shows a schematic diagram of an interface for displaying the image recognition function control according to an embodiment of the present application. In the video playback scenario, the user performs a slide operation from the outside toward the inside on the left side edge of the terminal. After receiving the slide operation signal, the terminal displays a sidebar 51, which contains the image recognition function control 52.
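The edge slide that calls out the sidebar could be detected roughly as in the sketch below. The application does not specify how the slide is recognized, so everything here is an assumption made for illustration: the function name, the edge-zone width `edge_px`, and the minimum travel `min_travel`, with coordinates measured in pixels from the left edge of the screen.

```python
def is_inward_edge_slide(start_x, end_x, screen_w, edge="left",
                         edge_px=24, min_travel=48):
    """Return True if a slide starts at a side edge and moves inward.

    edge_px and min_travel are hypothetical tuning values: the slide must
    begin within edge_px of the chosen edge and travel at least min_travel
    pixels toward the inside of the screen.
    """
    if edge == "left":
        return start_x <= edge_px and (end_x - start_x) >= min_travel
    if edge == "right":
        return start_x >= screen_w - edge_px and (start_x - end_x) >= min_travel
    raise ValueError("edge must be 'left' or 'right'")
```

Requiring both a start position near the edge and a minimum travel is one way to avoid treating ordinary taps or slides inside the video picture as a call-out instruction.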
Step 402: Upon receiving a first trigger signal corresponding to the image recognition function control, take a screenshot of the current playback picture to obtain a target image.
The first trigger signal is triggered by the user and may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal. The embodiments of the present application are described using only the example in which the first trigger signal is a click signal. The target image is the image to be recognized. Optionally, the terminal determines the playback picture displayed when the first trigger signal is received as the target image.
In the embodiments of the present application, the target image needs to be shown to the user so that the user can confirm whether it is the image to be recognized. In the embodiments of the present application, the target image is obtained by taking a screenshot, that is, by capturing the current playback picture and determining the captured picture as the target image.
In one possible implementation, the terminal takes a screenshot of the complete current playback picture to obtain the target image. In another possible implementation, the terminal takes a screenshot of part of the current playback picture to obtain the target image. This part of the picture may be selected by the user. Optionally, upon receiving the first trigger signal corresponding to the image recognition function control, the terminal pauses video playback and prompts the user to select the target image; the user performs a drag operation on the current playback picture, and the terminal then captures, as the target image, the rectangular area whose diagonal is the straight line from the start point to the end point of the drag operation.
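The partial screenshot described above, a rectangle whose diagonal runs from the start point to the end point of the drag, can be sketched as follows. This is only an illustration under stated assumptions: the frame is modeled as a list of pixel rows, the function names are hypothetical, and clamping the drag points to the frame bounds is an added safeguard that the application does not mention.

```python
def drag_to_rect(start, end, frame_w, frame_h):
    """Convert a drag (start, end) into a (left, top, right, bottom) rectangle.

    The drag may go in any direction, so the coordinates are sorted. Points
    outside the frame are clamped to its bounds (an assumption of this sketch).
    """
    def clamp(value, upper):
        return max(0, min(value, upper))

    left, right = sorted((clamp(start[0], frame_w), clamp(end[0], frame_w)))
    top, bottom = sorted((clamp(start[1], frame_h), clamp(end[1], frame_h)))
    return left, top, right, bottom


def crop(frame, rect):
    """Cut the rectangle out of a frame given as a list of pixel rows."""
    left, top, right, bottom = rect
    return [row[left:right] for row in frame[top:bottom]]
```

For example, `crop(frame, drag_to_rect((300, 80), (120, 260), frame_w, frame_h))` would return the region selected by a drag from the upper right to the lower left.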
After obtaining the target image, the terminal may also display it. Optionally, the terminal displays the target image in a floating window. Because the floating window is small, the target image needs to be scaled down before it is displayed in the floating window.
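One common way to scale an image down for a small window is to preserve its aspect ratio and never enlarge it. The sketch below assumes that policy; the application only states that the image is reduced in size, so the policy and the function name are illustrative choices.

```python
def fit_in_window(img_w, img_h, win_w, win_h):
    """Return the (width, height) at which an image fits inside a window.

    The aspect ratio is preserved and the image is never scaled up
    (the scale factor is capped at 1.0).
    """
    scale = min(win_w / img_w, win_h / img_h, 1.0)
    return max(1, round(img_w * scale)), max(1, round(img_h * scale))
```

With this policy, a 1920 x 1080 screenshot shown in a 480 x 480 floating window is limited by its width and is displayed at 480 x 270.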
Step 403: Obtain an image recognition result of the target image.
The image recognition result is obtained by performing image recognition on the target image. Optionally, the image recognition result may include at least one record, where each record represents the recognition result for one element in the target image and may be a person identifier or an item identifier. A person identifier uniquely identifies a person and may be the person's name; the terminal obtains the person identifier by recognizing the person in the current playback picture. An item identifier uniquely identifies an item and may be the item's name; the terminal obtains the item identifier by recognizing the item in the current playback picture. In addition, the image recognition result includes a similarity for each record. The similarity is the similarity between the record and the corresponding element in the target image and is used to measure the accuracy of the image recognition result: the higher the similarity, the more accurate the image recognition result, and the lower the similarity, the less accurate it is.
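The structure of an image recognition result, records that each carry a person or item identifier plus a similarity, might be modeled as in the following sketch. The class name, field names, and the 0.0 to 1.0 similarity range are assumptions made for illustration; the application does not define a concrete data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RecognitionRecord:
    kind: str          # "person" or "item" (hypothetical labels)
    identifier: str    # e.g. the person's name or the item's name
    similarity: float  # assumed to lie in [0.0, 1.0]; higher means more accurate


def rank_by_similarity(records: List[RecognitionRecord]) -> List[RecognitionRecord]:
    """Order records from most to least similar to the recognized element."""
    return sorted(records, key=lambda record: record.similarity, reverse=True)
```

Sorting by similarity lets the terminal show the most reliable record first when several candidates are returned for the same target image.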
In a first possible implementation, the terminal performs image recognition on the target image to obtain the image recognition result. In a second possible implementation, a server performs image recognition on the target image to obtain the image recognition result, and the terminal then obtains the image recognition result from the server. Specifically, the terminal sends a recognition request to the server, where the recognition request carries the identifier of the terminal and the target image; the server recognizes the target image according to the recognition request, obtains the image recognition result, and returns it to the terminal. The embodiments of the present application are explained using only the first possible implementation as an example.
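For the server-based implementation, the exchange might look like the sketch below. The application states only that the request carries the terminal identifier and the target image; the JSON encoding, the field names `terminal_id`, `image`, and `records`, and the `recognize` callback standing in for the server's recognition algorithm are all assumptions of this example, and no network transport is shown.

```python
import base64
import json


def build_recognition_request(terminal_id: str, image_bytes: bytes) -> str:
    """Terminal side: pack the terminal identifier and target image into JSON."""
    return json.dumps({
        "terminal_id": terminal_id,
        "image": base64.b64encode(image_bytes).decode("ascii"),
    })


def handle_recognition_request(body: str, recognize) -> str:
    """Server side: decode the request, run recognition, and build the reply.

    `recognize` is a placeholder callable that takes the raw image bytes and
    returns a JSON-serializable list of records.
    """
    request = json.loads(body)
    image_bytes = base64.b64decode(request["image"])
    return json.dumps({
        "terminal_id": request["terminal_id"],
        "records": recognize(image_bytes),
    })
```

Base64 is used here only because raw image bytes cannot be placed directly in a JSON string; a real deployment could equally send the image as a binary upload.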
本申请实施例对图像识别所采取的算法不作限定,其可以是基于模型匹配的图像识别算法、基于神经网络的图像识别算法、基于小波矩的图像识别算法、基于分形特征的图像识别算法等等,本申请实施例对此不作限定。The embodiment of the present application does not limit the algorithm used for image recognition. It may be an image recognition algorithm based on model matching, an image recognition algorithm based on neural networks, an image recognition algorithm based on wavelet moments, an image recognition algorithm based on fractal features, and so on. This is not limited in the embodiments of the present application.
可选地,终端在悬浮窗上显示目标图像后,还可以显示询问信息,该询问信息用于询问是否需要获取目标图像的图像识别结果,在接收到对应于所述询问信息的确认指示时,执行获取目标图像的图像识别结果的步骤。Optionally, after the terminal displays the target image on the floating window, the terminal may further display query information, where the query information is used to query whether it is necessary to obtain an image recognition result of the target image. When receiving a confirmation instruction corresponding to the query information, Perform the step of obtaining an image recognition result of the target image.
Step 404: Display the image recognition result.
After obtaining the image recognition result, the terminal displays it for the user to view. Optionally, the image recognition result is also displayed in the floating window mentioned in step 402.
In summary, in the technical solution provided by the embodiments of the present application, a video image recognition control is displayed in the video playback scene. A user who wants to learn about a person or item in the current playback picture simply clicks the control; the terminal then performs image recognition on the current playback picture and presents the image recognition result to the user. This spares the user from switching back and forth between two applications and reduces the operations needed to learn about a person or item in the current playback picture, so operation is more convenient and image recognition is more efficient.
An image may contain multiple objects, such as people, items, animals, flowers, and trees. If the user only wants to learn about some of them but the terminal still performs image recognition on the entire image, recognition efficiency may be low. In the embodiments of the present application, the user selects the object to be recognized from among these objects, and the terminal then obtains the image recognition result of only that object rather than of the entire image, which can improve recognition efficiency. In an optional embodiment based on the embodiment shown in FIG. 4, the target image includes multiple objects to be recognized, and step 403 includes the following two sub-steps:
Step 501: Determine the target object to be recognized in the target image.
The target object to be recognized is the object that the user wishes to recognize, and it can be selected by the user. There may be one or more target objects to be recognized; their number may be less than or equal to the number of objects contained in the target image. Three implementations for determining the target object to be recognized are explained below.
In a first possible implementation, step 501 includes the following sub-steps:
Step 501a: Display a person recognition control and/or an item recognition control.
The person recognition control triggers recognition of the region of the target image that contains a person image, and the item recognition control triggers recognition of the region that contains an item image. Optionally, the terminal displays the person recognition control and/or the item recognition control at the same time as the target image. Optionally, these controls are also displayed in the floating window.
Step 501b: When a second trigger signal corresponding to the person recognition control is received, determine that the target object to be recognized is the region of the target image that contains a person image.
The region of the target image that contains a person image may be a rectangular region containing a face image. Further, this region is the smallest rectangular region containing the face image.
Step 501c: When a third trigger signal corresponding to the item recognition control is received, determine that the target object to be recognized is the region of the target image that contains an item image.
The region of the target image that contains an item image may be a region containing the entire item, or a rectangular region containing the item's key features. The key features depend on the actual item; for example, when the item is a flower, its key feature is the petals. Further, the region may be the smallest rectangular region containing the entire item, or the smallest rectangular region containing the item's key features.
Refer to FIG. 6, which shows a schematic diagram of an interface for determining the object to be recognized according to an embodiment of the present application. The terminal displays a target image 62, a person recognition control 63, and an item recognition control 64 in the floating window 61. When the user clicks the person recognition control 63, the terminal determines that the target object to be recognized is the region of the target image that contains a person image; when the user clicks the item recognition control 64, the terminal determines that it is the region of the target image that contains an item image.
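The "smallest rectangular region" mentioned for steps 501b and 501c can be illustrated with a short sketch. It assumes that a detector has already produced the pixel coordinates belonging to the face or item; the function name `min_bounding_rect` and the inclusive `(left, top, right, bottom)` convention are choices made for this example, not part of the embodiments.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]            # (x, y) pixel coordinate
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom), inclusive


def min_bounding_rect(pixels: Iterable[Point]) -> Rect:
    """Return the smallest axis-aligned rectangle that contains every given pixel."""
    pts = list(pixels)
    if not pts:
        raise ValueError("at least one pixel is required")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# Three pixels assumed to belong to a detected face.
face_region = min_bounding_rect([(3, 4), (10, 2), (5, 9)])
```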
In a second possible implementation, step 501 includes the following sub-steps:
Step 501d: Display the target image.
The objects to be recognized in the target image are each labeled with a different serial number. Optionally, the terminal also displays these serial numbers below the target image.
Step 501e: Receive a selection signal corresponding to a target serial number.
The selection signal corresponding to the target serial number may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal, which is not limited in the embodiments of the present application. The embodiments are described using a click signal as an example.
The target serial number is the serial number that has been selected. A user who wants to learn about an object can select the serial number corresponding to that object. If the terminal also displays the serial numbers below the target image, the target serial number can be selected either in the target image or among the serial numbers displayed below it.
Step 501f: Determine the object to be recognized that corresponds to the target serial number as the target object to be recognized.
The terminal determines the object corresponding to the selected serial number as the target object to be recognized. Optionally, the floating window also includes a completion control; when the terminal receives a confirmation instruction corresponding to the completion control, it determines the object corresponding to the selected serial number as the target object to be recognized.
Refer to FIG. 7, which shows a schematic diagram of an interface for determining the object to be recognized according to an embodiment of the present application. The terminal displays the target image 62 and a completion control 71 in the floating window 61, and each object to be recognized in the target image 62 is labeled with a different serial number. After the user clicks a serial number and then the completion control 71, the terminal determines the object to be recognized that corresponds to that serial number as the target object to be recognized.
In a third possible implementation, step 501 includes the following sub-steps:
Step 501g: Display the target image.
Step 501h: Receive a third trigger signal acting on the target image.
The third trigger signal may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal, which is not limited in the embodiments of the present application.
Step 501i: Determine the object to be recognized within the target area corresponding to the third trigger signal as the target object to be recognized.
When the third trigger signal is a click signal, a double-click signal, or a long-press signal, the target area corresponding to the third trigger signal is an area of a preset size centered on the trigger position of the third trigger signal. The trigger position is the point of contact between the user's finger and the display screen. The preset area can be set according to practical experience, which is not limited in the embodiments of the present application. When the third trigger signal is a slide signal or a drag signal, the target area corresponding to the third trigger signal is the rectangular area whose diagonal is the motion track of the third trigger signal.
Optionally, the floating window also includes a completion control; when the terminal receives a confirmation instruction corresponding to the completion control, it determines the object within the target area corresponding to the third trigger signal as the target object to be recognized.
Refer to FIG. 8, which shows a schematic diagram of an interface for determining the object to be recognized according to another embodiment of the present application. The terminal displays the target image 62 in the floating window 61. When the user clicks a position, the terminal determines the object to be recognized within the area 81, which is centered on that position and has the preset area, as the target object to be recognized.
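The two ways of deriving the target area in the third implementation can be sketched as below. Several details are assumptions made for this illustration: the preset area is taken to be a square, a slide or drag is reduced to its start and end points, and an object counts as "within" the target area when its bounding box intersects it. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def area_from_click(x: float, y: float, preset_area: float) -> Rect:
    """Square of the preset area centered on the trigger position (square shape assumed)."""
    half = math.sqrt(preset_area) / 2
    return (x - half, y - half, x + half, y + half)


def area_from_drag(start: Tuple[float, float], end: Tuple[float, float]) -> Rect:
    """Rectangle whose diagonal runs from the start to the end of the motion track."""
    (x0, y0), (x1, y1) = start, end
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def objects_in_area(objects: Dict[str, Rect], area: Rect) -> List[str]:
    """Names of objects whose bounding boxes intersect the target area."""
    left, top, right, bottom = area
    return [
        name
        for name, (ol, ot, o_right, ob) in objects.items()
        if ol <= right and left <= o_right and ot <= bottom and top <= ob
    ]


click_area = area_from_click(100, 100, preset_area=400)
drag_area = area_from_drag((50, 80), (20, 30))
hits = objects_in_area({"hat": (95, 95, 120, 120), "tree": (200, 200, 250, 250)}, click_area)
```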
Step 302: Perform image recognition on the target object to be recognized to obtain the image recognition result.
Optionally, step 302 may be implemented as follows: image recognition is performed on the target object to be recognized by a machine learning model to obtain the image recognition result.
The machine learning model is obtained by training a neural network with multiple sets of training sample data. Each set of training sample data includes a sample image and the recognition result corresponding to the sample image. The recognition result corresponding to a sample image may be obtained manually, that is, determined and recorded by the relevant technical personnel.
The neural network may be a convolutional neural network (CNN), an artificial neural network (ANN), a deep neural network (DNN), or the like, which is not limited in the embodiments of the present application.
The machine learning algorithm used to train the machine learning model may be the back-propagation (BP) algorithm, the Faster R-CNN (Faster Regions with Convolutional Neural Network) algorithm, or the like, which is not limited in the embodiments of the present application.
Optionally, the machine learning model includes an input layer, at least one hidden layer, and an output layer. The input data of the input layer is the target image, or the target object to be recognized in the target image, and the output of the output layer is the image recognition result of the target image. The process is as follows: the target image, or the target object to be recognized in it, is fed to the input layer of the machine learning model; the hidden layer extracts features from this input data and then combines and abstracts the extracted features; finally, the output layer outputs the image recognition result of the target image. The specific structure of the hidden layer is not limited in the embodiments of the present application. Generally, the more layers a neural network has, the better its results but the longer its computation time; in practice, a neural network with an appropriate number of layers can be designed according to the accuracy requirements.
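To make the input layer, hidden layer, and output layer concrete, the following is a minimal forward pass through one hidden layer, written in plain Python. It is a sketch under stated assumptions: the embodiments do not fix the hidden layer structure, so the ReLU activation, the softmax output, and all function names here are illustrative choices, and the input is a short feature vector rather than a real image.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(inputs: Vector, weights: Matrix, biases: Vector) -> Vector:
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values: Vector) -> Vector:
    return [max(0.0, v) for v in values]


def softmax(values: Vector) -> Vector:
    """Turn output scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features: Vector, hidden_w: Matrix, hidden_b: Vector,
            out_w: Matrix, out_b: Vector) -> Vector:
    """Input layer -> one hidden layer (feature extraction) -> output layer."""
    hidden = relu(dense(features, hidden_w, hidden_b))
    return softmax(dense(hidden, out_w, out_b))


identity = [[1.0, 0.0], [0.0, 1.0]]
probs = forward([1.0, 2.0], identity, [0.0, 0.0], identity, [0.0, 0.0])
```

Each output probability could play the role of the per-record similarity, although the embodiments do not say how the similarity is computed.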
In addition, the machine learning model is trained as follows. An initial machine learning model is obtained, and a sample image from the training sample data is input into it; the initial model outputs an actual recognition result for the sample image. The actual recognition result is compared with the recognition result corresponding to the sample image to obtain a calculated loss, and the calculated loss is then compared with a preset threshold. If the calculated loss is greater than the preset threshold, the parameters of the initial machine learning model are updated and the process restarts from the step of inputting a sample image into the model; if the calculated loss is not greater than the preset threshold, the machine learning model is generated. The preset threshold can be set according to the required recognition accuracy, which is not limited in the embodiments of the present application.
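The stopping rule in this training procedure (update parameters while the loss exceeds the preset threshold, stop once it does not) can be shown with a deliberately small stand-in. The sketch below fits a one-variable linear model with gradient descent rather than training a neural network on images; the mean squared error loss, the learning rate, and the `max_rounds` safety limit are assumptions added for the example.

```python
from typing import List, Tuple


def train_until_threshold(samples: List[Tuple[float, float]], threshold: float,
                          lr: float = 0.1, max_rounds: int = 10_000) -> Tuple[float, float, float]:
    """Repeat forward pass, loss comparison, and parameter update until loss <= threshold."""
    w, b = 0.0, 0.0  # parameters of the initial model
    n = len(samples)
    for _ in range(max_rounds):
        # Compare the model's actual outputs with the labeled results.
        loss = sum((w * x + b - y) ** 2 for x, y in samples) / n
        if loss <= threshold:
            return w, b, loss  # loss not greater than threshold: model is generated
        # Loss still above threshold: update the parameters and try again.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    raise RuntimeError("loss did not reach the threshold within max_rounds")


# Labeled samples drawn from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, final_loss = train_until_threshold(data, threshold=1e-4)
```

A lower threshold demands a closer fit and therefore more update rounds, which mirrors the statement that the threshold is chosen according to the required recognition accuracy.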
In summary, in the technical solution provided by the embodiments of the present application, the user first determines the person or item to be recognized in the image. Subsequent image recognition then does not need to cover the entire image and only needs to recognize the person or item selected by the user, which can improve image recognition efficiency.
After obtaining the image recognition result, the terminal may also obtain and display related information corresponding to the image recognition result, so that the user can learn richer and more comprehensive information about the person or item in the playback picture. In an optional embodiment based on the embodiment shown in FIG. 4, after step 403, the video image recognition method may further include the following steps:
Step 601: Obtain related information corresponding to the image recognition result.
When the image recognition result is a person identifier, the related information corresponding to the image recognition result includes one or more of the following for the person corresponding to the person identifier: encyclopedia information, social account information, news information, and work information.
Encyclopedia information is detailed profile information about the person and usually includes name, age, occupation, birthday, and so on. Social account information includes a web link to the social account used by the person; when the link is clicked, the terminal displays the home page of that social account so that the user can establish a social relationship with it through the user's own social account. The social relationship may be a follow relationship, a listen relationship, a friend relationship, or the like. News information is news related to the person. Work information includes detailed introductions to the works in which the person has appeared, together with access links.
When the image recognition result is an item identifier, the related information corresponding to the image recognition result includes one or more of the following for the item corresponding to the item identifier: encyclopedia information and purchase information.
Encyclopedia information is detailed information about the item and may include its name, material, weight, and so on. Purchase information includes a purchase link for the item; when the purchase link is clicked, the terminal displays the purchase page for the item so that the user can buy it.
In a first possible implementation, the terminal obtains the related information of the image recognition result locally. In a second possible implementation, the terminal obtains the related information of the image recognition result from a server. Specifically, the terminal sends the server an acquisition request carrying the terminal's identifier and the image recognition result; the server obtains the related information corresponding to the image recognition result according to the acquisition request and returns it to the terminal. The embodiments of the present application are explained using only the second possible implementation as an example.
Step 602: Display the related information corresponding to the image recognition result.
If the image recognition result includes one record, the terminal jumps directly to displaying the related information corresponding to the image recognition result. In other possible implementations, the terminal displays a jump control together with the image recognition result, and displays the related information corresponding to the image recognition result when it receives a trigger signal corresponding to the jump control.
If the image recognition result includes multiple records, the terminal displays a jump control for each record. When the terminal receives a trigger signal corresponding to a target jump control, it displays the related information of the record corresponding to that target jump control.
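The branching in step 602 on the number of records can be summarized in a few lines. The function name `plan_display` and the action labels it returns are hypothetical; the empty-result branch is included only so that the function is total, and it corresponds to the prompt case handled separately in the embodiments.

```python
from typing import List, Tuple


def plan_display(records: List[str]) -> Tuple[str, List[str]]:
    """Decide what the floating window shows for a given list of records."""
    if not records:
        return ("show_prompt", [])
    if len(records) == 1:
        # A single record: jump straight to its related information.
        return ("show_related_info", records)
    # Several records: show one jump control per record and wait for a trigger.
    return ("show_jump_controls", records)
```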
Optionally, when displaying the related information corresponding to the image recognition result, the terminal also displays a favorite control. When the terminal receives a trigger signal corresponding to the favorite control, it saves the related information corresponding to the image recognition result, and the favorite control changes to a favorited state. In one possible implementation, the terminal saves the related information itself under a first storage path, so that the user can later view it without a network connection, which reduces data traffic. In another possible implementation, the terminal stores the access address of the related information under a second storage path, so that the user can later retrieve and view the related information through that address, which reduces the storage space occupied on the terminal. The first storage path and the second storage path may be customized by the user or set by the terminal by default, which is not limited in the embodiments of the present application. In addition, when the related information includes multiple items, each item has its own favorite control, so that the user can selectively save the related information that he or she needs.
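The two saving strategies can be contrasted in a short sketch. The file naming, the JSON format, the `id` and `url` fields, and the function name `save_favorite` are all assumptions for illustration; the embodiments only distinguish saving the content from saving its access address.

```python
import json
import tempfile
from pathlib import Path


def save_favorite(info: dict, first_path: Path, second_path: Path, offline_copy: bool) -> Path:
    """Save either the full related information or only its access address."""
    if offline_copy:
        # First storage path: keep the content itself so it can be viewed offline.
        target = first_path / f"{info['id']}.json"
        target.write_text(json.dumps(info, ensure_ascii=False), encoding="utf-8")
    else:
        # Second storage path: keep only the access address to save storage space.
        target = second_path / f"{info['id']}.url"
        target.write_text(info["url"], encoding="utf-8")
    return target


# A throwaway directory stands in for both storage paths.
base = Path(tempfile.mkdtemp())
info = {"id": "person_a", "url": "https://example.com/person_a", "summary": "Encyclopedia entry"}
full_copy = save_favorite(info, base, base, offline_copy=True)
link_only = save_favorite(info, base, base, offline_copy=False)
```

The trade-off matches the text: the full copy costs storage but needs no network later, while the link costs almost no storage but requires a connection to view the information again.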
Refer to FIG. 9, which shows a schematic diagram of an interface for displaying related information according to an embodiment of the present application. The terminal displays the target image 62, the person recognition control 63, and the item recognition control 64 in the floating window 61. When the user clicks the person recognition control 63, the terminal displays in the floating window 61 the first record 91 "Person A" of the image recognition result with its jump control 92, and the second record 93 "Person B" with its jump control 94. When the user clicks the jump control 92 corresponding to the first record 91 "Person A", the floating window 61 displays the related information 95 corresponding to the first record 91 "Person A", together with a favorite control 96.
Refer to FIG. 10, which shows a schematic diagram of an interface for displaying related information according to an embodiment of the present application. The terminal displays the target image 62 in the floating window 61. When the user clicks a position, the terminal determines the object within the area 1001, which is centered on that position and has the preset area, as the object to be recognized, and then obtains the image recognition result of that object. The image recognition result includes one record, "Authentic baseball cap, trendy brand cap, sun hat, unisex", so the terminal directly displays in the floating window 61 multiple items of related information 1002 for the image recognition result, together with a favorite control 1003 for each item.
When the terminal tries to obtain the image recognition result, it may fail to obtain it. In an optional embodiment based on the embodiment shown in FIG. 4, if the terminal does not obtain the image recognition result, it displays first prompt information, which indicates that no related information could be obtained.
Refer to FIG. 11, which shows a schematic diagram of an interface with the first prompt information according to an embodiment of the present application. When the terminal fails to obtain the image recognition result, it displays the target image 62 and the first prompt information 1101 "No relevant information found" in the floating window 61.
In addition, if the terminal cannot obtain the related information of the image recognition result because no network connection has been established, the terminal displays second prompt information, which prompts the user to establish a network connection so that the terminal can try again to obtain the related information. Optionally, the terminal also displays a network setting control; when the terminal receives a trigger signal corresponding to the network setting control, it jumps to the network setting interface so that the user can complete the network settings.
Refer to FIG. 12, which shows a schematic diagram of an interface with the second prompt information according to an embodiment of the present application. When the terminal fails to obtain the image recognition result because no network connection has been established, it displays the target image 62, the second prompt information 1201 "Please connect to the network and try again", and the network setting control 1202 in the floating window 61.
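The choice between the two prompts can be sketched as a small decision function. The name `choose_prompt` and its parameters are hypothetical, and checking the network state before falling back to the first prompt is an assumed ordering; the prompt texts are the ones shown in FIG. 11 and FIG. 12.

```python
from typing import List, Optional

FIRST_PROMPT = "No relevant information found"
SECOND_PROMPT = "Please connect to the network and try again"


def choose_prompt(network_connected: bool, records: List[str]) -> Optional[str]:
    """Return the prompt to display, or None when a result was obtained."""
    if records:
        return None  # a result exists, so no prompt is needed
    if not network_connected:
        # Failure caused by the missing connection: ask the user to connect.
        return SECOND_PROMPT
    return FIRST_PROMPT
```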
The following are apparatus embodiments of the present application, which can be used to carry out the method embodiments of the present application. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present application.
Refer to FIG. 13, which shows a block diagram of a video image recognition apparatus according to an embodiment of the present application. The apparatus has the functions needed to implement the above method examples; these functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include:
A control display module 1301, configured to display an image recognition function control in a sidebar when in a video playback scene.
An image acquisition module 1302, configured to take a screenshot of the current playback picture to obtain a target image when a first trigger signal corresponding to the image recognition function control is received.
An image recognition module 1303, configured to obtain an image recognition result of the target image, where the image recognition result is obtained by performing image recognition on the target image.
A result display module 1304, configured to display the image recognition result.
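One way to picture how the four modules cooperate is the sketch below. The class name, its method names, and the use of injected callables for screenshot capture and recognition are assumptions for illustration; the embodiments describe the modules only by function, and a real apparatus may be implemented in hardware or software.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VideoRecognitionApparatus:
    capture_frame: Callable[[], bytes]        # stands in for image acquisition module 1302
    recognize: Callable[[bytes], List[str]]   # stands in for image recognition module 1303
    control_visible: bool = False
    shown: List[str] = field(default_factory=list)

    def on_video_playing(self) -> None:
        """Control display module 1301: show the image recognition function control."""
        self.control_visible = True

    def on_first_trigger(self) -> List[str]:
        """Handle the first trigger signal: screenshot, recognize, then display."""
        if not self.control_visible:
            return []  # the control is not shown, so there is nothing to trigger
        target_image = self.capture_frame()
        results = self.recognize(target_image)
        self.shown = results  # result display module 1304
        return results


apparatus = VideoRecognitionApparatus(lambda: b"frame", lambda image: ["Person A"])
before = apparatus.on_first_trigger()
apparatus.on_video_playing()
after = apparatus.on_first_trigger()
```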
In summary, in the technical solution provided by the embodiments of the present application, a video image recognition control is displayed in the video playback scene. A user who wants to learn about a person or item in the current playback picture simply clicks the control; the terminal then performs image recognition on the current playback picture and presents the image recognition result to the user. This spares the user from switching back and forth between two applications and reduces the operations needed to learn about a person or item in the current playback picture, so operation is more convenient and image recognition is more efficient.
在基于图13所示实施例提供的一个可选实施例中,所述目标图像包含多个待识别对象,所述图像识别模块1303,用于:In an optional embodiment provided based on the embodiment shown in FIG. 13, the target image includes multiple objects to be identified, and the image recognition module 1303 is configured to:
确定所述目标图像中包含的目标待识别对象;Determining a target to-be-recognized object included in the target image;
对所述目标待识别对象进行图像识别,得到所述图像识别结果。Performing image recognition on the target to-be-recognized object to obtain the image recognition result.
可选地,所述图像识别模块1303,用于:Optionally, the image recognition module 1303 is configured to:
显示人物识别控件和/或物品识别控件;Display people recognition controls and / or item recognition controls;
当接收到对应于所述人物识别控件的第二触发信号时,确定所述目标待识别对象为所述目标图像中包含人物图像的区域;When a second trigger signal corresponding to the person recognition control is received, determining that the target object to be identified is a region in the target image that includes a person image;
当接收到对应于所述物品识别控件的第三触发信号时,确定所述目标待识别对象为所述目标图像中包含物品图像的区域。When a third trigger signal corresponding to the item identification control is received, it is determined that the target object to be identified is an area in the target image that includes an item image.
Optionally, the image recognition module 1303 is configured to:
display the target image, in which each object to be recognized is labeled with a different serial number;
receive a selection signal corresponding to a target serial number;
determine the object to be recognized that corresponds to the target serial number as the target object to be recognized.
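The selection by serial number described above could be sketched as below. The numbering scheme (starting at 1, in detection order) and the function names are assumptions of this sketch; the application only requires that the serial numbers differ.

```python
from typing import Dict, List


def number_objects(objects: List[str]) -> Dict[int, str]:
    """Assign each object to be recognized a distinct serial number, starting at 1."""
    return {serial: obj for serial, obj in enumerate(objects, start=1)}


def select_target(numbered: Dict[int, str], target_serial: int) -> str:
    """Resolve the selection signal for a serial number to the target object."""
    if target_serial not in numbered:
        raise KeyError(f"no object is labeled {target_serial}")
    return numbered[target_serial]


numbered = number_objects(["person A", "handbag", "person B"])
target = select_target(numbered, 2)
```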
Optionally, the image recognition module 1303 is configured to:
display the target image;
receive a third trigger signal acting on the target image;
determine the object to be recognized within the target area corresponding to the third trigger signal as the target object to be recognized.
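One plausible reading of the step above is a hit test: the trigger signal carries a touch position, and the object whose bounding box contains that position becomes the target. The sketch below assumes axis-aligned bounding boxes and a single touch point; both are illustrative assumptions, as is the name `object_at`.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), in pixels


def object_at(point: Tuple[int, int],
              objects: List[Tuple[str, Box]]) -> Optional[str]:
    """Return the first object whose box contains the touch point, if any."""
    x, y = point
    for name, (left, top, right, bottom) in objects:
        if left <= x < right and top <= y < bottom:
            return name
    return None


objects = [("person A", (40, 30, 200, 400)), ("handbag", (260, 310, 380, 420))]
tapped = object_at((300, 350), objects)  # inside the handbag box
missed = object_at((5, 5), objects)      # outside every box
```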
Optionally, the image recognition module 1303 is configured to perform image recognition on the target object to be recognized through a machine learning model to obtain the image recognition result. The machine learning model is obtained by training a neural network with multiple sets of training sample data, and each set of training sample data includes a sample image and the recognition result corresponding to that sample image.
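To make the shape of the training data concrete, the sketch below pairs each sample image with its recognition result and fits a single-neuron perceptron. The perceptron is only the simplest possible stand-in for "a neural network"; the application does not specify an architecture. The two-value "images" and the labels (1 for person, 0 for item) are likewise assumptions for illustration.

```python
from typing import List, Tuple

# One training sample: (flattened sample image, recognition result as 0 or 1).
Sample = Tuple[List[float], int]


def score(weights: List[float], bias: float, image: List[float]) -> float:
    return sum(w * x for w, x in zip(weights, image)) + bias


def predict(weights: List[float], bias: float, image: List[float]) -> int:
    return 1 if score(weights, bias, image) > 0 else 0


def train(samples: List[Sample], epochs: int = 20,
          lr: float = 1.0) -> Tuple[List[float], float]:
    """Perceptron learning rule: adjust weights only on misclassified samples."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for image, label in samples:
            error = label - predict(weights, bias, image)
            weights = [w + lr * error * x for w, x in zip(weights, image)]
            bias += lr * error
    return weights, bias


samples: List[Sample] = [
    ([1.0, 0.0], 1), ([0.9, 0.1], 1),  # hypothetical "person" samples
    ([0.0, 1.0], 0), ([0.1, 0.9], 0),  # hypothetical "item" samples
]
weights, bias = train(samples)
```

After training, `predict(weights, bias, image)` plays the role of performing image recognition on a new target object.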
In another optional embodiment based on the embodiment shown in FIG. 13, the apparatus further includes an information acquisition module and an information display module (not shown in the figure).
The information acquisition module is configured to acquire related information corresponding to the image recognition result.
The information display module is configured to display the related information corresponding to the image recognition result.
Optionally,
when the image recognition result is a person identifier, the related information corresponding to the image recognition result includes one or more of the following: encyclopedia information, social account information, news information, and works information of the person corresponding to the person identifier;
when the image recognition result is an item identifier, the related information corresponding to the image recognition result includes one or more of the following: encyclopedia information and purchase information of the item corresponding to the item identifier.
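The mapping from result type to related information could be represented as a simple lookup table, as sketched below. The key names, the `related_info` function, and the in-memory `lookup` dictionary standing in for a network query are all hypothetical.

```python
from typing import Dict, List, Tuple

# Kinds of related information per result type, following the text above.
RELATED_INFO_KINDS: Dict[str, List[str]] = {
    "person": ["encyclopedia", "social_account", "news", "works"],
    "item": ["encyclopedia", "purchase"],
}


def related_info(result_kind: str, identifier: str,
                 lookup: Dict[Tuple[str, str], str]) -> Dict[str, str]:
    """Collect whichever kinds of related information are available."""
    info: Dict[str, str] = {}
    for kind in RELATED_INFO_KINDS[result_kind]:
        value = lookup.get((identifier, kind))
        if value is not None:
            info[kind] = value
    return info


# Assumption: a real terminal would fetch these from the network.
lookup = {
    ("handbag-123", "encyclopedia"): "About this handbag",
    ("handbag-123", "purchase"): "Where to buy",
    ("actor-7", "works"): "Filmography",
}
item_info = related_info("item", "handbag-123", lookup)
person_info = related_info("person", "actor-7", lookup)
```

Returning only the kinds that are available mirrors the "one or more of the following" wording.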
Optionally, the information display module is configured to:
when the image recognition result includes multiple records, receive a selection signal corresponding to a target record;
display the related information corresponding to the target record.
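A minimal sketch of the record selection above follows. Showing a single record directly without a selection step is an assumption of this sketch, as are the function name and the use of a list index as the selection signal.

```python
from typing import Dict, List, Optional


def info_to_display(records: List[str], info_by_record: Dict[str, str],
                    selected: Optional[int] = None) -> str:
    """Pick the target record and return its related information."""
    if not records:
        raise ValueError("the recognition result contains no records")
    if len(records) == 1:
        target = records[0]  # assumption: no selection needed for one record
    else:
        if selected is None or not 0 <= selected < len(records):
            raise ValueError("a valid selection signal is required")
        target = records[selected]
    return info_by_record[target]


records = ["actor-7", "actor-9"]
info_by_record = {"actor-7": "Filmography of 7", "actor-9": "Filmography of 9"}
shown = info_to_display(records, info_by_record, selected=1)
```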
In another optional embodiment based on the embodiment shown in FIG. 13, the control display module 1301 is configured to:
when in the video playback scene, receive a call-out instruction corresponding to the sidebar;
display the sidebar according to the call-out instruction, wherein the sidebar includes the image recognition function control.
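The sidebar behavior above might be modeled as below. The `Sidebar` type and control names are hypothetical, and the behavior outside a video playback scene (showing the sidebar without the image recognition function control) is an assumption; this passage only specifies the video playback case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sidebar:
    visible: bool = False
    controls: List[str] = field(default_factory=list)


def on_call_out(sidebar: Sidebar, in_video_playback: bool) -> Sidebar:
    """Handle a call-out instruction for the sidebar."""
    sidebar.visible = True
    if in_video_playback:
        # In a video playback scene the sidebar includes the recognition control.
        sidebar.controls = ["image_recognition"]
    else:
        sidebar.controls = []  # assumption: not specified by the text
    return sidebar


playback_sidebar = on_call_out(Sidebar(), in_video_playback=True)
other_sidebar = on_call_out(Sidebar(), in_video_playback=False)
```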
It should be noted that, when the apparatus provided in the foregoing embodiments implements its functions, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
In an exemplary embodiment, a computer-readable storage medium is further provided. The computer-readable storage medium stores a computer program, and the computer program is loaded and executed by a processor of a terminal to implement the steps of the foregoing method embodiments.
In an exemplary embodiment, a computer program product is further provided. When executed, the computer program product implements the functions of the steps of the foregoing method embodiments.
It should be understood that "multiple" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The serial numbers of the foregoing embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
The foregoing are merely exemplary embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (22)

  1. A video image recognition method, characterized in that the method comprises:
    when in a video playback scene, displaying an image recognition function control in a sidebar;
    upon receiving a first trigger signal corresponding to the image recognition function control, taking a screenshot of the current playback picture to obtain a target image;
    obtaining an image recognition result of the target image, the image recognition result being obtained by performing image recognition on the target image;
    displaying the image recognition result.
  2. The method according to claim 1, characterized in that the target image contains multiple objects to be recognized, and obtaining the image recognition result of the target image comprises:
    determining a target object to be recognized contained in the target image;
    performing image recognition on the target object to be recognized to obtain the image recognition result.
  3. The method according to claim 2, characterized in that determining the target object to be recognized contained in the target image comprises:
    displaying a person recognition control and/or an item recognition control;
    when a second trigger signal corresponding to the person recognition control is received, determining that the target object to be recognized is the region of the target image that contains a person image;
    when a third trigger signal corresponding to the item recognition control is received, determining that the target object to be recognized is the region of the target image that contains an item image.
  4. The method according to claim 2, characterized in that determining the target object to be recognized contained in the target image comprises:
    displaying the target image, in which each object to be recognized is labeled with a different serial number;
    receiving a selection signal corresponding to a target serial number;
    determining the object to be recognized that corresponds to the target serial number as the target object to be recognized.
  5. The method according to claim 2, characterized in that determining the target object to be recognized contained in the target image comprises:
    displaying the target image;
    receiving a third trigger signal acting on the target image;
    determining the object to be recognized within the target area corresponding to the third trigger signal as the target object to be recognized.
  6. The method according to claim 2, characterized in that performing image recognition on the target object to be recognized to obtain the image recognition result comprises:
    performing image recognition on the target object to be recognized through a machine learning model to obtain the image recognition result, wherein the machine learning model is obtained by training a neural network with multiple sets of training sample data, and each set of training sample data includes a sample image and the recognition result corresponding to that sample image.
  7. The method according to any one of claims 1 to 6, characterized in that, after obtaining the image recognition result of the target image, the method further comprises:
    acquiring related information corresponding to the image recognition result;
    displaying the related information corresponding to the image recognition result.
  8. The method according to claim 7, characterized in that:
    when the image recognition result is a person identifier, the related information corresponding to the image recognition result includes one or more of the following: encyclopedia information, social account information, news information, and works information of the person corresponding to the person identifier;
    when the image recognition result is an item identifier, the related information corresponding to the image recognition result includes one or more of the following: encyclopedia information and purchase information of the item corresponding to the item identifier.
  9. The method according to claim 7, characterized in that displaying the related information corresponding to the image recognition result comprises:
    when the image recognition result includes multiple records, receiving a selection signal corresponding to a target record;
    displaying the related information corresponding to the target record.
  10. The method according to any one of claims 1 to 6, characterized in that displaying the image recognition function control in the sidebar when in the video playback scene comprises:
    when in the video playback scene, receiving a call-out instruction corresponding to the sidebar;
    displaying the sidebar according to the call-out instruction, wherein the sidebar includes the image recognition function control.
  11. A video image recognition apparatus, characterized in that the apparatus comprises:
    a control display module, configured to display an image recognition function control in a sidebar when in a video playback scene;
    an image acquisition module, configured to, upon receiving a first trigger signal corresponding to the image recognition function control, take a screenshot of the current playback picture to obtain a target image;
    an image recognition module, configured to obtain an image recognition result of the target image, the image recognition result being obtained by performing image recognition on the target image;
    a result display module, configured to display the image recognition result.
  12. The apparatus according to claim 11, characterized in that the target image contains multiple objects to be recognized, and the image recognition module is configured to:
    determine a target object to be recognized contained in the target image;
    perform image recognition on the target object to be recognized to obtain the image recognition result.
  13. The apparatus according to claim 12, characterized in that the image recognition module is configured to:
    display a person recognition control and/or an item recognition control;
    when a second trigger signal corresponding to the person recognition control is received, determine that the target object to be recognized is the region of the target image that contains a person image;
    when a third trigger signal corresponding to the item recognition control is received, determine that the target object to be recognized is the region of the target image that contains an item image.
  14. The apparatus according to claim 12, characterized in that the image recognition module is configured to:
    display the target image, in which each object to be recognized is labeled with a different serial number;
    receive a selection signal corresponding to a target serial number;
    determine the object to be recognized that corresponds to the target serial number as the target object to be recognized.
  15. The apparatus according to claim 12, characterized in that the image recognition module is configured to:
    display the target image;
    receive a third trigger signal acting on the target image;
    determine the object to be recognized within the target area corresponding to the third trigger signal as the target object to be recognized.
  16. The apparatus according to claim 12, characterized in that the image recognition module is configured to perform image recognition on the target object to be recognized through a machine learning model to obtain the image recognition result, wherein the machine learning model is obtained by training a neural network with multiple sets of training sample data, and each set of training sample data includes a sample image and the recognition result corresponding to that sample image.
  17. The apparatus according to any one of claims 11 to 16, characterized in that the apparatus further comprises:
    an information acquisition module, configured to acquire related information corresponding to the image recognition result;
    an information display module, configured to display the related information corresponding to the image recognition result.
  18. The apparatus according to claim 17, characterized in that:
    when the image recognition result is a person identifier, the related information corresponding to the image recognition result includes one or more of the following: encyclopedia information, social account information, news information, and works information of the person corresponding to the person identifier;
    when the image recognition result is an item identifier, the related information corresponding to the image recognition result includes one or more of the following: encyclopedia information and purchase information of the item corresponding to the item identifier.
  19. The apparatus according to claim 17, characterized in that the information display module is configured to:
    when the image recognition result includes multiple records, receive a selection signal corresponding to a target record;
    display the related information corresponding to the target record.
  20. The apparatus according to any one of claims 11 to 16, characterized in that the control display module is configured to:
    when in the video playback scene, receive a call-out instruction corresponding to the sidebar;
    display the sidebar according to the call-out instruction, wherein the sidebar includes the image recognition function control.
  21. A terminal, characterized in that the terminal comprises a processor and a memory, the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the video image recognition method according to any one of claims 1 to 10.
  22. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is loaded and executed by a processor to implement the video image recognition method according to any one of claims 1 to 10.
PCT/CN2019/096578 2018-08-22 2019-07-18 Video image recognition method and apparatus, terminal and storage medium WO2020038167A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810963246.7 2018-08-22
CN201810963246.7A CN109034115B (en) 2018-08-22 2018-08-22 Video image recognizing method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2020038167A1 true WO2020038167A1 (en) 2020-02-27

Family

ID=64628027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/096578 WO2020038167A1 (en) 2018-08-22 2019-07-18 Video image recognition method and apparatus, terminal and storage medium

Country Status (2)

Country Link
CN (1) CN109034115B (en)
WO (1) WO2020038167A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444822A (en) * 2020-03-24 2020-07-24 北京奇艺世纪科技有限公司 Object recognition method and apparatus, storage medium, and electronic apparatus
CN111541907A (en) * 2020-04-23 2020-08-14 腾讯科技(深圳)有限公司 Article display method, apparatus, device and storage medium
CN112565863A (en) * 2020-11-26 2021-03-26 深圳Tcl新技术有限公司 Video playing method and device, terminal equipment and computer readable storage medium
CN112584213A (en) * 2020-12-11 2021-03-30 海信视像科技股份有限公司 Display device and display method of image recognition result
CN112801004A (en) * 2021-02-05 2021-05-14 网易(杭州)网络有限公司 Method, device and equipment for screening video clips and storage medium
CN113747182A (en) * 2021-01-18 2021-12-03 北京京东拓先科技有限公司 Article display method, client, live broadcast server and computer storage medium
CN113766297A (en) * 2021-05-27 2021-12-07 腾讯科技(深圳)有限公司 Video processing method, playing terminal and computer readable storage medium
CN113891040A (en) * 2021-09-24 2022-01-04 深圳Tcl新技术有限公司 Video processing method, video processing device, computer equipment and storage medium
CN113938698A (en) * 2021-10-19 2022-01-14 广州方硅信息技术有限公司 Display control method and device for live user data and computer equipment
CN115086774A (en) * 2022-05-31 2022-09-20 北京达佳互联信息技术有限公司 Resource display method and device, electronic equipment and storage medium
CN115086759A (en) * 2022-05-13 2022-09-20 北京达佳互联信息技术有限公司 Video processing method, video processing device, computer equipment and medium
WO2023169049A1 (en) * 2022-03-09 2023-09-14 聚好看科技股份有限公司 Display device and server

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034115B (en) * 2018-08-22 2021-10-22 Oppo广东移动通信有限公司 Video image recognizing method, device, terminal and storage medium
CN109857309B (en) * 2019-01-21 2022-02-01 武汉卡比特信息有限公司 Screen capture projection method based on iOS mobile terminal
CN110134807B (en) * 2019-05-17 2021-06-04 苏州科达科技股份有限公司 Target retrieval method, device, system and storage medium
CN110442806B (en) * 2019-08-05 2022-04-26 百度在线网络技术(北京)有限公司 Method and apparatus for recognizing image
CN112784137A (en) * 2019-11-04 2021-05-11 海信视像科技股份有限公司 Display device, display method and computing device
CN110909776A (en) * 2019-11-11 2020-03-24 维沃移动通信有限公司 Image identification method and electronic equipment
CN111339395A (en) * 2020-02-11 2020-06-26 山东经贸职业学院 Data information matching method and system for electronic commerce system
CN111652678B (en) * 2020-05-27 2023-11-14 腾讯科技(深圳)有限公司 Method, device, terminal, server and readable storage medium for displaying article information
CN112162672A (en) * 2020-10-19 2021-01-01 腾讯科技(深圳)有限公司 Information flow display processing method and device, electronic equipment and storage medium
CN112996196B (en) * 2021-02-04 2023-02-10 沃特威(广州)电子科技有限公司 Intelligent environment light control method, system, computer equipment and storage medium
CN113282769A (en) * 2021-04-25 2021-08-20 维沃移动通信有限公司 Multimedia file processing method and device and electronic equipment
CN113282768A (en) * 2021-04-25 2021-08-20 维沃移动通信有限公司 Multimedia file processing method and device and electronic equipment
CN113110785B (en) * 2021-05-12 2023-04-18 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113761360A (en) * 2021-05-27 2021-12-07 腾讯科技(深圳)有限公司 Video-based article searching method, device, equipment and storage medium
CN115527135A (en) * 2021-06-24 2022-12-27 Oppo广东移动通信有限公司 Content identification method and device and electronic equipment
CN115878838A (en) * 2021-09-27 2023-03-31 北京有竹居网络技术有限公司 Video-based information display method and device, electronic equipment and storage medium
CN114268847A (en) * 2021-12-15 2022-04-01 北京百度网讯科技有限公司 Video playing method and device, electronic equipment and storage medium
CN116431947A (en) * 2022-01-04 2023-07-14 腾讯科技(深圳)有限公司 Multimedia processing method, apparatus, device, medium and computer program product

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106028160A (en) * 2016-06-03 2016-10-12 腾讯科技(深圳)有限公司 Image data processing method and device
CN106202316A (en) * 2016-07-01 2016-12-07 传线网络科技(上海)有限公司 Merchandise news acquisition methods based on video and device
US20160357406A1 (en) * 2015-06-05 2016-12-08 Samsung Electronics Co., Ltd. Operating method for image and electronic device supporting the same
CN107105340A (en) * 2017-03-21 2017-08-29 百度在线网络技术(北京)有限公司 People information methods, devices and systems are shown in video based on artificial intelligence
CN107515868A (en) * 2016-06-15 2017-12-26 北京陌上花科技有限公司 Searching method and device
CN107957891A (en) * 2017-11-22 2018-04-24 暴风集团股份有限公司 A kind of video player method for information display, device, terminal and system
CN108089786A (en) * 2017-12-14 2018-05-29 广东欧珀移动通信有限公司 Method for displaying user interface, device, equipment and storage medium
CN109034115A (en) * 2018-08-22 2018-12-18 Oppo广东移动通信有限公司 Video image recognizing method, device, terminal and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8977639B2 (en) * 2009-12-02 2015-03-10 Google Inc. Actionable search results for visual queries
US9852156B2 (en) * 2009-12-03 2017-12-26 Google Inc. Hybrid use of location sensor data and visual query to return local listings for visual query
JP2013200793A (en) * 2012-03-26 2013-10-03 Sony Corp Information processing apparatus, information processing method, and program
CN102682091A (en) * 2012-04-25 2012-09-19 腾讯科技(深圳)有限公司 Cloud-service-based visual search method and cloud-service-based visual search system
US20150089446A1 (en) * 2013-09-24 2015-03-26 Google Inc. Providing control points in images
KR102158691B1 (en) * 2014-01-08 2020-09-22 엘지전자 주식회사 Mobile terminal and method for controlling the same
CN104090762B (en) * 2014-07-10 2017-04-19 福州瑞芯微电子股份有限公司 Screenshot processing device and method
US10664515B2 (en) * 2015-05-29 2020-05-26 Microsoft Technology Licensing, Llc Task-focused search by image
CN106529413A (en) * 2016-10-13 2017-03-22 北京小米移动软件有限公司 Information acquisition method and device
CN107256109B (en) * 2017-05-27 2021-03-16 北京小米移动软件有限公司 Information display method and device and terminal

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444822A (en) * 2020-03-24 2020-07-24 北京奇艺世纪科技有限公司 Object recognition method and apparatus, storage medium, and electronic apparatus
CN111444822B (en) * 2020-03-24 2024-02-06 北京奇艺世纪科技有限公司 Object recognition method and device, storage medium and electronic device
CN111541907B (en) * 2020-04-23 2023-09-22 腾讯科技(深圳)有限公司 Article display method, apparatus, device and storage medium
CN111541907A (en) * 2020-04-23 2020-08-14 腾讯科技(深圳)有限公司 Article display method, apparatus, device and storage medium
CN112565863A (en) * 2020-11-26 2021-03-26 深圳Tcl新技术有限公司 Video playing method and device, terminal equipment and computer readable storage medium
CN112584213A (en) * 2020-12-11 2021-03-30 海信视像科技股份有限公司 Display device and display method of image recognition result
CN113747182A (en) * 2021-01-18 2021-12-03 北京京东拓先科技有限公司 Article display method, client, live broadcast server and computer storage medium
CN112801004A (en) * 2021-02-05 2021-05-14 网易(杭州)网络有限公司 Method, device and equipment for screening video clips and storage medium
CN113766297B (en) * 2021-05-27 2023-12-05 腾讯科技(深圳)有限公司 Video processing method, playing terminal and computer readable storage medium
CN113766297A (en) * 2021-05-27 2021-12-07 腾讯科技(深圳)有限公司 Video processing method, playing terminal and computer readable storage medium
CN113891040A (en) * 2021-09-24 2022-01-04 深圳Tcl新技术有限公司 Video processing method, video processing device, computer equipment and storage medium
CN113938698A (en) * 2021-10-19 2022-01-14 广州方硅信息技术有限公司 Display control method and device for live user data and computer equipment
CN113938698B (en) * 2021-10-19 2024-03-12 广州方硅信息技术有限公司 Display control method and device for live user data and computer equipment
WO2023169049A1 (en) * 2022-03-09 2023-09-14 聚好看科技股份有限公司 Display device and server
CN115086759A (en) * 2022-05-13 2022-09-20 北京达佳互联信息技术有限公司 Video processing method, video processing device, computer equipment and medium
CN115086774A (en) * 2022-05-31 2022-09-20 北京达佳互联信息技术有限公司 Resource display method and device, electronic equipment and storage medium
CN115086774B (en) * 2022-05-31 2024-03-05 北京达佳互联信息技术有限公司 Resource display method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109034115A (en) 2018-12-18
CN109034115B (en) 2021-10-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19852274

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19852274

Country of ref document: EP

Kind code of ref document: A1