WO2020038167A1

WO2020038167A1 - Video image recognition method and apparatus, terminal and storage medium

Info

Publication number: WO2020038167A1
Application number: PCT/CN2019/096578
Authority: WO
Inventors: 宋方
Original assignee: Oppo广东移动通信有限公司
Priority date: 2018-08-22
Filing date: 2019-07-18
Publication date: 2020-02-27
Also published as: CN109034115A; CN109034115B

Abstract

Provided in the embodiments of the present application are a video image recognition method and apparatus, a terminal and a storage medium. The method comprises: in a video playing scene, displaying an image recognition function control in a sidebar; when a first trigger signal corresponding to the image recognition function control is received, carrying out screenshot processing on the currently played picture to obtain a target image; acquiring an image recognition result of the target image; and displaying the image recognition result. In the embodiments of the present application, a video image recognition control is displayed in the video playing scene, if a user wants to know about a person or item in the currently played picture, the user directly clicks on the video recognition control, and then, the terminal carries out image recognition on the currently played picture and displays the image recognition result to the user. By means of this process, the user does not need to switch back and forth between two application programs, the user operation required for knowing about a person or item in the currently played picture is simplified, the operation is more convenient, and the image recognition efficiency is higher.

Description

Video image recognition method, device, terminal and storage medium

This application claims priority to Chinese patent application No. 201810963246.7, filed on August 22, 2018 and entitled "Video Mapping Method, Device, Terminal, and Storage Medium", the entire contents of which are incorporated herein by reference.

Technical field

The embodiments of the present application relate to the technical field of terminals, and in particular, to a video image recognition method, device, terminal, and storage medium.

Background technique

When watching a video, a user may want to learn about the people or objects that appear in it. For example, a user watching a movie through a playback application on the terminal may want to know more about an actor appearing in the movie.

In the related art, if the user wants to learn about a person or object in the video, the user typically triggers the terminal to take a screenshot of the current playback interface and save it, and then triggers the terminal to exit the playback application and start a search application. The user uploads the screenshot to the search application and clicks the search control. The terminal then obtains the relevant information about the person or item from the network and displays it to the user.

Summary of the Invention

The embodiments of the present application provide a video image recognition method, device, terminal, and storage medium. The technical solution is as follows:

In one aspect, an embodiment of the present application provides a video image recognition method, where the method includes:

When in a video playback scene, displaying an image recognition function control in a sidebar;

When receiving a first trigger signal corresponding to the image recognition function control, performing screenshot processing on a current playback screen to obtain a target image;

Performing image recognition on the target image to obtain an image recognition result of the target image;

Displaying the image recognition result.

In another aspect, an embodiment of the present application provides a video image recognition apparatus, where the apparatus includes:

A control display module, configured to display an image recognition function control in a sidebar when in a video playback scene;

An image acquisition module, configured to, when receiving a first trigger signal corresponding to the image recognition function control, perform screenshot processing on a current playback screen to obtain a target image;

An image recognition module, configured to perform image recognition on the target image to obtain an image recognition result of the target image;

A result display module, configured to display the image recognition result.

In another aspect, an embodiment of the present application provides a terminal. The terminal includes a processor and a memory. The memory stores a computer program, and the computer program is loaded and executed by the processor to implement the video image recognition method described in the foregoing aspect.

In yet another aspect, an embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the computer program is loaded and executed by a processor to implement the video image recognition method described in the foregoing aspect.

The technical solutions provided in the embodiments of the present application can bring the following beneficial effects:

By displaying a video recognition control in the video playback scene, a user who wants to learn about a person or item in the current playback picture can directly click the video recognition control; the terminal then performs image recognition on the current playback picture and displays the image recognition result to the user. This process spares the user from switching back and forth between two applications, simplifies the operations required to learn about a person or item in the current playback picture, makes the operation more convenient, and improves image recognition efficiency.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic structural diagram of a terminal provided by an exemplary embodiment of the present application;

FIG. 2 is a schematic structural diagram of a terminal according to another exemplary embodiment of the present application;

FIGS. 3A to 3F are schematic diagrams of appearances of terminals with different touch display screens provided by exemplary embodiments of the present application;

FIG. 4 is a flowchart of a video image recognition method according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an interface for displaying a video image recognition control provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of an interface for determining a target to-be-recognized object according to an embodiment of the present application;

FIG. 7 is a schematic diagram of an interface for determining a target to-be-recognized object according to another embodiment of the present application;

FIG. 8 is a schematic diagram of an interface for determining a target to-be-recognized object according to another embodiment of the present application;

FIG. 9 is a schematic interface diagram of a video image recognition method provided by an embodiment of the present application;

FIG. 10 is a schematic interface diagram of a video image recognition method provided by an embodiment of the present application;

FIG. 11 is a schematic diagram of an interface where video image recognition fails according to an embodiment of the present application;

FIG. 12 is a schematic diagram of an interface where video image recognition fails according to an embodiment of the present application;

FIG. 13 is a block diagram of a video image recognition apparatus according to an embodiment of the present application.

Detailed description

To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Reference is made to FIG. 1 and FIG. 2, which are structural block diagrams of a terminal 100 according to an exemplary embodiment of the present application. The terminal 100 may be a mobile phone, a tablet computer, a notebook computer, an e-book, or the like. The terminal 100 in the present application may include one or more of the following components: a processor 110, a memory 120, and a touch display screen 130.

The processor 110 may include one or more processing cores. The processor 110 uses various interfaces and lines to connect the various parts of the terminal 100, and performs the various functions of the terminal 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling data stored in the memory 120. Optionally, the processor 110 may be implemented in at least one hardware form among Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 110 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), and a modem. The CPU mainly handles the operating system, user interface, and application programs; the GPU is responsible for rendering and drawing the content to be displayed by the touch display screen 130; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 110 and may instead be implemented as a separate chip.

The memory 120 may include Random Access Memory (RAM), and may also include Read-Only Memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable storage medium. The memory 120 may be used to store instructions, programs, codes, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the following method embodiments, and so on; the data storage area may store data created through use of the terminal 100 (such as audio data and a phone book).

Taking the case where the operating system is Android as an example, the programs and data stored in the memory 120 are shown in FIG. 1. The memory 120 stores a Linux kernel layer 220, a system runtime layer 240, an application framework layer 260, and an application layer 280. The Linux kernel layer 220 provides low-level drivers for the various hardware of the terminal 100, such as a display driver, an audio driver, a camera driver, a Bluetooth driver, a Wi-Fi driver, and power management. The system runtime layer 240 provides the main feature support for the Android system through a number of C/C++ libraries. For example, the SQLite library provides database support, the OpenGL ES library provides 3D graphics support, and the WebKit library provides browser kernel support. The system runtime layer 240 also provides an Android runtime library 242 (Android Runtime), which mainly supplies core libraries that allow developers to write Android applications in the Java language. The application framework layer 260 provides various APIs that may be used when building applications, such as activity management, window management, view management, notification management, content providers, package management, call management, resource management, and positioning management; developers can also use these APIs to build their own applications. At least one application runs in the application layer 280. These applications may be programs native to the operating system, such as a contacts program, an SMS program, a clock program, or a camera application, or applications developed by third-party developers, such as instant messaging programs and photo beautification programs.

Taking the case where the operating system is iOS as an example, the programs and data stored in the memory 120 are shown in FIG. 2. The iOS system includes a core operating system layer 320 (Core OS layer), a core service layer 340 (Core Services layer), a media layer 360 (Media layer), and a touchable layer 380 (Cocoa Touch layer). The core operating system layer 320 includes an operating system kernel, drivers, and low-level program frameworks. These low-level program frameworks provide functions closer to the hardware for use by the program frameworks located in the core service layer 340. The core service layer 340 provides system services and/or program frameworks required by applications, such as a Foundation framework, an account framework, an advertising framework, a data storage framework, a network connection framework, a geographic location framework, a motion framework, and so on. The media layer 360 provides audio-visual interfaces for applications, such as interfaces related to graphics and images, interfaces related to audio technology, interfaces related to video technology, and the wireless playback (AirPlay) interface for audio and video transmission. The touchable layer 380 provides various commonly used interface-related frameworks for application development and is responsible for user touch interaction on the terminal 100. Examples include a local notification service, a remote push service, an advertising framework, a game tool framework, a message user interface (UI) framework, the UIKit user interface framework, a map framework, and so on.

Among the frameworks shown in FIG. 2, the frameworks relevant to most applications include, but are not limited to, the Foundation framework in the core service layer 340 and the UIKit framework in the touchable layer 380. The Foundation framework provides many basic object classes and data types and supplies the most basic system services for all applications, independent of the UI. The classes provided by the UIKit framework form a basic UI class library for creating touch-based user interfaces. iOS applications can build their UIs on the UIKit framework, which therefore provides the application infrastructure for building user interfaces, drawing, handling user interaction events, responding to gestures, and more.

The touch display screen 130 is used to receive touch operations performed on or near it by the user with a finger, a stylus, or any other suitable object, and to display the user interface of each application. The touch display screen 130 is generally disposed on the front panel of the terminal 100. The touch display screen 130 may be designed as a full screen, a curved screen, or a special-shaped screen. It may also be designed as a combination of a full screen and a curved screen, or a combination of a special-shaped screen and a curved screen, which is not limited in this embodiment. Specifically:

Full screen

The full screen may refer to a screen design in which the screen ratio of the touch display screen 130 on the front panel of the terminal 100 exceeds a threshold (such as 80%, 90%, or 95%). One way to calculate the screen ratio is: (area of the touch display screen 130 / area of the front panel of the terminal 100) * 100%; another is: (area of the actual display region within the touch display screen 130 / area of the front panel of the terminal 100) * 100%; a third is: (diagonal of the touch display screen 130 / diagonal of the front panel of the terminal 100) * 100%. In the schematic example shown in FIG. 3A, almost the entire front panel of the terminal 100 is the touch display screen 130: on the front panel 40 of the terminal 100, every area other than the edge formed by the middle frame 41 is the touch display screen 130. The four corners of the touch display screen 130 may be right-angled or rounded.
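The three screen-ratio calculations above can be sketched as follows. This is a minimal illustration only; the function names and the sample dimensions are assumptions and do not appear in the application.

```python
def screen_ratio_by_area(screen_area: float, panel_area: float) -> float:
    """Screen ratio in percent from two areas (same unit for both).

    Covers both area-based variants: pass either the full area of the
    touch display screen or only its actual display region.
    """
    return 100.0 * screen_area / panel_area


def screen_ratio_by_diagonal(screen_diagonal: float, panel_diagonal: float) -> float:
    """Screen ratio in percent from two diagonal lengths (same unit)."""
    return 100.0 * screen_diagonal / panel_diagonal


def is_full_screen(ratio_percent: float, threshold_percent: float) -> bool:
    """A design counts as a full screen when the ratio exceeds the threshold."""
    return ratio_percent > threshold_percent


# Hypothetical panel of 100 cm^2 with an 85 cm^2 touch display screen.
ratio = screen_ratio_by_area(85.0, 100.0)
print(ratio, is_full_screen(ratio, 80.0), is_full_screen(ratio, 90.0))
```

Which of the three ratios is used, and which threshold applies, is left open by the text, so both are kept as explicit parameters here.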

The full screen may also be a screen design in which at least one front panel component is integrated inside or below the touch display screen 130. Optionally, the at least one front panel component includes a camera, a fingerprint sensor, a proximity light sensor, a distance sensor, and the like. In some embodiments, other components on the front panel of a conventional terminal are integrated in all or part of the touch display screen 130. For example, the photosensitive element in the camera is divided into multiple photosensitive pixels, and each photosensitive pixel is integrated in a black area of a display pixel in the touch display screen 130. Since at least one front panel component is integrated inside the touch display screen 130, the full screen has a higher screen ratio.

Of course, in other embodiments, the front panel components of a conventional terminal can also be placed on the side or back of the terminal 100. For example, an ultrasonic fingerprint sensor is placed under the touch display screen 130, a bone-conduction handset is placed inside the terminal 100, and the camera is placed on the side of the terminal as a pluggable structure.

In some optional embodiments, when the terminal 100 adopts a full screen, edge touch sensors 120 are provided on a single side of the middle frame of the terminal 100, on two sides (such as the left and right sides), or on four sides (such as the upper, lower, left, and right sides). The edge touch sensors 120 are used to detect at least one of the user's touch, click, press, and slide operations on the middle frame. The edge touch sensor 120 may be any one of a touch sensor, a thermal sensor, and a pressure sensor. The user can apply an operation on the edge touch sensor 120 to control an application in the terminal 100.

Curved screen

The curved screen refers to a screen design in which the cross-section of the touch display screen 130 has a curved shape and its projection in a direction parallel to the cross-section is a plane. The curved shape may be U-shaped. Optionally, the curved screen refers to a screen design in which at least one side is curved. Optionally, the curved screen means that at least one side of the touch display screen 130 extends to cover the middle frame of the terminal 100. Since the side of the touch display screen 130 extends over the middle frame of the terminal 100, the middle frame, which otherwise has neither a display function nor a touch function, also becomes a displayable area and/or an operable area, giving the curved screen a higher screen ratio. Optionally, in the example shown in FIG. 3B, the curved screen refers to a screen design in which the left and right sides 42 are curved; or a screen design in which the upper and lower sides are curved; or a screen design in which all four sides (top, bottom, left, and right) are curved. In an alternative embodiment, the curved screen is made of a touch screen material with a certain flexibility.

Special-shaped screen

The special-shaped screen is a touch display screen with an irregular appearance, that is, a shape that is neither a rectangle nor a rounded rectangle. Optionally, the special-shaped screen refers to a screen design in which protrusions, notches, and/or holes are provided on a rectangular or rounded rectangular touch display screen 130. Optionally, the protrusions, notches, and/or holes can be located at the edge of the touch display screen 130, at the center of the screen, or both. When a protrusion, notch, and/or hole is set on an edge, it can be set at the middle or at either end of that edge; when it is set in the central part of the screen, it can be set in one or more of the upper region, the upper left region, the left region, the lower left region, the lower region, the lower right region, the right region, and the upper right region. When set in multiple regions, the protrusions, notches, and holes can be concentrated or dispersed, and can be distributed symmetrically or asymmetrically. Optionally, the number of protrusions, notches, and/or holes is not limited.

The special-shaped screen covers the upper and/or lower forehead area of the touch display screen as a displayable area and/or an operable area, so that the touch display screen occupies more of the front panel of the terminal and achieves a larger screen ratio. In some embodiments, the notches and/or holes are used to accommodate at least one front panel component, which includes at least one of a camera, a fingerprint sensor, a proximity light sensor, a distance sensor, a handset, an ambient light sensor, and a physical button.

Exemplarily, the notch may be provided on one or more edges, and may be a semicircular notch, a right-angled rectangular notch, a rounded rectangular notch, or an irregularly shaped notch. In the example shown schematically in FIG. 3C, the special-shaped screen may be a screen design with a semicircular notch 43 at the center of the upper edge of the touch display screen 130; the space vacated by the semicircular notch 43 accommodates at least one front panel component among a camera, a distance sensor (also known as a proximity sensor), a handset, and an ambient light sensor. In the example shown schematically in FIG. 3D, the special-shaped screen may be a screen design with a semicircular notch 44 at the center of the lower edge of the touch display screen 130; the space vacated by the semicircular notch 44 accommodates at least one of a physical button, a fingerprint sensor, and a microphone. In the example shown schematically in FIG. 3E, the special-shaped screen may be a screen design with a semi-elliptical notch 45 at the center of the lower edge of the touch display screen 130; a semi-elliptical notch is also formed on the front panel of the terminal 100, and the notches enclose an elliptical area that accommodates a physical key or a fingerprint recognition module. In the example shown schematically in FIG. 3F, the special-shaped screen may be a screen design with at least one small hole 46 in the upper half of the touch display screen 130; the space vacated by the small hole 46 accommodates at least one front panel component among a camera, a distance sensor, a handset, and an ambient light sensor.

In addition, those skilled in the art can understand that the structure of the terminal 100 shown in the above drawings does not constitute a limitation on the terminal 100. The terminal may include more or fewer components than shown, combine some components, or use a different component arrangement. For example, the terminal 100 further includes components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (WiFi) module, a power source, and a Bluetooth module, and details are not described herein again.

In the related art, if a user wants to know a character or an article in a video, he needs to switch between two applications. The operation required in this process is very tedious and inefficient.

Based on this, embodiments of the present application provide a video image recognition method, device, terminal, and storage medium. In the technical solution provided in the embodiments of the present application, the terminal displays a video recognition control in a video playback scene. A user who wants to learn about a person or item in the current playback picture can directly click the video recognition control; the terminal then performs image recognition on the current playback picture and displays the image recognition result to the user. This process spares the user from switching back and forth between two applications, simplifies the operations required to learn about a person or item in the current playback picture, and improves efficiency.

In the embodiments of the present application, each step may be executed by the terminal described in the foregoing embodiment. The terminal has a video playback function. Optionally, the terminal also has an image recognition function. In some embodiments of the present application, an application implementing the video playback function is installed and running in the terminal, and each step may be executed by that application, which may be a system application or a third-party application. For ease of description, the following method embodiments are described using only the case where each step is executed by the terminal, but this is not a limitation.

Please refer to FIG. 4, which shows a flowchart of a video image recognition method provided by an embodiment of the present application. The method may include the following steps:

Step 401: When in a video playback scene, display an image recognition function control in a sidebar.

A video playback scene refers to a scene in which the terminal is playing a video. In one possible implementation, the terminal plays the video through a playback application; in another possible implementation, the terminal plays a video embedded in a webpage through a browser.

The sidebar is used to display application icons and/or function controls in the terminal, so that while the terminal is running an application in the foreground, the user can conveniently open other applications or execute the functions corresponding to the function controls. The application icons and/or function controls displayed in the sidebar can be set by the terminal by default or customized by the user. In the embodiments of the present application, the sidebar includes the image recognition function control.

The image recognition function control is used to trigger image recognition of the picture in the currently playing video. The image recognition function control may be displayed when the video starts to play, or may be displayed according to an operation signal triggered by the user. The embodiments of the present application do not limit the display timing of the image recognition function control.

When the image recognition function control is displayed according to an operation signal triggered by the user, step 401 may include the following two sub-steps:

Step 401a: When in a video playback scene, receive a call-out instruction corresponding to the sidebar;

Step 401b: Display the sidebar according to the call-out instruction.

The call-out instruction is used to call out the sidebar. Optionally, a buoy is displayed on the display interface of the terminal, and when a trigger signal acting on the buoy is received, the terminal is considered to have received the call-out instruction.

The buoy can always be displayed on the upper layer of the display interface, or can be displayed on the upper layer of the display interface when the application is started and run, and can also be displayed on the upper layer of the display interface according to the operation signal triggered by the user. The embodiment of the present application does not limit the display timing of the buoy. The shape of the buoy may be a circle, an oval, a rectangle, or the like, and the shape of the buoy is not limited in the embodiment of the present application. The area of the buoy can be set by the terminal by default, or can be set by the user, which is not limited in the embodiment of the present application. In addition, in order to reduce the occlusion of the display interface as much as possible, the buoy can be set to a transparency greater than 0.

The trigger signal acting on the buoy may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal, which is not limited in the embodiments of the present application. The embodiments of the present application are described using the case where the trigger signal acting on the buoy is a slide signal.

In addition, when the terminal is in a landscape display state, the buoy blocks part of the display interface and reduces the user's sense of immersion. To avoid this, in some embodiments of the present application, the terminal is considered to have received the call-out instruction for the sidebar when it receives a trigger signal on a side edge of the display. Exemplarily, the trigger signal on the side edge is a sliding signal from the outside to the inside of that side edge.
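One way to picture the side-edge trigger is a check that a slide starts close to the left or right edge of the display and travels inward. The sketch below is only an illustration under stated assumptions: the `Swipe` type, the `edge_margin` and `min_travel` parameters, and their default values are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Swipe:
    """Horizontal start and end positions of a slide, in pixels."""
    start_x: float
    end_x: float


def is_sidebar_call_out(swipe: Swipe, screen_width: float,
                        edge_margin: float = 24.0,
                        min_travel: float = 48.0) -> bool:
    """Return True when the slide runs from a side edge toward the inside.

    edge_margin: how close to an edge the slide must start (assumed value).
    min_travel: how far inward the slide must move (assumed value).
    """
    from_left = (swipe.start_x <= edge_margin
                 and swipe.end_x - swipe.start_x >= min_travel)
    from_right = (swipe.start_x >= screen_width - edge_margin
                  and swipe.start_x - swipe.end_x >= min_travel)
    return from_left or from_right


# A slide starting 5 px from the left edge and moving 115 px inward.
print(is_sidebar_call_out(Swipe(5.0, 120.0), screen_width=1080.0))
```

A slide that starts in the middle of the screen, or that starts at an edge but barely moves, would not be treated as a call-out instruction under these assumptions.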

Referring to FIG. 5, a schematic diagram of an interface for displaying a video image recognition control according to an embodiment of the present application is shown. In the video playback scene, the user performs a sliding operation from the outside to the inside of the left side of the terminal. After receiving the sliding operation signal, the terminal displays a sidebar 51, and the sidebar 51 includes an image recognition function control 52.

Step 402: When a first trigger signal corresponding to the image recognition function control is received, perform screenshot processing on the current playback picture to obtain a target image.

The first trigger signal is triggered by the user, and may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal. The embodiments of the present application are described using the case where the first trigger signal is a click signal. The target image is the image that needs to be recognized. Optionally, the terminal takes the playback picture displayed at the moment the first trigger signal is received as the target image.

In the embodiment of the present application, the target image needs to be displayed to the user, so that the user determines whether it is an image that needs to be identified. In the embodiment of the present application, the target image is acquired by a screenshot processing method. Screenshot processing refers to capturing the current playback frame and determining the captured playback frame as the target image.

In one possible implementation, the terminal performs screenshot processing on the complete current playback picture to obtain the target image. In another possible implementation, the terminal performs screenshot processing on a part of the current playback picture to obtain the target image. This part of the picture can be selected by the user. Optionally, upon receiving the first trigger signal corresponding to the image recognition function control, the terminal pauses video playback and prompts the user to select the target image; the user performs a drag operation on the current playback picture, and the terminal then captures, as the target image, the rectangular area whose diagonal is the straight line from the start point to the end point of the drag operation signal.
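The partial screenshot described above, a rectangle whose diagonal runs from the drag start point to the drag end point, can be sketched as follows. The function names are hypothetical, the frame is modeled as a plain list of pixel rows, and clamping the rectangle to the frame bounds is an added assumption rather than something the application states.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]            # (x, y) in frame pixels
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive


def drag_to_rect(start: Point, end: Point,
                 frame_width: int, frame_height: int) -> Rect:
    """Rectangle whose diagonal is the line from the drag start to the drag end."""
    left, right = sorted((start[0], end[0]))
    top, bottom = sorted((start[1], end[1]))
    # Assumption: a drag that leaves the picture is clamped to the frame bounds.
    left, right = max(0, left), min(frame_width, right)
    top, bottom = max(0, top), min(frame_height, bottom)
    return left, top, right, bottom


def crop(frame: Sequence[Sequence[int]], rect: Rect) -> List[List[int]]:
    """Cut the rectangle out of a frame stored as rows of pixel values."""
    left, top, right, bottom = rect
    return [list(row[left:right]) for row in frame[top:bottom]]


# The drag direction does not matter: dragging up and to the left
# selects the same rectangle as dragging down and to the right.
print(drag_to_rect((30, 40), (10, 5), frame_width=100, frame_height=50))
```

Sorting the coordinates first is what makes the result independent of the direction in which the user drags.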

After the terminal acquires the target image, it can also display the target image. Optionally, the terminal displays the target image on a floating window. Because the size of the floating window is small, when the target image is displayed on the floating window, the target image needs to be reduced in size.
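As an illustration of the size reduction, the sketch below computes a reduced size that fits inside a floating window. Keeping the aspect ratio and never enlarging the image are assumptions made here; the application only states that the target image is reduced. The function name and the sample sizes are hypothetical.

```python
from typing import Tuple


def fit_into_window(image_w: int, image_h: int,
                    window_w: int, window_h: int) -> Tuple[int, int]:
    """Size of the target image after shrinking it to fit the floating window.

    The aspect ratio is preserved and the image is never scaled up
    (both are assumptions of this sketch).
    """
    scale = min(window_w / image_w, window_h / image_h, 1.0)
    return max(1, round(image_w * scale)), max(1, round(image_h * scale))


# A 1920x1080 screenshot shown in a hypothetical 480x360 floating window.
print(fit_into_window(1920, 1080, 480, 360))
```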

Step 403: Obtain an image recognition result of the target image.

The image recognition result is obtained by performing image recognition on the target image. Optionally, the image recognition result may include at least one record, and each record represents the recognition result of one element in the target image, which may be a person identifier or an item identifier. The person identifier uniquely identifies a person and may be the person's name; the terminal obtains it by recognizing a person in the current playback picture. The item identifier uniquely identifies an item and may be the item's name; the terminal obtains it by recognizing an item in the current playback picture. In addition, the image recognition result also includes a similarity corresponding to each record. The similarity is the degree of similarity between the record and the corresponding element in the target image, and measures the accuracy of the image recognition result: the higher the similarity, the more accurate the result; the lower the similarity, the less accurate the result.
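The structure of an image recognition result described above can be modeled roughly as follows. The class and field names are hypothetical, and ordering records by similarity is an illustrative choice, not something the application prescribes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RecognitionRecord:
    """One record: the recognition result for one element of the target image."""
    kind: str          # "person" or "item"
    identifier: str    # for example, the name of the person or item
    similarity: float  # higher means the record is considered more accurate


def most_similar_first(records: List[RecognitionRecord]) -> List[RecognitionRecord]:
    """Order records so that the most similar (most accurate) come first."""
    return sorted(records, key=lambda r: r.similarity, reverse=True)


# Made-up records for illustration only.
result = [
    RecognitionRecord("item", "Example Handbag", 0.72),
    RecognitionRecord("person", "Example Actor", 0.93),
]
for record in most_similar_first(result):
    print(record.kind, record.identifier, record.similarity)
```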

In a first possible implementation manner, the terminal performs image recognition on the target image to obtain the image recognition result. In a second possible implementation manner, the server performs image recognition on the target image to obtain the image recognition result, and the terminal then obtains the image recognition result from the server. Specifically, the terminal sends a recognition request to the server, where the recognition request carries the identifier of the terminal and the target image; the server recognizes the target image according to the recognition request, obtains the image recognition result, and returns the image recognition result to the terminal. In the embodiment of the present application, only the first possible implementation manner is taken as an example for explanation.

The embodiment of the present application does not limit the algorithm used for image recognition. It may be an image recognition algorithm based on model matching, an image recognition algorithm based on neural networks, an image recognition algorithm based on wavelet moments, an image recognition algorithm based on fractal features, and so on. This is not limited in the embodiments of the present application.

Optionally, after the terminal displays the target image on the floating window, the terminal may further display query information, where the query information is used to ask whether the image recognition result of the target image needs to be obtained. When a confirmation instruction corresponding to the query information is received, the terminal performs the step of obtaining the image recognition result of the target image.

Step 404: Display the image recognition result.

After the terminal obtains the image recognition result, the terminal displays the image recognition result for the user to view. Optionally, the image recognition result is also displayed in the floating window mentioned in step 402.

In summary, in the technical solution provided in the embodiments of the present application, a video recognition control is displayed in the video playback scene. If the user wants to know about a certain person or item in the current playback picture, the user directly clicks the video recognition control, and the terminal then performs image recognition on the current playback picture and displays the image recognition result to the user. This process spares the user from switching back and forth between two applications and simplifies the operations required to learn about a person or item in the current playback picture, so the operation is more convenient and the recognition efficiency is higher.

Because an image may include multiple objects, such as people, items, animals, flowers, and trees, recognition efficiency may be low if the user only needs to know about some of these elements while the terminal still performs image recognition on the entire image. In the embodiment of the present application, the user selects an object to be identified among the multiple objects, and the terminal then obtains only the image recognition result of that object, without acquiring the image recognition result of the entire image, which can improve the recognition efficiency. In an optional embodiment provided based on the embodiment shown in FIG. 4, the target image includes multiple objects to be identified, and step 403 includes the following two sub-steps:

Step 501: Determine a target to-be-recognized object included in the target image.

The target to-be-recognized object refers to an object that the user desires to recognize, which can be selected by the user. The number of target to-be-recognized objects may be one or multiple. The number of target to-be-recognized objects may be less than the number of objects contained in the target image, or may be equal to the number of objects contained in the target image. The three implementation methods for determining the target object to be identified are explained separately below.

In a first possible implementation manner, step 501 includes the following sub-steps:

Step 501a, displaying a person recognition control and / or an item recognition control;

The person recognition control is used to trigger the recognition of the area containing the person image in the target image, and the item recognition control is used to trigger the recognition of the area containing the object image in the target image. Optionally, the terminal displays the above-mentioned person recognition control and / or item recognition control while displaying the target image. Optionally, the above-mentioned person recognition control and / or item recognition control are also displayed in the floating window.

Step 501b, when a second trigger signal corresponding to the person recognition control is received, determining that the target object to be identified is the area containing the person image in the target image;

The area containing the person image in the target image may be a rectangular area containing a face image. Further, the area containing the person image in the target image is the smallest rectangular area containing the face image.

Step 501c, when a third trigger signal corresponding to the item recognition control is received, determining that the target object to be identified is the area containing the item image in the target image.

The area containing the item image in the target image may be an area containing the entire item or a rectangular area containing the key features of the item. The key features of an item can be determined based on the actual item. For example, when the item is a flower, its key feature is the petals. Further, the area containing the item image in the target image may be the smallest rectangular area containing the entire item, or the smallest rectangular area containing the key features of the item.
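The "smallest rectangular area" mentioned for person and item images can be sketched as follows. The sketch assumes that some detector (not specified in the application) has already produced a binary mask marking the pixels of the face, item, or key feature; the function name is hypothetical.

```python
def smallest_bounding_rect(mask):
    """Return (left, top, right, bottom), inclusive, of the smallest rectangle
    that contains every non-zero cell of a mask given as a list of rows.
    Returns None when the mask marks nothing."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])
```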

With reference to FIG. 6, a schematic diagram of an interface for determining the target object to be identified is provided according to an embodiment of the present application. The terminal displays a target image 62, a person recognition control 63, and an item recognition control 64 on the floating window 61. When the user clicks the person recognition control 63, the terminal determines that the target to-be-recognized object is the area containing the person image in the target image; when the user clicks the item recognition control 64, the terminal determines that the target to-be-recognized object is the area containing the item image in the target image.

In a second possible implementation manner, step 501 includes the following sub-steps:

Step 501d, displaying a target image;

Each object to be identified in the target image is labeled with a different serial number. Optionally, the terminal also displays the above-mentioned different serial numbers below the target image.

Step 501e: Receive a selection signal corresponding to the target sequence number;

The selection signal corresponding to the target sequence number may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal, which is not limited in this embodiment of the present application. In the embodiment of the present application, the case in which the selection signal corresponding to the target sequence number is a click signal is taken as an example for description.

The target sequence number is the selected serial number. If the user wants to know about an object, the user can select the serial number corresponding to that object. If the terminal also displays the different serial numbers below the target image, the user may select the target sequence number in the target image, or among the serial numbers displayed below the target image.

Step 501f, determining the object to be identified corresponding to the target sequence number as the target object to be identified.

The terminal determines the object corresponding to the selected serial number as the target object to be identified. Optionally, the floating window further includes a completion control, and when the terminal receives a confirmation instruction corresponding to the completion control, the object corresponding to the selected serial number is determined as the target object to be identified.

With reference to FIG. 7, a schematic diagram of an interface for determining the target object to be identified is provided according to an embodiment of the present application. The terminal displays a target image 62 and a completion control 71 on the floating window 61. Each to-be-recognized object in the target image 62 is labeled with a different serial number. When the user clicks a certain serial number and then the completion control 71, the terminal determines the to-be-recognized object corresponding to that serial number as the target to-be-recognized object.

In a third possible implementation manner, step 501 includes the following sub-steps:

Step 501g, displaying a target image;

Step 501h, receiving a third trigger signal acting on the target image;

The third trigger signal may be any one of a click signal, a double-click signal, a long-press signal, a slide signal, and a drag signal, which is not limited in the embodiment of the present application.

Step 501i, determining the object to be identified in the target area corresponding to the third trigger signal as the target object to be identified.

When the third trigger signal is any one of a click signal, a double-click signal, and a long-press signal, the target area corresponding to the third trigger signal refers to an area that is centered on the trigger position of the third trigger signal and whose area is a preset area. The trigger position of the third trigger signal refers to the contact position between the user's finger and the display screen. The preset area can be set according to practical experience, which is not limited in the embodiment of the present application. When the third trigger signal is a slide signal or a drag signal, the target area corresponding to the third trigger signal is a rectangular area with the motion track of the third trigger signal as its diagonal.
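The two cases above can be sketched in one function. The application does not state the shape of the preset-area region, so the sketch assumes a square; it also assumes that the motion track of a slide or drag is approximated by the straight line between its first and last points. All names and the default preset area are hypothetical.

```python
import math

TAP_SIGNALS = {"click", "double_click", "long_press"}
MOVE_SIGNALS = {"slide", "drag"}


def target_area(signal_type, points, preset_area=10000.0):
    """Return (left, top, right, bottom) of the target area for a trigger signal.

    points holds the contact positions: one point for a tap-like signal,
    the motion track for a slide or drag.
    """
    if signal_type in TAP_SIGNALS:
        cx, cy = points[0]
        half = math.sqrt(preset_area) / 2  # assumed square with the preset area
        return (cx - half, cy - half, cx + half, cy + half)
    if signal_type in MOVE_SIGNALS:
        (x0, y0), (x1, y1) = points[0], points[-1]
        return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
    raise ValueError(f"unsupported signal type: {signal_type}")
```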

Optionally, the floating window further includes a completion control, and when the terminal receives a confirmation instruction corresponding to the completion control, the object in the target area corresponding to the third trigger signal is determined as the target object to be identified.

With reference to FIG. 8, a schematic diagram of an interface for determining the target object to be identified is provided according to another embodiment of the present application. The terminal displays a target image 62 on the floating window 61. When the user clicks on a position, the terminal determines the object in the area 81, which is centered on that position and whose area is the preset area, as the target object to be identified.

Step 502: Perform image recognition on the target to-be-recognized object to obtain an image recognition result.

Optionally, step 502 may be implemented as: performing image recognition on the target object to be recognized through a machine learning model to obtain an image recognition result.

A machine learning model is obtained by training a neural network using multiple sets of training sample data. Each set of training sample data in the plurality of sets of training sample data includes a sample image and a recognition result corresponding to the sample image. The recognition result corresponding to the sample image can be obtained manually, that is, the relevant technician determines the recognition result corresponding to the sample image and records it.

The neural network may be a Convolutional Neural Network (CNN), an Artificial Neural Network (ANN), a Deep Neural Network (DNN), and the like, which is not limited in the embodiments of the present application.

The machine learning algorithm used in training the machine learning model can be a back-propagation (BP) algorithm, a Faster RCNN (Faster Regions with Convolutional Neural Network) algorithm, and so on, which is not limited in the embodiments of the present application.

Optionally, the machine learning model includes an input layer, at least one hidden layer, and an output layer. The input data of the input layer is the target image or the target object to be identified in the target image, and the output of the output layer is the image recognition result of the target image. The determination process is as follows: the target image, or the target object to be identified in the target image, is input to the input layer of the machine learning model; the hidden layer of the machine learning model performs feature extraction on the input data and combines and abstracts the extracted features; finally, the output layer outputs the image recognition result of the target image. In addition, the specific structure of the hidden layer is not limited in the embodiment of the present application. Generally speaking, the more layers a neural network has, the better the effect but the longer the calculation time. In practical applications, a neural network with an appropriate number of layers can be designed according to the accuracy requirements.
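The layer structure described above can be illustrated with a small forward pass. This is only a sketch: it assumes fully connected layers, a ReLU activation in the hidden layers, and a softmax output over candidate identifications, none of which is fixed by the application, and the weights would in practice come from training.

```python
import math


def dense(weights, bias, inputs):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, hidden_layers, output_layer):
    """Input layer -> hidden layer(s) -> output layer.

    hidden_layers is a list of (weights, bias) pairs; output_layer is one pair.
    Returns one score per candidate identification.
    """
    activation = features
    for weights, bias in hidden_layers:
        activation = relu(dense(weights, bias, activation))
    weights, bias = output_layer
    return softmax(dense(weights, bias, activation))
```

Adding more `(weights, bias)` pairs to `hidden_layers` deepens the network, which corresponds to the trade-off between effect and calculation time noted above.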

In addition, the training process of the machine learning model is as follows: an initial machine learning model is obtained; the sample images in the training sample data are input to the initial machine learning model, which outputs the actual recognition results corresponding to the sample images; each actual recognition result is compared with the recognition result corresponding to the sample image in the training sample data to obtain a calculated loss; and the calculated loss is then compared with a preset threshold. If the calculated loss is greater than the preset threshold, the parameters of the initial machine learning model are updated, and the process is repeated from the step of inputting the sample images in the training sample data to the initial machine learning model. If the calculated loss is not greater than the preset threshold, the machine learning model is generated. The preset threshold may be determined according to the required recognition accuracy, which is not limited in the embodiment of the present application.
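The loop structure of this training process (compute the loss, compare it with the preset threshold, update the parameters, repeat) can be sketched on a toy problem. To stay short, the sketch replaces the neural network with a single logistic unit trained by gradient descent on one-dimensional samples; the loss function, learning rate, threshold value, and the `max_steps` safety cap are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, loss_threshold=0.3, learning_rate=0.5, max_steps=10000):
    """samples: list of (feature, label) pairs with label 0 or 1.
    Returns (w, b, loss, steps) once the loss is not greater than the threshold."""
    w, b = 0.0, 0.0  # the "initial machine learning model"
    loss = float("inf")
    for step in range(max_steps):
        # Output the actual recognition results for the sample inputs.
        preds = [sigmoid(w * x + b) for x, _ in samples]
        # Compare them with the labelled results to obtain the calculated loss.
        loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for p, (_, y) in zip(preds, samples)) / len(samples)
        if loss <= loss_threshold:
            return w, b, loss, step  # loss not greater than threshold: model generated
        # Loss greater than threshold: update the parameters and start over.
        grad_w = sum((p - y) * x for p, (x, y) in zip(preds, samples)) / len(samples)
        grad_b = sum(p - y for p, (_, y) in zip(preds, samples)) / len(samples)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss, max_steps
```

A lower threshold generally demands more update rounds, which is why the source ties the threshold to the required recognition accuracy.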

In summary, the technical solution provided in the embodiments of the present application allows the user to first select the person or item to be identified in the image. During subsequent image recognition, the terminal does not need to perform image recognition on the entire image and recognizes only the selected person or item, which can improve the efficiency of image recognition.

After the image recognition result is obtained, the terminal may also obtain and display related information corresponding to the image recognition result, so that the user can know more abundant and comprehensive information about the person or article in the playback screen. In an optional embodiment provided based on the embodiment shown in FIG. 4, after step 403, the video image recognition method may further include the following steps:

Step 601: Obtain related information corresponding to the image recognition result.

When the image recognition result is a person identification, the related information corresponding to the image recognition result includes one or a combination of the following: encyclopedia information, social account information, news information, and work information of the person corresponding to the person identification.

Encyclopedia information refers to detailed information about the person, which usually includes name, age, occupation, birthday, and so on. The social account information includes a web page link of the social account used by the person. When the web page link is clicked, the terminal displays the main page of that social account so that the user can establish a social relationship with it. The social relationship may be a following relationship, a listening relationship, a friend relationship, and so on. News information refers to news related to the person. The work information includes a detailed introduction to the works in which the person has appeared, and links for accessing them.

When the image recognition result is an item identification, the related information corresponding to the image recognition result includes one or a combination of the following: encyclopedia information and purchase information of the item corresponding to the item identification.

Encyclopedia information refers to the detailed information of the item, which can include the name, material, weight, etc. of the item. The purchase information includes a purchase link for the item. When the purchase link is clicked, the terminal displays a purchase page for the item so that the user can purchase the item.

In a first possible implementation manner, the terminal acquires the related information of the image recognition result locally. In a second possible implementation manner, the terminal obtains the related information of the image recognition result from the server. Specifically, the terminal sends an acquisition request to the server, where the acquisition request carries the identifier of the terminal and the image recognition result. The server obtains the related information corresponding to the image recognition result according to the acquisition request, and returns the related information to the terminal. In the embodiment of the present application, only the second possible implementation manner is used as an example for explanation.

Step 602: Display related information corresponding to the image recognition result.

If the image recognition result includes one record, the terminal directly jumps to display the related information corresponding to the image recognition result. In other possible implementations, the terminal displays a jump control while displaying the image recognition result, and when the terminal receives a trigger signal corresponding to the jump control, it displays the related information corresponding to the image recognition result.

If the image recognition result includes multiple records, the terminal displays the jump control corresponding to each record. When the terminal receives a trigger signal corresponding to a target jump control, the terminal displays the related information of the record corresponding to the target jump control.
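The branching on the number of records can be summarized in a short sketch. The action names and the `jump_directly` switch (which selects between the direct jump and the jump-control variant for a single record) are hypothetical and serve only to make the cases explicit.

```python
def plan_related_info_display(records, jump_directly=True):
    """Return the UI actions to take for the records of an image recognition result."""
    if len(records) == 1 and jump_directly:
        # One record: jump straight to its related information.
        return [("show_related_info", records[0])]
    # Otherwise show one jump control per record and wait for a trigger signal.
    return [("show_jump_control", record) for record in records]


def on_jump_control_triggered(record):
    """Action taken when the jump control of a given record is triggered."""
    return ("show_related_info", record)
```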

Optionally, when the terminal displays the related information corresponding to the image recognition result, it also displays a favorite control. When the terminal receives a trigger signal corresponding to the favorite control, the terminal saves the related information corresponding to the image recognition result, and the favorite control changes to the favorited state. In a possible implementation manner, the terminal directly stores the related information in a first storage path, so that the user can later view the related information even without a network connection, thereby reducing traffic consumption. In another possible implementation manner, the terminal stores the access address corresponding to the related information in a second storage path, so that the user can later obtain and view the related information again through the access address, thereby reducing the storage space occupied on the terminal. The first storage path and the second storage path may be set by the user or set by the terminal by default, which is not limited in the embodiment of the present application. In addition, when the related information includes multiple items, each item corresponds to its own favorite control, so that the user can selectively save the related information that he or she needs.
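The two saving strategies can be contrasted in a minimal sketch. It uses two in-memory dictionaries as stand-ins for the first and second storage paths; the class name, method names, and the `keep_content` flag are hypothetical.

```python
class FavoriteStore:
    def __init__(self):
        self.first_storage_path = {}   # full related information, viewable offline
        self.second_storage_path = {}  # access addresses only, smaller footprint

    def save(self, key, content, access_address, keep_content=True):
        """Save one item of related information and report the control state."""
        if keep_content:
            self.first_storage_path[key] = content
        else:
            self.second_storage_path[key] = access_address
        return "favorited"

    def view_offline(self, key):
        """Only content kept in the first storage path is available without a network."""
        return self.first_storage_path.get(key)
```

Storing the content trades storage space for offline access, while storing only the address does the reverse.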

With reference to FIG. 9, a schematic diagram of an interface for displaying related information provided by an embodiment of the present application is shown. The terminal displays the target image 62, the person recognition control 63, and the item recognition control 64 in the floating window 61. When the user clicks the person recognition control 63, the terminal displays, in the floating window 61, the first record 91 "Person A" in the image recognition result, the jump control 92 corresponding to the first record 91 "Person A", the second record 93 "Person B", and the jump control 94 corresponding to the second record 93 "Person B"; when the user clicks the jump control 92 corresponding to the first record 91 "Person A", the floating window 61 displays the related information 95 corresponding to the first record 91 "Person A" and the favorite control 96.

With reference to FIG. 10, a schematic diagram of an interface for displaying related information provided by an embodiment of the present application is shown. The terminal displays a target image 62 in the floating window 61. When the user clicks a position, the terminal determines the object in the area 1001, which is centered on that position and whose area is the preset area, as the target object to be identified, and the terminal then obtains the image recognition result of that object. The image recognition result includes one record, "authentic baseball caps, tide brand hats, sun hats, men and women." The terminal directly displays, in the floating window 61, multiple items of related information 1002 of the image recognition result and the favorite control 1003 corresponding to each item of related information.

When the terminal obtains the image recognition result, there may be a case where the image recognition result is not obtained. In an optional embodiment provided based on the embodiment shown in FIG. 4, if the terminal does not obtain an image recognition result, the terminal displays first prompt information, and the first prompt information is used to prompt that relevant information cannot be obtained.

With reference to FIG. 11, a schematic diagram of an interface of the first prompt information provided by an embodiment of the present application is shown. When the terminal fails to obtain the image recognition result, the terminal displays the target image 62 and the first prompt information 1101 “No relevant information found” in the floating window 61.

In addition, if the terminal cannot obtain the related information of the image recognition result because no network connection is established, the terminal displays second prompt information, which is used to prompt the user to establish a network connection so that the terminal can obtain the related information again. Optionally, the terminal also displays a network setting control. When the terminal receives a trigger signal corresponding to the network setting control, it jumps to the network setting interface so that the user can complete the network setting.

With reference to FIG. 12, a schematic diagram of an interface of the second prompt information provided by one embodiment of the present application is shown. When the terminal fails to obtain the image recognition result because no network connection is established, the target image 62, the second prompt information 1201 “Please try again after connecting to the network”, and the network setting control 1202 are displayed in the floating window 61.
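The choice between the two prompts can be sketched as below. The prompt texts are those shown in FIG. 11 and FIG. 12; the function name, the returned dictionary, and the decision to check the network state first are assumptions made for illustration, and the network setting control is optional in the source.

```python
FIRST_PROMPT = "No relevant information found"
SECOND_PROMPT = "Please try again after connecting to the network"


def choose_prompt(network_connected, recognition_result):
    """Return the prompt to show in the floating window, or None if a result exists."""
    if not network_connected:
        # No network connection: ask the user to connect and offer the settings shortcut.
        return {"prompt": SECOND_PROMPT, "show_network_setting_control": True}
    if not recognition_result:
        # Connected, but nothing was recognized.
        return {"prompt": FIRST_PROMPT, "show_network_setting_control": False}
    return None
```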

The following are device embodiments of the present application and can be used to implement the method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.

Please refer to FIG. 13, which is a block diagram of a video image recognition apparatus provided by an embodiment of the present application. The device has a function for implementing the above method example, and the function may be implemented by hardware, or may be implemented by hardware executing corresponding software. The device may include:

The control display module 1301 is configured to display an image recognition function control in a sidebar when in a video playback scene.

The image acquisition module 1302 is configured to perform a screenshot process on a current playback screen when a first trigger signal corresponding to the image recognition function control is received, to obtain a target image.

An image recognition module 1303 is configured to obtain an image recognition result of the target image, where the image recognition result is obtained by performing image recognition on the target image.

A result display module 1304 is configured to display the image recognition result.

In an optional embodiment provided based on the embodiment shown in FIG. 13, the target image includes multiple objects to be identified, and the image recognition module 1303 is configured to:

Determining a target to-be-recognized object included in the target image;

Performing image recognition on the target to-be-recognized object to obtain the image recognition result.

Optionally, the image recognition module 1303 is configured to:

Displaying a person recognition control and / or an item recognition control;

When a second trigger signal corresponding to the person recognition control is received, determining that the target object to be identified is a region in the target image that includes a person image;

When a third trigger signal corresponding to the item recognition control is received, determining that the target object to be identified is an area in the target image that includes an item image.

Optionally, the image recognition module 1303 is configured to:

Displaying the target image, and each object to be identified in the target image is labeled with a different serial number;

Receiving a selection signal corresponding to the target sequence number;

Determining the object to be identified corresponding to the target sequence number as the target object to be identified.

Optionally, the image recognition module 1303 is configured to:

Displaying the target image;

Receiving a third trigger signal acting on the target image;

An object to be identified in a target area corresponding to the third trigger signal is determined as the target object to be identified.

Optionally, the image recognition module 1303 is configured to perform image recognition on the target to-be-recognized object through a machine learning model to obtain the image recognition result. The machine learning model is obtained by training a neural network using multiple sets of training sample data, and each set of training sample data in the multiple sets of training sample data includes a sample image and a recognition result corresponding to the sample image.

In another optional embodiment provided based on the embodiment shown in FIG. 13, the device further includes an information acquisition module and an information display module (not shown in the figure).

An information acquisition module is configured to acquire related information corresponding to the image recognition result.

An information display module is configured to display related information corresponding to the image recognition result.

Optionally,

When the image recognition result is a person identification, the related information corresponding to the image recognition result includes one or a combination of the following: encyclopedia information, social account information, news information, and work information of the person corresponding to the person identification.

Optionally, the information display module is configured to:

When the image recognition result includes multiple records, receiving a selection signal corresponding to the target record;

Display related information corresponding to the target record.

In another optional embodiment provided based on the embodiment shown in FIG. 13, the control display module 1301 is configured to:

Receiving a call-out instruction corresponding to the sidebar while in the video playback scene;

Displaying the sidebar according to the call-out instruction; wherein the sidebar includes the image recognition function control.

It should be noted that when the device provided by the foregoing embodiment implements its functions, the above division into functional modules is used only as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiments and the method embodiments provided above belong to the same concept. For the specific implementation process, refer to the method embodiments; details are not described herein again.

In an exemplary embodiment, a computer-readable storage medium is also provided. The computer-readable storage medium stores a computer program, and the computer program is loaded and executed by a processor of a terminal to implement the steps in the foregoing method embodiments.

In an exemplary embodiment, a computer program product is also provided, and when the computer program product is executed, it is used to implement the functions of each step in the foregoing method embodiments.

It should be understood that "a plurality" mentioned herein means two or more. "And / or" describes the association relationship of the associated objects, and indicates that there can be three kinds of relationships. For example, A and / or B can mean that there are three cases in which A exists alone, A and B exist, and B exists alone. The character "/" generally indicates that the related objects are an "or" relationship.

The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the superiority or inferiority of the embodiments.

The above are only exemplary embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the protection scope of the present application.

Claims

A video image recognition method, wherein the method includes:

When in a video playback scene, displaying an image recognition function control in a sidebar;

When receiving a first trigger signal corresponding to the image recognition function control, performing screenshot processing on a current playback screen to obtain a target image;

Acquiring an image recognition result of the target image, where the image recognition result is obtained by performing image recognition on the target image;

Displaying the image recognition result.
The method according to claim 1, wherein the target image includes a plurality of objects to be identified, and the obtaining an image recognition result of the target image comprises:

Determining a target to-be-recognized object included in the target image;

Performing image recognition on the target to-be-recognized object to obtain the image recognition result.
The method according to claim 2, wherein the determining the target object to be identified contained in the target image comprises:

Displaying a person recognition control and / or an item recognition control;

When a second trigger signal corresponding to the person recognition control is received, determining that the target object to be identified is a region in the target image that includes a person image;

When a third trigger signal corresponding to the item recognition control is received, determining that the target object to be identified is an area in the target image that includes an item image.
The method according to claim 2, wherein the determining the target object to be identified contained in the target image comprises:

Displaying the target image, wherein each object to be identified in the target image is marked with a different serial number;

Receiving a selection signal corresponding to a target serial number;

Determining the object to be identified corresponding to the target serial number as the target object to be identified.
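The serial-number selection of claim 4 can be sketched as a mapping from numbers to objects. This is an illustration under assumptions: `label_objects` and `pick_by_serial` are hypothetical helpers, and numbering from 1 is a choice made here, not something the claim requires.

```python
from typing import Dict, List


def label_objects(objects: List[str]) -> Dict[int, str]:
    """Mark each object to be identified with a different serial number (1-based here)."""
    return {serial: obj for serial, obj in enumerate(objects, start=1)}


def pick_by_serial(labelled: Dict[int, str], target_serial: int) -> str:
    """Resolve the selection signal's serial number to the target object."""
    if target_serial not in labelled:
        raise ValueError(f"no object is marked with serial number {target_serial}")
    return labelled[target_serial]


labelled = label_objects(["actor", "handbag", "lamp"])
target = pick_by_serial(labelled, 2)
```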
The method according to claim 2, wherein the determining the target object to be identified contained in the target image comprises:

Displaying the target image;

Receiving a third trigger signal acting on the target image;

Determining an object to be identified in a target area corresponding to the third trigger signal as the target object to be identified.
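One possible reading of claim 5 is a hit test: the trigger signal carries a touch position, and the object whose bounding box contains that position becomes the target. The sketch below assumes axis-aligned boxes and, where boxes overlap, prefers the smallest one; both are assumptions made for illustration, as is the function name `object_at`.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def box_area(box: Box) -> int:
    left, top, right, bottom = box
    return (right - left) * (bottom - top)


def object_at(point: Tuple[int, int], objects: List[Tuple[str, Box]]) -> Optional[str]:
    """Return the name of the object whose box contains the touched point, if any."""
    x, y = point
    hits = [
        (name, box)
        for name, box in objects
        if box[0] <= x < box[2] and box[1] <= y < box[3]
    ]
    if not hits:
        return None
    # Assumption: when several boxes contain the point, the smallest is the intended target.
    return min(hits, key=lambda hit: box_area(hit[1]))[0]


objects = [("poster", (0, 0, 100, 100)), ("actor", (20, 20, 60, 80))]
inner = object_at((30, 30), objects)
outer = object_at((90, 90), objects)
miss = object_at((150, 150), objects)
```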
The method according to claim 2, wherein the performing image recognition on the target object to be identified to obtain the image recognition result comprises:

Performing image recognition on the target object to be identified through a machine learning model to obtain the image recognition result, wherein the machine learning model is obtained by training a neural network using multiple sets of training sample data, and each set of training sample data includes a sample image and a recognition result corresponding to the sample image.
The method according to any one of claims 1 to 6, wherein after the acquiring an image recognition result of the target image, the method further comprises:

Acquiring related information corresponding to the image recognition result;

Displaying the related information corresponding to the image recognition result.
The method according to claim 7, wherein:

When the image recognition result is a person identification, the related information corresponding to the image recognition result includes one or a combination of the following: encyclopedia information, social account information, news information, and works information of the person corresponding to the person identification;

When the image recognition result is an item identification, the related information corresponding to the image recognition result includes one or a combination of the following: encyclopedia information and purchase information of the item corresponding to the item identification.
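The two cases of claim 8 can be sketched as a table of information kinds per result type. The dictionary keys, the `related_info` function, and the `lookup` callable are hypothetical; the application does not say where the related information is retrieved from.

```python
from typing import Callable, Dict, Optional, Tuple

# Information kinds recited in claim 8, keyed by the type of recognition result.
RELATED_INFO_KINDS: Dict[str, Tuple[str, ...]] = {
    "person": ("encyclopedia", "social_account", "news", "works"),
    "item": ("encyclopedia", "purchase"),
}


def related_info(
    result_kind: str,
    identifier: str,
    lookup: Callable[[str, str, str], Optional[str]],
) -> Dict[str, str]:
    """Collect whichever of the applicable information kinds the lookup can supply."""
    if result_kind not in RELATED_INFO_KINDS:
        raise ValueError(f"unsupported result kind: {result_kind!r}")
    found: Dict[str, str] = {}
    for info_kind in RELATED_INFO_KINDS[result_kind]:
        value = lookup(result_kind, identifier, info_kind)
        if value is not None:
            found[info_kind] = value
    return found


# Assumed in-memory data source, used only to exercise the sketch.
catalog = {
    ("person", "P1", "encyclopedia"): "bio",
    ("person", "P1", "works"): "films",
    ("person", "P1", "purchase"): "not applicable to persons",
}
person_info = related_info("person", "P1", lambda k, i, n: catalog.get((k, i, n)))
item_info = related_info("item", "I9", lambda k, i, n: catalog.get((k, i, n)))
```

Because the lookup is only asked for the kinds listed under the result type, purchase information is never returned for a person identification in this sketch.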
The method according to claim 7, wherein the displaying related information corresponding to the image recognition result comprises:

When the image recognition result includes multiple records, receiving a selection signal corresponding to a target record;

Displaying related information corresponding to the target record.
The method according to any one of claims 1 to 6, wherein the displaying an image recognition function control in a sidebar when in a video playback scene comprises:

Receiving a call-out instruction corresponding to the sidebar while in the video playback scene;

Displaying the sidebar according to the call-out instruction; wherein the sidebar includes the image recognition function control.
A video image recognition device, characterized in that the device includes:

A control display module, configured to display an image recognition function control in a sidebar when in a video playback scene;

An image acquisition module, configured to, when receiving a first trigger signal corresponding to the image recognition function control, perform screenshot processing on a current playback screen to obtain a target image;

An image recognition module, configured to obtain an image recognition result of the target image, where the image recognition result is obtained by performing image recognition on the target image;

A result display module, configured to display the image recognition result.
The device according to claim 11, wherein the target image includes a plurality of objects to be identified, and the image recognition module is configured to:

Determine a target object to be identified included in the target image;

Perform image recognition on the target object to be identified to obtain the image recognition result.
The device according to claim 12, wherein the image recognition module is configured to:

Display a person recognition control and/or an item recognition control;

When a second trigger signal corresponding to the person recognition control is received, determine that the target object to be identified is a region in the target image that includes a person image;

When a third trigger signal corresponding to the item recognition control is received, determine that the target object to be identified is a region in the target image that includes an item image.
The device according to claim 12, wherein the image recognition module is configured to:

Display the target image, wherein each object to be identified in the target image is marked with a different serial number;

Receive a selection signal corresponding to a target serial number;

Determine the object to be identified corresponding to the target serial number as the target object to be identified.
The device according to claim 12, wherein the image recognition module is configured to:

Display the target image;

Receive a third trigger signal acting on the target image;

Determine an object to be identified in a target area corresponding to the third trigger signal as the target object to be identified.
The device according to claim 12, wherein the image recognition module is configured to perform image recognition on the target object to be identified through a machine learning model to obtain the image recognition result, wherein the machine learning model is obtained by training a neural network using multiple sets of training sample data, and each set of training sample data includes a sample image and a recognition result corresponding to the sample image.
The device according to any one of claims 11 to 16, wherein the device further comprises:

An information acquisition module, configured to acquire related information corresponding to the image recognition result;

An information display module, configured to display the related information corresponding to the image recognition result.
The device according to claim 17, wherein:

When the image recognition result is a person identification, the related information corresponding to the image recognition result includes one or a combination of the following: encyclopedia information, social account information, news information, and works information of the person corresponding to the person identification;

When the image recognition result is an item identification, the related information corresponding to the image recognition result includes one or a combination of the following: encyclopedia information and purchase information of the item corresponding to the item identification.
The device according to claim 17, wherein the information display module is configured to:

When the image recognition result includes multiple records, receive a selection signal corresponding to a target record;

Display related information corresponding to the target record.
The device according to any one of claims 11 to 16, wherein the control display module is configured to:

Receive a call-out instruction corresponding to the sidebar while in the video playback scene;

Display the sidebar according to the call-out instruction, wherein the sidebar includes the image recognition function control.
A terminal, wherein the terminal includes a processor and a memory, the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the video image recognition method according to any one of claims 1 to 10.
A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program is loaded and executed by a processor to implement the video image recognition method according to any one of claims 1 to 10.