WO2022028177A1 - Information pushing method, video processing method, and device - Google Patents

Information pushing method, video processing method, and device

Info

Publication number
WO2022028177A1
Authority
WO
WIPO (PCT)
Prior art keywords
item
appearing
information
video frame
video stream
Prior art date
Application number
PCT/CN2021/104450
Other languages
French (fr)
Chinese (zh)
Inventor
崔英林
Original Assignee
上海连尚网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海连尚网络科技有限公司 filed Critical 上海连尚网络科技有限公司
Publication of WO2022028177A1 publication Critical patent/WO2022028177A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/74 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • The embodiments of the present application relate to the field of computer technology, and in particular to information push and video processing methods and devices.
  • With the rapid development of the Internet, video applications support increasingly diverse functions, such as live streaming and on-demand playback.
  • As a result, more and more users are attracted to watching videos, and viewing time keeps growing.
  • Various items, such as clothes, decorations and food, often appear in videos. If a user is interested in one of these items, the user has to move the video application to the background, open a search or shopping application, and enter the item name to search before obtaining detailed information about the item.
  • The embodiments of the present application propose information push and video processing methods and devices.
  • In a first aspect, an embodiment of the present application provides an information push method, including: performing code stream conversion on video data to obtain the video stream and identification information of items appearing in the video stream; playing the video stream on a playback device; in response to determining that an item of interest of the user exists in the current video frame of the video stream, determining the identification information of the item of interest; and, based on the identification information of the item of interest, querying push information of the item of interest and presenting the push information.
  • In some embodiments, determining that there is an item of interest of the user in the current video frame of the video stream includes: collecting the user's voice information; recognizing the voice information and determining the item name contained in the voice information; and, if the item name matches an item appearing in the current video frame, determining the matched appearing item as the item of interest.
  • In some embodiments, determining that there is an item of interest of the user in the current video frame of the video stream includes: setting a trigger area in the video frame where an item appearing in the video stream is located; and, in response to detecting that the user confirms a trigger area of the current video frame, determining the appearing item corresponding to the confirmed trigger area as the item of interest.
  • In some embodiments, the identification information includes coordinate information, and setting the trigger area in the video frame where the appearing item is located includes: setting the area corresponding to the coordinate information as the trigger area.
  • In some embodiments, the coordinate information is a percentage coordinate, and setting the area corresponding to the coordinate information as the trigger area includes: calculating the lattice coordinates of the appearing item in the current video frame based on the resolution of the playback device and the percentage coordinates of the appearing item in the current video frame; and setting the area corresponding to the lattice coordinates as the trigger area.
  • In some embodiments, calculating the lattice coordinates of the appearing item in the current video frame based on the resolution of the playback device and the percentage coordinates of the appearing item includes: if the coordinate system of the percentage coordinates is the same as the screen coordinate system of the playback device, multiplying the horizontal and vertical pixel values of the playback device's resolution by the horizontal and vertical coordinate values of the percentage coordinates of the appearing item, correspondingly, to obtain the lattice coordinates of the appearing item in the current video frame.
  • In some embodiments, calculating the lattice coordinates of the appearing item further includes: if the coordinate system of the percentage coordinates differs from the screen coordinate system of the playback device, converting the percentage coordinates into the screen coordinate system to obtain converted percentage coordinates; and multiplying the horizontal and vertical pixel values of the playback device's resolution by the horizontal and vertical coordinate values of the converted percentage coordinates of the appearing item, correspondingly, to obtain the lattice coordinates of the appearing item in the current video frame.
  • In some embodiments, detecting that the user confirms the trigger area of the current video frame includes: if the user touches the trigger area of the current video frame, determining that the user confirms the trigger area.
  • In some embodiments, detecting that the user confirms the trigger area of the current video frame includes: capturing the focus of the user's eyes; and, in response to determining that the focus is on the trigger area of the current video frame, determining that the user confirms the trigger area.
  • In some embodiments, capturing the focus of the user's eyes includes: using a camera of the playback device to emit a light beam toward the eyes; using a photosensitive material on the screen of the playback device to sense the intensity of the light beam reflected from the eyes; and determining a dark spot on the screen based on the light beam intensity as the focus.
  • In a second aspect, an embodiment of the present application provides a video processing method, including: performing item identification on a video stream to determine items appearing in the video stream; acquiring identification information of the appearing items; and adding the identification information of the appearing items to the corresponding video frame protocol to generate video data.
  • In some embodiments, acquiring the identification information of the appearing item includes: performing position recognition on the video stream to determine the coordinate information of the appearing item; and adding the coordinate information of the appearing item to the identification information of the appearing item.
  • In some embodiments, performing position recognition on the video stream to determine the coordinate information of the appearing item includes: simulating a pilot broadcast of the video stream on a pilot device; performing position recognition on the video stream to obtain the lattice coordinates of the appearing item; and determining the coordinate information of the appearing item based on the lattice coordinates of the appearing item.
  • In some embodiments, determining the coordinate information of the appearing item based on the lattice coordinates of the appearing item includes: dividing the horizontal and vertical coordinate values of the lattice coordinates of the appearing item by the horizontal and vertical pixel values of the pilot device's resolution, correspondingly, to obtain the percentage coordinates of the appearing item.
  • In some embodiments, for an item appearing in consecutive video frames of the video stream, the identification information added to the video frame protocol of the frame where the item first appears includes the item name, coordinate information, brief information and/or a web page link, while the identification information added to the video frame protocol of frames where the item does not appear for the first time includes the item name and coordinate information.
  • In some embodiments, adding the identification information of the appearing item to the corresponding video frame protocol includes: extending the network abstraction layer information of the corresponding video frame protocol based on the identification information of the appearing item.
  • In a third aspect, an embodiment of the present application provides an information push device, including: a conversion unit configured to perform code stream conversion on video data to obtain the video stream and identification information of items appearing in the video stream; a playback unit configured to play the video stream on a playback device; a determining unit configured to determine the identification information of an item of interest in response to determining that the user's item of interest exists in the current video frame of the video stream; and a presenting unit configured to query push information of the item of interest based on the identification information of the item of interest and to present the push information.
  • In some embodiments, the determining unit is further configured to: collect the user's voice information; recognize the voice information and determine the item name contained in the voice information; and, if the item name matches an item appearing in the current video frame, determine the matched appearing item as the item of interest.
  • In some embodiments, the determining unit includes: a setting subunit configured to set a trigger area in the video frame where an item appearing in the video stream is located; and a determining subunit configured to, in response to detecting that the user confirms a trigger area of the current video frame, determine the appearing item corresponding to the confirmed trigger area as the item of interest.
  • In some embodiments, the identification information includes coordinate information, and the setting subunit includes: a setting module configured to set the area corresponding to the coordinate information as the trigger area.
  • In some embodiments, the coordinate information is a percentage coordinate, and the setting module includes: a calculation submodule configured to calculate the lattice coordinates of the appearing item in the current video frame based on the resolution of the playback device and the percentage coordinates of the appearing item in the current video frame; and a setting submodule configured to set the area corresponding to the lattice coordinates as the trigger area.
  • In some embodiments, the calculation submodule is further configured to: if the coordinate system of the percentage coordinates is the same as the screen coordinate system of the playback device, multiply the horizontal and vertical pixel values of the playback device's resolution by the horizontal and vertical coordinate values of the percentage coordinates of the appearing item in the current video frame, correspondingly, to obtain the lattice coordinates of the appearing item in the current video frame.
  • In some embodiments, the calculation submodule is further configured to: if the coordinate system of the percentage coordinates differs from the screen coordinate system of the playback device, convert the percentage coordinates into the screen coordinate system to obtain converted percentage coordinates; and multiply the horizontal and vertical pixel values of the playback device's resolution by the horizontal and vertical coordinate values of the converted percentage coordinates of the appearing item in the current video frame, correspondingly, to obtain the lattice coordinates of the appearing item in the current video frame.
  • In some embodiments, the determining subunit is further configured to: if the user touches the trigger area of the current video frame, determine that the user confirms the trigger area.
  • In some embodiments, the determining subunit includes: a capture module configured to capture the focus of the user's eyes; and a determination module configured to, in response to determining that the focus is on the trigger area of the current video frame, determine that the user confirms the trigger area.
  • In some embodiments, the capture module is further configured to: use the camera of the playback device to emit a light beam toward the eyes; use a photosensitive material on the screen of the playback device to sense the intensity of the light beam reflected from the eyes; and determine a dark spot on the screen based on the light beam intensity as the focus.
  • An embodiment of the present application provides a video processing device, including: a determining unit configured to perform item identification on a video stream to determine items appearing in the video stream; an obtaining unit configured to obtain identification information of the appearing items; and an adding unit configured to add the identification information of the appearing items to the corresponding video frame protocol to generate video data.
  • In some embodiments, the obtaining unit includes: a determining subunit configured to perform position recognition on the video stream and determine the coordinate information of the appearing item; and an adding subunit configured to add the coordinate information of the appearing item to the identification information of the appearing item.
  • In some embodiments, the determining subunit includes: a pilot broadcast module configured to simulate a pilot broadcast of the video stream on a pilot device; an identification module configured to perform position recognition on the video stream to obtain the lattice coordinates of the appearing item; and a determining module configured to determine the coordinate information of the appearing item based on the lattice coordinates of the appearing item.
  • In some embodiments, the determining module is further configured to: divide the horizontal and vertical coordinate values of the lattice coordinates of the appearing item by the horizontal and vertical pixel values of the pilot device's resolution, correspondingly, to obtain the percentage coordinates of the appearing item.
  • In some embodiments, for an item appearing in consecutive video frames of the video stream, the identification information added to the video frame protocol of the frame where the item first appears includes the item name, coordinate information, brief information and/or a web page link, while the identification information added to the video frame protocol of frames where the item does not appear for the first time includes the item name and coordinate information.
  • In some embodiments, the adding unit is further configured to: extend the network abstraction layer information of the corresponding video frame protocol based on the identification information of the appearing item.
  • An embodiment of the present application provides a computer device, including: one or more processors; and a storage device on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect or the method described in any implementation of the second aspect.
  • An embodiment of the present application provides a computer-readable medium on which a computer program is stored; when the computer program is executed by a processor, it implements the method described in any implementation of the first aspect or the method described in any implementation of the second aspect.
  • In the information push method provided by the embodiments of the present application, code stream conversion is first performed on the video data to obtain the video stream and the identification information of items appearing in the video stream; then the video stream is played on the playback device; next, in response to determining that there is an item of interest of the user in the current video frame of the video stream, the identification information of the item of interest is determined; finally, based on the identification information of the item of interest, the push information of the item of interest is queried and presented.
  • FIG. 1 is an exemplary system architecture to which the present application may be applied;
  • FIG. 2 is a flowchart of an embodiment of an information push method according to the present application.
  • FIG. 3 is a flowchart of another embodiment of the information push method according to the present application.
  • FIG. 4 is a flowchart of another embodiment of an information push method according to the present application.
  • FIG. 5 is a flowchart of an embodiment of a video processing method according to the present application.
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing the computer device of the embodiment of the present application.
  • FIG. 1 shows an exemplary system architecture 100 to which embodiments of the information push and video processing methods of the present application may be applied.
  • the system architecture 100 may include devices 101, 102 and a network 103.
  • the network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the devices 101, 102 may be hardware devices or software that support network connections to provide various network services.
  • When the devices are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, servers, and so on.
  • When implemented as a hardware device, a device may be implemented as a distributed device group composed of multiple devices, or as a single device.
  • When a device is software, it may be installed in the electronic devices listed above.
  • When implemented as software, it may be implemented as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module. No specific limitation is imposed here.
  • A device can provide corresponding network services by installing a corresponding client application or server application.
  • After a client application is installed, a device can be embodied as a client in network communication.
  • After a server application is installed, a device can be embodied as a server in network communication.
  • device 101 is embodied as a client, and device 102 is embodied as a server.
  • the device 101 may be a client of a video application, and the device 102 may be a server of the video application.
  • The information push method and the video processing method provided by the embodiments of the present application may be executed by the device 101.
  • When the device 101 executes the information push method, it may be a playback device.
  • When the device 102 executes the video processing method, it may be a pilot device.
  • FIG. 2 shows a process 200 of an embodiment of the information push method according to the present application.
  • the information push method includes the following steps:
  • Step 201: Perform code stream conversion on the video data to obtain the video stream and the identification information of items appearing in the video stream.
  • The execution body of the information push method may acquire video data from the background server of the video application (for example, the device 102 shown in FIG. 1), and perform code stream conversion on the video data to obtain the video stream and the identification information of items appearing in the video stream.
  • the video data may include the video stream and the identification information of the items appearing in the video stream.
  • Video streams are playable data, including but not limited to TV series, movies, live broadcasts, short videos, and so on.
  • the identification information of the item appearing in the video stream is unplayable data, which is used to identify the item appearing in the video stream, including but not limited to the item name, coordinate information, brief information, and web page link.
  • Appearing items may be items that appear in the video stream, such as clothing, decorations, food, and the like.
  • the code stream conversion may adopt a static transcoding method or a dynamic transcoding method.
  • The video frame protocol includes network abstraction layer (NAL) information; the NAL can include a NAL Header, a NAL Extension and a NAL payload.
  • NAL Header can be used to store basic information of video frames.
  • the NAL payload can be used to store a binary stream of video frames.
  • NAL Extension can be used to store identification information. It should be noted that since the video frame itself is a highly compressed data body, the NAL Extension also needs to have high compression.
  • the same item can appear in multiple consecutive video frames.
  • In this case, the identification information added to the video frame protocol of the frame where the item first appears may be detailed information, including the item name, coordinate information, brief information and/or a web page link, and the corresponding video frame is called a detailed frame;
  • the identification information added to the video frame protocol of frames where the item does not appear for the first time may be abbreviated information, including the item name and coordinate information, and the corresponding video frame is called an abbreviated frame. In this way, space can be saved.
  • The detailed information can be decoded and cached; when an abbreviated frame is played later, if it is detected that there is an item of interest of the user in the current video frame, the cache can be queried with the abbreviated information to obtain the detailed information of the item of interest.
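As an illustration only, the following Python sketch shows how a player-side cache for the detailed-frame/abbreviated-frame scheme described above might look. The ItemInfo fields and the use of the item name as the cache key are assumptions made for the example; the patent does not prescribe a concrete data structure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ItemInfo:
    """Illustrative identification info carried in a NAL Extension."""
    name: str
    coords: Tuple[float, float]        # percentage coordinates, e.g. (0.42, 0.67)
    brief: Optional[str] = None        # only carried by a detailed frame
    link: Optional[str] = None         # only carried by a detailed frame


class ItemCache:
    """Caches detailed info from detailed frames; abbreviated frames carry
    only name + coordinates and are resolved against this cache."""

    def __init__(self):
        self._by_name: Dict[str, ItemInfo] = {}

    def on_frame(self, items):
        for item in items:
            if item.brief is not None or item.link is not None:
                # detailed frame: decode and cache the full record
                self._by_name[item.name] = item

    def lookup(self, abbreviated: ItemInfo) -> ItemInfo:
        # abbreviated frame: query the cache by item name
        return self._by_name.get(abbreviated.name, abbreviated)
```

With such a cache, an abbreviated frame only needs to carry the item name and coordinates, and the detailed record cached from the earlier detailed frame can be looked up when the user shows interest.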
  • Step 202: Play the video stream on the playback device.
  • the above-mentioned execution body may play the video stream on the playback device.
  • the above-mentioned execution body may be a playback device on which a player is installed for playing the video stream.
  • the playback device usually plays the video stream while converting the code stream. Therefore, during the playback of the video stream, the identification information of the items appearing in the video stream can be successively obtained.
  • Step 203: In response to determining that there is an item of interest of the user in the current video frame of the video stream, determine the identification information of the item of interest.
  • the above-mentioned execution subject may determine whether there is an item of interest of the user in the current video frame of the video stream. If there is an item of interest of the user, determine the identification information of the item of interest; if there is no item of interest of the user, continue to play the video stream.
  • The item of interest of the user may be determined by the above-mentioned execution body based on the user's reaction when watching the video stream.
  • For example, the user may say the name of the item of interest.
  • In this case, the above-mentioned execution body may collect the user's voice information, recognize the voice information, and determine the item name contained in the voice information. If the item name matches an item appearing in the current video frame, the matched appearing item is determined as the item of interest; if the item name does not match any appearing item in the current video frame, the user's voice information continues to be collected.
  • Here, the current video frame is the video frame currently being played. Multiple items may appear in the same video frame, and the item that matches the item name contained in the user's voice information is the user's item of interest. For example, the user says "watch", and the items appearing in the current video frame include a watch of brand A, clothes of brand B and shoes of brand C; only the watch of brand A matches "watch" and is the user's item of interest.
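A minimal sketch of the name-matching step, assuming a speech recognizer has already turned the user's voice information into an item name; the loose containment check stands in for whatever matching rule an implementation would actually use.

```python
from typing import List, Optional


def find_item_of_interest(spoken_item_name: str, frame_item_names: List[str]) -> Optional[str]:
    """Match the item name recognised from the user's speech against the
    names of items appearing in the current video frame."""
    spoken = spoken_item_name.lower()
    for name in frame_item_names:
        # loose containment check; e.g. "watch" matches "brand A watch"
        if spoken in name.lower() or name.lower() in spoken:
            return name
    return None  # no match: continue collecting the user's voice information
```

For the example above, find_item_of_interest("watch", ["brand A watch", "brand B clothes", "brand C shoes"]) would return "brand A watch".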
  • Step 204: Based on the identification information of the item of interest, query the push information of the item of interest, and present the push information.
  • the above-mentioned execution subject may, based on the identification information of the item of interest, query the push information of the item of interest, and present the push information.
  • the push information may be a link for the user to browse the detailed information of the item of interest or a link to purchase the item of interest.
  • the push information can be presented on the current video frame, especially in the vicinity of the item of interest in the current video frame. Subsequently, the user can perform corresponding operations based on the push information to view the detailed information of the item of interest or purchase the item of interest.
  • The above-mentioned execution body can query the push information of the item of interest in various ways. For example, when the push information of a large number of items is stored locally, the push information of the item of interest is searched locally. For another example, in the case where the video application integrates a search function or a shopping function, a push information acquisition request is sent to the background server of the video application based on the identification information of the item of interest, and the push information of the item of interest returned by the background server of the video application is received. For another example, based on the identification information of the item of interest, a push information acquisition request is sent to the background server of a search application or shopping application, and the push information of the item of interest returned by the background server of the search application or shopping application is received.
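The following sketch illustrates the query paths described above (a local store first, then a backend server); the endpoint URL, the query parameter and the JSON response shape are illustrative assumptions, not an API defined by the patent.

```python
import json
from urllib import parse, request


def query_push_info(item_name: str, local_store: dict, backend_url: str) -> dict:
    """Look up push information for the item of interest.

    item_name:   name from the item's identification information
    local_store: locally stored push information keyed by item name
    backend_url: hypothetical endpoint of a video, search or shopping backend
    """
    if item_name in local_store:
        # push information for many items is stored locally
        return local_store[item_name]
    # otherwise send a push information acquisition request to a backend server
    query = parse.urlencode({"item": item_name})
    with request.urlopen(f"{backend_url}?{query}") as resp:
        return json.load(resp)
```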
  • In the method provided by this embodiment, code stream conversion is first performed on the video data to obtain the video stream and the identification information of items appearing in the video stream; then the video stream is played on the playback device; next, in response to determining that there is an item of interest of the user in the current video frame of the video stream, the identification information of the item of interest is determined; finally, based on the identification information of the item of interest, the push information of the item of interest is queried and presented.
  • FIG. 3 shows a process 300 of another embodiment of the information push method according to the present application.
  • the information push method includes the following steps:
  • Step 301: Perform code stream conversion on the video data to obtain the video stream and the identification information of items appearing in the video stream.
  • Step 302: Play the video stream on the playback device.
  • steps 301-302 have been described in detail in steps 201-202 in the embodiment shown in FIG. 2, and are not repeated here.
  • Step 303: Set a trigger area in the video frame where an item appearing in the video stream is located.
  • the execution body of the information push method may set a trigger area in the video frame where the item appears in the video stream.
  • the trigger area can be set in the vicinity of the present item in the video frame.
  • In some embodiments, the identification information includes coordinate information, and the area corresponding to the coordinate information is set as the trigger area. It should be understood that when multiple items appear in a video frame, multiple trigger areas may be set, with one trigger area corresponding to one appearing item.
  • Step 304: In response to detecting that the user confirms a trigger area of the current video frame, determine the appearing item corresponding to the confirmed trigger area as the item of interest.
  • The above-mentioned execution body can detect whether the user confirms a trigger area of the current video frame. If it is detected that the user confirms a trigger area of the current video frame, the appearing item corresponding to the confirmed trigger area is determined as the item of interest; if no such confirmation is detected, the video stream continues to be played and detection continues.
  • the playback device needs to have corresponding hardware or plug-ins to detect the user's operation on the trigger area, while the video stream itself has no monitoring and network connection capabilities.
  • When the playback device has a touch screen, if the user touches the trigger area of the current video frame, it is determined that the user confirms the trigger area.
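A minimal sketch of the touch check, assuming the trigger area is represented as an axis-aligned rectangle in lattice (pixel) coordinates; the patent itself only says that the area corresponding to the coordinate information is used.

```python
def touch_confirms_trigger(touch_xy, trigger_rect) -> bool:
    """Return True if a touch point falls inside the trigger area.

    touch_xy:     (x, y) lattice coordinates of the touch on the screen
    trigger_rect: (x0, y0, x1, y1) bounds of the trigger area in the same coordinates
    """
    x, y = touch_xy
    x0, y0, x1, y1 = trigger_rect
    return x0 <= x <= x1 and y0 <= y <= y1
```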
  • When the playback device has a camera, if it is captured that the focus of the user's eyes falls on the trigger area of the current video frame, it is determined that the user confirms the trigger area.
  • the above-mentioned executive body may analyze the angle of view of the user's eyes in the user image collected by the camera to determine whether the focus of the user's eyes falls on the trigger area.
  • The above-mentioned execution body can first use the camera to emit a light beam toward the user's eyes; then use the photosensitive material on the screen of the playback device to sense the intensity of the light beam reflected from the eyes; and finally determine a dark spot on the screen based on the beam intensity as the focus of the user's eyes.
  • When the light beam hits the pupil of the eye, most of the light beam is absorbed by the pupil, so the intensity of the light reflected onto the screen is lower and a dark spot appears.
  • When the light beam irradiates a part other than the pupil, most of the light beam is reflected onto the screen, the reflected light intensity is higher, and a bright spot appears.
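Purely as an illustration of the dark-spot idea, the sketch below assumes the photosensitive readings are exposed to software as a 2D grid of intensity samples over the screen; that representation, and treating the single darkest cell as the focus, are assumptions of the example rather than details given by the patent.

```python
import numpy as np


def locate_focus(intensity_map: np.ndarray) -> tuple:
    """Return the (x, y) cell with the lowest reflected-beam intensity,
    i.e. the dark spot treated as the focus of the user's eyes."""
    row, col = np.unravel_index(np.argmin(intensity_map), intensity_map.shape)
    return int(col), int(row)


def focus_confirms_trigger(intensity_map: np.ndarray, trigger_rect) -> bool:
    """trigger_rect = (x0, y0, x1, y1) in the same screen-cell coordinates."""
    x, y = locate_focus(intensity_map)
    x0, y0, x1, y1 = trigger_rect
    return x0 <= x <= x1 and y0 <= y <= y1
```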
  • Step 305: Determine the identification information of the item of interest.
  • Step 306: Based on the identification information of the item of interest, query the push information of the item of interest, and present the push information.
  • steps 305-306 have been described in detail in steps 203-204 in the embodiment shown in FIG. 2, and are not repeated here.
  • The process 300 of the information push method in this embodiment highlights the step of determining the user's item of interest. Thus, in the solution described in this embodiment, a trigger area is set in the video frame where the appearing item is located, and the item of interest is determined based on the user's operation on the trigger area, thereby improving the accuracy of determining the item of interest.
  • FIG. 4 shows a process 400 of yet another embodiment of the information push method according to the present application.
  • the information push method includes the following steps:
  • Step 401: Perform code stream conversion on the video data to obtain the video stream and the identification information of items appearing in the video stream.
  • Step 402: Play the video stream on the playback device.
  • steps 401-402 have been described in detail in steps 301-302 in the embodiment shown in FIG. 3, and are not repeated here.
  • Step 403: Based on the resolution of the playback device and the percentage coordinates of the items appearing in the current video frame, calculate the lattice coordinates of the items appearing in the current video frame.
  • The execution body of the information push method may calculate the lattice coordinates of the items appearing in the current video frame based on the resolution of the playback device and the percentage coordinates of the items appearing in the current video frame.
  • Here, the coordinate information in the identification information is a percentage coordinate.
  • Lattice coordinates are required on the playback device, so the percentage coordinates need to be converted into the corresponding lattice coordinates.
  • Specifically, the above-mentioned execution body can multiply the horizontal and vertical pixel values of the playback device's resolution by the horizontal and vertical coordinate values of the percentage coordinates of the item appearing in the current video frame, correspondingly, to obtain the lattice coordinates of the item appearing in the current video frame.
  • For example, a video stream is played on a playback device with a resolution of A*B. If the percentage coordinates of an appearing item are (x/a, y/b), then the lattice coordinates of the appearing item are (x*A/a, y*B/b), where a, b, A and B are positive integers, x is a positive integer not greater than a, y is a positive integer not greater than b, x/a and y/b are positive numbers not greater than 1, and x*A/a and y*B/b are positive integers.
  • Here, the coordinate system of the percentage coordinates is the same as the screen coordinate system of the playback device: both take the upper left corner as the origin, rightward as the positive direction of the horizontal axis, and downward as the positive direction of the vertical axis.
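A small sketch of the conversion used in Steps 403 and 404. Rounding the products to integers and passing an optional convert callback for the case where the coordinate systems differ are choices made for the example; the patent only specifies the corresponding multiplication.

```python
def percent_to_lattice(percent_xy, resolution, convert=None):
    """Convert an item's percentage coordinates into lattice (pixel)
    coordinates on the playback device.

    percent_xy: (x/a, y/b), each value in (0, 1]
    resolution: (A, B) horizontal and vertical pixel values of the device
    convert:    optional mapping applied first when the percentage coordinate
                system differs from the screen coordinate system
    """
    if convert is not None:
        percent_xy = convert(percent_xy)
    px, py = percent_xy
    width, height = resolution
    return round(width * px), round(height * py)


# Example: on a 1920x1080 playback device, percentage coordinates (0.25, 0.5)
# give lattice coordinates (480, 540).
```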
  • Step 404: Set the area corresponding to the lattice coordinates as the trigger area.
  • the above-mentioned execution body may set the area corresponding to the lattice coordinates as the trigger area.
  • Step 405: In response to detecting that the user confirms a trigger area of the current video frame, determine the appearing item corresponding to the confirmed trigger area as the item of interest.
  • Step 406: Determine the identification information of the item of interest.
  • Step 407: Based on the identification information of the item of interest, query the push information of the item of interest, and present the push information.
  • steps 405-407 have been described in detail in steps 304-306 in the embodiment shown in FIG. 3, and are not repeated here.
  • The process 400 of the information push method in this embodiment highlights the step of setting the trigger area. In the solution described in this embodiment, the coordinate information in the identification information is a percentage coordinate, and the corresponding lattice coordinates are obtained through coordinate conversion, thereby adapting to the different screen resolutions of different playback devices.
  • FIG. 5 shows a process of an embodiment of the video processing method according to the present application. The video processing method includes the following steps:
  • Step 501: Perform item identification on the video stream to determine the items appearing in the video stream.
  • the execution body of the video processing method (for example, the device 101 shown in FIG. 1 ) can perform item identification on the video stream, and determine the items appearing in the video stream.
  • The above-mentioned execution body can determine the appearing items of the video stream in various ways. In some embodiments, those skilled in the art can perform item recognition on the video stream and input the recognition result to the above-mentioned execution body. In some embodiments, the above-mentioned execution body may split the video stream into a series of video frames and perform item identification on each video frame to determine the items appearing in the video stream, as sketched below.
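A schematic sketch of the frame-by-frame variant; detect_items is an explicit placeholder for whatever item-recognition model an implementation would use, since the patent does not prescribe one.

```python
def identify_appearing_items(video_frames, detect_items):
    """Run item identification frame by frame and collect appearing items.

    video_frames: iterable of decoded frames
    detect_items: placeholder callable that takes a frame and returns item names
    """
    appearing = {}
    for index, frame in enumerate(video_frames):
        for name in detect_items(frame):
            # remember the frames in which each item appears
            appearing.setdefault(name, []).append(index)
    return appearing
```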
  • Step 502: Acquire the identification information of the appearing item.
  • the above-mentioned execution subject may acquire the identification information of the appearing item.
  • The identification information of the appearing item is unplayable data, which is used to identify the item appearing in the video stream.
  • In some embodiments, the identification information may include coordinate information.
  • In this case, the above-mentioned execution body can perform position recognition on the video stream, determine the coordinate information of the appearing item, and add the coordinate information of the appearing item to the identification information of the appearing item.
  • In some embodiments, the coordinate information may be determined by simulating a pilot broadcast of the video stream on a pilot device. Specifically, a pilot broadcast of the video stream is first simulated on the pilot device; then position recognition is performed on the video stream to obtain the lattice coordinates of the appearing item; finally, the coordinate information of the appearing item is determined based on the lattice coordinates of the appearing item.
  • In some embodiments, the coordinate information may be the lattice coordinates themselves.
  • In some embodiments, the coordinate information in the identification information is a percentage coordinate. Specifically, the horizontal and vertical coordinate values of the lattice coordinates of the appearing item are divided by the horizontal and vertical pixel values of the pilot device's resolution, correspondingly, to obtain the percentage coordinates of the appearing item.
  • For example, if the resolution of the pilot device is a*b and the lattice coordinates of the appearing item are (x, y), the percentage coordinates of the appearing item are (x/a, y/b), where a and b are positive integers, x is a positive integer not greater than a, y is a positive integer not greater than b, and x/a and y/b are positive numbers not greater than 1.
  • It should be noted that the resolution of the pilot device needs to match the resolution of the video; for example, a 16:9 device is selected for 720p and above, and a 4:3 device below that. In this way, the error can be reduced as much as possible.
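The inverse of the playback-side conversion, as a brief sketch: lattice coordinates measured on the pilot device are divided by the pilot device's resolution to obtain the resolution-independent percentage coordinates.

```python
def lattice_to_percent(lattice_xy, pilot_resolution):
    """Convert lattice coordinates obtained on the pilot device into the
    percentage coordinates stored in the identification information.

    lattice_xy:       (x, y) pixel coordinates of the appearing item
    pilot_resolution: (a, b) horizontal and vertical pixel values of the pilot device
    """
    x, y = lattice_xy
    a, b = pilot_resolution
    return x / a, y / b


# Example: lattice coordinates (480, 540) on a 1920x1080 pilot device give
# percentage coordinates (0.25, 0.5), which any playback device can map back
# to its own lattice coordinates.
```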
  • Step 503: Add the identification information of the appearing item to the corresponding video frame protocol to generate video data.
  • the above-mentioned execution body may add the identification information of the appearing item to the corresponding video frame protocol to generate video data.
  • The identification information can be added to the original video frame protocol by performing code stream encoding processing on the video frame where the item with identification information is located and by transforming the video frame protocol.
  • The transformation methods differ for different protocol formats.
  • Here, the network abstraction layer (NAL) information of the corresponding video frame protocol is extended based on the identification information of the appearing item to support adding the identification information.
  • NAL can include NAL Header, NAL Extension and NAL payload.
  • NAL Header can be used to store basic information of video frames.
  • the NAL payload can be used to store a binary stream of video frames.
  • NAL Extension can be used to store identification information. It should be noted that since the video frame itself is a highly compressed data body, the NAL Extension also needs to have high compression.
  • the same item can appear in multiple consecutive video frames.
  • In this case, the identification information added to the video frame protocol of the frame where the item first appears may be detailed information, including the item name, coordinate information, brief information and/or a web page link, and the corresponding video frame is called a detailed frame;
  • the identification information added to the video frame protocol of frames where the item does not appear for the first time may be abbreviated information, including the item name and coordinate information, and the corresponding video frame is called an abbreviated frame. In this way, space can be saved.
  • On the playback side, the detailed information can be decoded and cached; when an abbreviated frame is played later, if it is detected that there is an item of interest of the user in the current video frame, the cache can be queried with the abbreviated information to obtain the detailed information of the item of interest.
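To make the detailed/abbreviated split concrete on the generating side, the sketch below chooses which identification fields to attach to a frame; the dictionary field names are illustrative, since the patent only lists the kinds of information carried by detailed and abbreviated frames.

```python
def build_identification_info(item: dict, first_appearance: bool) -> dict:
    """Select detailed or abbreviated identification information for an item
    that appears in consecutive video frames."""
    info = {"name": item["name"], "coords": item["coords"]}
    if first_appearance:
        # detailed frame: also carry brief information and/or a web page link
        info["brief"] = item.get("brief")
        info["link"] = item.get("link")
    return info  # to be carried in the frame's NAL Extension
```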
  • FIG. 6 shows a schematic structural diagram of a computer system 600 suitable for implementing a computer device (eg, the device 101 shown in FIG. 1 ) according to an embodiment of the present application.
  • the computer device shown in FIG. 6 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.
  • As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603.
  • In the RAM 603, various programs and data required for the operation of the system 600 are also stored.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, etc.; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 608 including a hard disk, etc.; and a communication section 609 including a network interface card such as a LAN card, a modem, and the like. The communication section 609 performs communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I/O interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 610 as needed so that a computer program read therefrom is installed into the storage section 608 as needed.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication portion 609 and/or installed from the removable medium 611 .
  • the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • The computer readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or electronic device.
  • The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • Each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present application may be implemented in a software manner, and may also be implemented in a hardware manner.
  • the described unit may also be provided in the processor, for example, it may be described as: a processor includes a converting unit, a playing unit, a determining unit and a presenting unit.
  • The names of these units do not constitute a limitation on the units themselves in this case; for example, the conversion unit can also be described as "a unit that performs code stream conversion on video data to obtain the video stream and the identification information of items appearing in the video stream".
  • a processor includes a determination unit, an acquisition unit, and an addition unit.
  • the names of these units do not constitute a limitation of the unit itself, for example, the determination unit may also be described as "a unit for identifying items in a video stream and determining items appearing in a video stream".
  • the present application also provides a computer-readable medium.
  • The computer-readable medium may be included in the computer device described in the above embodiments, or it may exist independently without being assembled into the computer device.
  • The above-mentioned computer-readable medium carries one or more programs. When the one or more programs are executed by the computer device, the computer device is caused to: perform code stream conversion on the video data to obtain the video stream and the identification information of items appearing in the video stream; play the video stream on the playback device; in response to determining that there is an item of interest of the user in the current video frame of the video stream, determine the identification information of the item of interest; and, based on the identification information of the item of interest, query the push information of the item of interest and present the push information. Alternatively, the computer device is caused to: perform item identification on the video stream to determine the items appearing in the video stream; obtain the identification information of the appearing items; and add the identification information of the appearing items to the corresponding video frame protocol to generate video data.

Abstract

An information pushing method, a video processing method, and a device. The information pushing method comprises: performing data rate conversion on video data, so as to obtain a video stream and identification information of an item that appears in the video stream (201); playing the video stream on a playing device (202); in response to determining that there is an item, which a user is concerned about, in the current video frame of the video stream, determining identification information of the item of concern (203); and on the basis of the identification information of the item of concern, querying pushing information of the item of concern, and presenting the pushing information (204). By means of the method, an item of interest to a user is discovered from numerous items that appear in a video stream, and pushing information of said item is automatically presented, such that the requirement of the user for knowing the information of said item in detail is met, so as to purchase said item, thereby saving on operation costs of the user.

Description

Information push and video processing methods and devices
Technical field
The embodiments of the present application relate to the field of computer technology, and in particular to information push and video processing methods and devices.
Background
With the rapid development of the Internet, video applications support increasingly diverse functions, such as live streaming and on-demand playback. As a result, more and more users are attracted to watching videos, and viewing time keeps growing. Various items, such as clothes, decorations and food, often appear in videos. If a user is interested in one of these items, the user has to move the video application to the background, open a search or shopping application, and enter the item name to search before obtaining detailed information about the item.
Summary of the invention
The embodiments of the present application propose information push and video processing methods and devices.
In a first aspect, an embodiment of the present application provides an information push method, including: performing code stream conversion on video data to obtain the video stream and identification information of items appearing in the video stream; playing the video stream on a playback device; in response to determining that an item of interest of the user exists in the current video frame of the video stream, determining the identification information of the item of interest; and, based on the identification information of the item of interest, querying push information of the item of interest and presenting the push information.
In some embodiments, determining that there is an item of interest of the user in the current video frame of the video stream includes: collecting the user's voice information; recognizing the voice information and determining the item name contained in the voice information; and, if the item name matches an item appearing in the current video frame, determining the matched appearing item as the item of interest.
In some embodiments, determining that there is an item of interest of the user in the current video frame of the video stream includes: setting a trigger area in the video frame where an item appearing in the video stream is located; and, in response to detecting that the user confirms a trigger area of the current video frame, determining the appearing item corresponding to the confirmed trigger area as the item of interest.
In some embodiments, the identification information includes coordinate information, and setting the trigger area in the video frame where the appearing item is located includes: setting the area corresponding to the coordinate information as the trigger area.
In some embodiments, the coordinate information is a percentage coordinate, and setting the area corresponding to the coordinate information as the trigger area includes: calculating the lattice coordinates of the appearing item in the current video frame based on the resolution of the playback device and the percentage coordinates of the appearing item in the current video frame; and setting the area corresponding to the lattice coordinates as the trigger area.
In some embodiments, calculating the lattice coordinates of the appearing item in the current video frame based on the resolution of the playback device and the percentage coordinates of the appearing item includes: if the coordinate system of the percentage coordinates is the same as the screen coordinate system of the playback device, multiplying the horizontal and vertical pixel values of the playback device's resolution by the horizontal and vertical coordinate values of the percentage coordinates of the appearing item, correspondingly, to obtain the lattice coordinates of the appearing item in the current video frame.
In some embodiments, calculating the lattice coordinates of the appearing item further includes: if the coordinate system of the percentage coordinates differs from the screen coordinate system of the playback device, converting the percentage coordinates into the screen coordinate system to obtain converted percentage coordinates; and multiplying the horizontal and vertical pixel values of the playback device's resolution by the horizontal and vertical coordinate values of the converted percentage coordinates of the appearing item, correspondingly, to obtain the lattice coordinates of the appearing item in the current video frame.
In some embodiments, detecting that the user confirms the trigger area of the current video frame includes: if the user touches the trigger area of the current video frame, determining that the user confirms the trigger area.
In some embodiments, detecting that the user confirms the trigger area of the current video frame includes: capturing the focus of the user's eyes; and, in response to determining that the focus is on the trigger area of the current video frame, determining that the user confirms the trigger area.
In some embodiments, capturing the focus of the user's eyes includes: using a camera of the playback device to emit a light beam toward the eyes; using a photosensitive material on the screen of the playback device to sense the intensity of the light beam reflected from the eyes; and determining a dark spot on the screen based on the light beam intensity as the focus.
In a second aspect, an embodiment of the present application provides a video processing method, including: performing item recognition on a video stream to determine the items appearing in the video stream; acquiring identification information of the appearing items; and adding the identification information of the appearing items to the corresponding video frame protocols to generate video data.
In some embodiments, acquiring the identification information of the appearing item includes: performing position recognition on the video stream to determine coordinate information of the appearing item; and adding the coordinate information of the appearing item to the identification information of the appearing item.
In some embodiments, performing position recognition on the video stream to determine the coordinate information of the appearing item includes: simulating trial playback of the video stream on a pilot device; performing position recognition on the video stream to obtain the lattice coordinates of the appearing item; and determining the coordinate information of the appearing item based on the lattice coordinates of the appearing item.
In some embodiments, determining the coordinate information of the appearing item based on the lattice coordinates of the appearing item includes: dividing the horizontal and vertical coordinate values of the lattice coordinates of the appearing item by the corresponding horizontal and vertical pixel values of the resolution of the pilot device to obtain the percentage coordinates of the appearing item.
In some embodiments, for an item appearing in consecutive video frames of the video stream, the identification information added to the video frame protocol of the frame in which the item first appears includes the item name, coordinate information, brief information and/or a web page link, while the identification information added to the video frame protocols of the frames in which the item does not appear for the first time includes the item name and coordinate information.
In some embodiments, adding the identification information of the appearing item to the corresponding video frame protocol includes: extending the network abstraction layer information of the corresponding video frame protocol based on the identification information of the appearing item.
In a third aspect, an embodiment of the present application provides an information pushing apparatus, including: a conversion unit configured to perform code stream conversion on video data to obtain a video stream and identification information of the items appearing in the video stream; a playback unit configured to play the video stream on a playback device; a determining unit configured to determine identification information of an item of interest in response to determining that an item of interest of the user exists in the current video frame of the video stream; and a presenting unit configured to query push information of the item of interest based on the identification information of the item of interest and to present the push information.
In some embodiments, the determining unit is further configured to: collect voice information of the user; recognize the voice information to determine an item name contained in the voice information; and if the item name matches an item appearing in the current video frame, determine the matched appearing item as the item of interest.
In some embodiments, the determining unit includes: a setting subunit configured to set a trigger area in the video frame in which an item appearing in the video stream is located; and a determining subunit configured to determine, in response to detecting that the user confirms a trigger area of the current video frame, the appearing item corresponding to the confirmed trigger area as the item of interest.
In some embodiments, the identification information includes coordinate information; and the setting subunit includes: a setting module configured to set the area corresponding to the coordinate information as the trigger area.
In some embodiments, the coordinate information is percentage coordinates; and the setting module includes: a calculation submodule configured to calculate the lattice coordinates of the appearing item in the current video frame based on the resolution of the playback device and the percentage coordinates of the appearing item in the current video frame; and a setting submodule configured to set the area corresponding to the lattice coordinates as the trigger area.
In some embodiments, the calculation submodule is further configured to: if the coordinate system of the percentage coordinates is the same as the screen coordinate system of the playback device, multiply the horizontal and vertical pixel values of the resolution of the playback device by the corresponding horizontal and vertical coordinate values of the percentage coordinates of the appearing item in the current video frame to obtain the lattice coordinates of the appearing item in the current video frame.
In some embodiments, the calculation submodule is further configured to: if the coordinate system of the percentage coordinates is different from the screen coordinate system of the playback device, convert the coordinate system of the percentage coordinates to obtain converted percentage coordinates in the screen coordinate system; and multiply the horizontal and vertical pixel values of the resolution of the playback device by the corresponding horizontal and vertical coordinate values of the converted percentage coordinates of the appearing item in the current video frame to obtain the lattice coordinates of the appearing item in the current video frame.
In some embodiments, the determining subunit is further configured to: if the user touches a trigger area of the current video frame, determine that the user confirms the trigger area.
In some embodiments, the determining subunit includes: a capture module configured to capture the focus of the user's eyes; and a determining module configured to determine that the user confirms the trigger area in response to determining that the focus is on a trigger area of the current video frame.
In some embodiments, the capture module is further configured to: emit a light beam toward the eyes using a camera of the playback device; sense the intensity of the light beam reflected from the eyes using a photosensitive material on the screen of the playback device; and determine a dark spot on the screen based on the beam intensity as the focus.
In a fourth aspect, an embodiment of the present application provides a video processing apparatus, including: a determining unit configured to perform item recognition on a video stream to determine the items appearing in the video stream; an acquiring unit configured to acquire identification information of the appearing items; and an adding unit configured to add the identification information of the appearing items to the corresponding video frame protocols to generate video data.
In some embodiments, the acquiring unit includes: a determining subunit configured to perform position recognition on the video stream to determine coordinate information of the appearing item; and an adding subunit configured to add the coordinate information of the appearing item to the identification information of the appearing item.
In some embodiments, the determining subunit includes: a trial playback module configured to simulate trial playback of the video stream on a pilot device; a recognition module configured to perform position recognition on the video stream to obtain the lattice coordinates of the appearing item; and a determining module configured to determine the coordinate information of the appearing item based on the lattice coordinates of the appearing item.
In some embodiments, the determining module is further configured to: divide the horizontal and vertical coordinate values of the lattice coordinates of the appearing item by the corresponding horizontal and vertical pixel values of the resolution of the pilot device to obtain the percentage coordinates of the appearing item.
In some embodiments, for an item appearing in consecutive video frames of the video stream, the identification information added to the video frame protocol of the frame in which the item first appears includes the item name, coordinate information, brief information and/or a web page link, while the identification information added to the video frame protocols of the frames in which the item does not appear for the first time includes the item name and coordinate information.
In some embodiments, the adding unit is further configured to: extend the network abstraction layer information of the corresponding video frame protocol based on the identification information of the appearing item.
In a fifth aspect, an embodiment of the present application provides a computer device, including: one or more processors; and a storage apparatus on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method described in any implementation of the first aspect or the method described in any implementation of the second aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, and the computer program, when executed by a processor, implements the method described in any implementation of the first aspect or the method described in any implementation of the second aspect.
In the information pushing and video processing methods and devices provided by the embodiments of the present application, code stream conversion is first performed on the video data to obtain a video stream and identification information of the items appearing in the video stream; the video stream is then played on a playback device; next, in response to determining that an item of interest of the user exists in the current video frame of the video stream, identification information of the item of interest is determined; finally, based on the identification information of the item of interest, push information of the item of interest is queried and presented. Items that interest the user are discovered among the many items appearing in the video stream, and their push information is presented automatically, which satisfies the user's need to learn about the items of interest in detail, facilitates purchasing them, and saves the user operating cost.
Description of the drawings
Other features, objects and advantages of the present application will become more apparent by reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is an exemplary system architecture to which the present application may be applied;
Fig. 2 is a flowchart of an embodiment of an information pushing method according to the present application;
Fig. 3 is a flowchart of another embodiment of the information pushing method according to the present application;
Fig. 4 is a flowchart of a further embodiment of the information pushing method according to the present application;
Fig. 5 is a flowchart of an embodiment of a video processing method according to the present application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing a computer device according to the embodiments of the present application.
Detailed description
The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are used only to explain the relevant invention and not to limit that invention. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments of the present application and the features of the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the information pushing and video processing methods of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include devices 101 and 102 and a network 103. The network 103 is the medium used to provide communication links between the devices 101 and 102. The network 103 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The devices 101 and 102 may be hardware devices or software that support network connections to provide various network services. When a device is hardware, it may be any of various electronic devices, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, servers, and so on. In this case, as a hardware device, it may be implemented as a distributed device cluster composed of multiple devices, or as a single device. When a device is software, it may be installed in the electronic devices listed above. In this case, as software, it may be implemented, for example, as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or a single software module. No specific limitation is imposed here.
In practice, a device may provide corresponding network services by installing a corresponding client application or server application. After a client application is installed, the device may act as a client in network communication. Correspondingly, after a server application is installed, it may act as a server in network communication.
As an example, in Fig. 1, the device 101 acts as a client and the device 102 acts as a server. For example, the device 101 may be a client of a video application, and the device 102 may be a server of the video application.
It should be noted that the information pushing method and the video processing method provided by the embodiments of the present application may be executed by the device 101 or the device 102. When the device 101 executes the information pushing method, it may be a playback device. When the device 102 executes the video processing method, it may be a pilot device.
It should be understood that the numbers of networks and devices in Fig. 1 are merely illustrative. There may be any number of networks and devices according to implementation needs.
With continued reference to Fig. 2, a flow 200 of an embodiment of the information pushing method according to the present application is shown. The information pushing method includes the following steps:
Step 201: perform code stream conversion on video data to obtain a video stream and identification information of the items appearing in the video stream.
In this embodiment, the execution body of the information pushing method (for example, the device 101 shown in Fig. 1) may acquire video data from a backend server of a video application (for example, the device 102 shown in Fig. 1) and perform code stream conversion on the video data to obtain the video stream and the identification information of the items appearing in the video stream.
The video data may include the video stream and the identification information of the items appearing in the video stream. The video stream is playable data, including but not limited to TV series, movies, live broadcasts, short videos, and so on. The identification information of the items appearing in the video stream is non-playable data used to identify the items appearing in the video stream, including but not limited to item names, coordinate information, brief information, web page links, and so on. An appearing item may be any item that appears in the video stream, such as clothes, decorations, food, and the like.
Not every video frame in the video stream contains an item, and not every item that appears has identification information. Therefore, code stream encoding is performed only on the video frames in which items with identification information are located: the video frame protocol is modified, and the identification information is added to the original video frame protocol. Video data to which non-playable data has been added cannot be played directly, so code stream conversion needs to be performed on the video data to separate the playable video stream from the non-playable identification information. The code stream conversion may adopt a static transcoding method or a dynamic transcoding method.
It should be noted that when the video frame protocol is modified, the modification differs for different protocol formats. Taking H.264 as an example, adding identification information is supported by extending the NAL (Network Abstraction Layer) information of the video frame protocol. The NAL may include an NAL Header, an NAL Extension and an NAL payload. The NAL Header may be used to store basic information of the video frame. The NAL payload may be used to store the binary stream of the video frame. The NAL Extension may be used to store the identification information. It should also be noted that, since the video frame itself is a highly compressed data body, the NAL Extension likewise needs to be highly compressed.
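Purely as an illustration of the structure just described, the following Python sketch models an extended NAL-like unit in memory. It is not part of the claimed method and does not follow the actual H.264 bitstream syntax; the field names and the JSON-plus-zlib encoding of the identification information are assumptions made for the example.

    # Illustrative sketch only: a hypothetical in-memory layout for an "extended" NAL unit.
    import json
    import zlib
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ExtendedNalUnit:
        header: bytes                      # basic information of the video frame (NAL Header)
        payload: bytes                     # compressed binary stream of the video frame (NAL payload)
        extension: Optional[bytes] = None  # compressed identification information (NAL Extension)

        @staticmethod
        def compress_identification(info: dict) -> bytes:
            # The extension must stay small, so the identification information is
            # serialized and compressed before being attached to the frame.
            return zlib.compress(json.dumps(info, ensure_ascii=False).encode("utf-8"))

        def identification(self) -> Optional[dict]:
            # Recover the identification information carried by this frame, if any.
            if self.extension is None:
                return None
            return json.loads(zlib.decompress(self.extension).decode("utf-8"))

    # Example: attaching identification information to one frame and reading it back.
    nal = ExtendedNalUnit(
        header=b"\x65",              # placeholder header byte
        payload=b"...frame data...",  # placeholder payload
        extension=ExtendedNalUnit.compress_identification(
            {"name": "watch", "coords": [0.42, 0.37],
             "brief": "brand A watch", "link": "https://example.com/watch"}
        ),
    )
    print(nal.identification())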
In practical applications, the same item may appear in multiple consecutive video frames. For an item appearing in consecutive video frames of the video stream, the identification information added to the video frame protocol of the frame in which the item first appears may be detailed information, including the item name, coordinate information, brief information and/or a web page link, and the corresponding video frame is called a detailed frame; the identification information added to the video frame protocols of the frames in which the item does not appear for the first time may be abbreviated information, including the item name and coordinate information, and the corresponding video frame is called an abbreviated frame. In this way, space can be saved. While the playback device plays the video stream, the detailed information can be decoded and cached when a detailed frame is played; when an abbreviated frame is played later, if it is detected that an item of interest of the user exists in the current video frame, the detailed information of the item of interest can be obtained by querying the cache based on the abbreviated information of the item of interest.
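The caching behaviour described above can be sketched minimally as follows; the dictionary-based records and helper functions are hypothetical and stand in for whatever decoding structures a real player would use.

    # Illustrative sketch: cache detailed information from detailed frames and resolve
    # abbreviated frames against that cache. All structures here are hypothetical.
    from typing import Dict, List, Optional

    detail_cache: Dict[str, dict] = {}  # item name -> detailed identification information

    def on_frame_identification(frame_entries: List[dict]) -> None:
        # Called with the identification entries decoded from the current frame's protocol.
        for entry in frame_entries:
            if "brief" in entry or "link" in entry:
                # Entry from a detailed frame: keep the full record for later lookups.
                detail_cache[entry["name"]] = entry

    def lookup_detailed_info(name: str) -> Optional[dict]:
        # On an abbreviated frame, recover the detailed record from the cache by item name.
        return detail_cache.get(name)

    # A detailed frame is decoded first ...
    on_frame_identification([{"name": "watch", "coords": [0.42, 0.37],
                              "brief": "brand A watch", "link": "https://example.com/watch"}])
    # ... later frames only carry the abbreviated information (name and coordinates).
    on_frame_identification([{"name": "watch", "coords": [0.45, 0.35]}])
    print(lookup_detailed_info("watch"))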
Step 202: play the video stream on the playback device.
In this embodiment, the above execution body may play the video stream on the playback device.
Generally, in the case where the above execution body is hardware, it may be a playback device on which a player is installed for playing the video stream.
It should be noted that the playback device usually plays the video stream while performing the code stream conversion. Therefore, during playback of the video stream, the identification information of the items appearing in the video stream can be obtained successively.
Step 203: in response to determining that an item of interest of the user exists in the current video frame of the video stream, determine identification information of the item of interest.
In this embodiment, the above execution body may determine whether an item of interest of the user exists in the current video frame of the video stream. If an item of interest of the user exists, the identification information of the item of interest is determined; if no item of interest of the user exists, the video stream continues to be played.
The user's item of interest may be determined by the above execution body based on the user's reaction while watching the video stream. Generally, a user reacts in a particular way upon seeing an item of interest; for example, when the user's item of interest appears in the video stream, the user may say the name of that item. At this point, the above execution body may collect the user's voice information, recognize the voice information, and determine the item name contained in the voice information. If the item name matches an item appearing in the current video frame, the matched appearing item is determined as the item of interest; if the item name does not match any item appearing in the current video frame, the user's voice information continues to be collected. The current video frame is the video frame currently being played. Multiple items may appear in the same video frame, and only the item that matches the item name contained in the user's voice information is the user's item of interest. For example, the user says "watch", and the items appearing in the current video frame include a watch of brand A, clothes of brand B and shoes of brand C; only the watch of brand A matches "watch" and is the user's item of interest.
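As a hedged sketch of the matching step just described, the example below matches a recognized item name against the items of the current frame. The recognize_speech function is a placeholder for whatever speech-recognition service is actually used, and the containment-based matching rule is an assumption, not the claimed matching logic.

    # Illustrative sketch: match a recognized item name against the items in the current frame.
    from typing import List, Optional

    def recognize_speech(audio: bytes) -> str:
        # Placeholder for a real speech-recognition call; returns a fixed result for the example.
        return "watch"

    def find_item_of_interest(audio: bytes, frame_items: List[dict]) -> Optional[dict]:
        spoken_name = recognize_speech(audio)
        for item in frame_items:
            # Simple containment test; a real matcher might use synonyms or fuzzy matching.
            if spoken_name in item["name"] or item["name"] in spoken_name:
                return item
        return None  # no match: keep collecting voice information

    items = [{"name": "brand A watch"}, {"name": "brand B clothes"}, {"name": "brand C shoes"}]
    print(find_item_of_interest(b"<audio>", items))  # -> the brand A watch entry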
Step 204: query push information of the item of interest based on the identification information of the item of interest, and present the push information.
In this embodiment, the above execution body may query the push information of the item of interest based on the identification information of the item of interest, and present the push information. The push information may be a link for the user to browse detailed information of the item of interest or a link for purchasing the item of interest. Generally, the push information may be presented on the current video frame, in particular near the item of interest in the current video frame. The user may subsequently perform corresponding operations based on the push information to view the detailed information of the item of interest or to purchase the item of interest.
Generally, the above execution body may query the push information of the item of interest in various ways. For example, in the case where push information of a large number of items is stored locally, the push information of the item of interest is looked up locally. For another example, in the case where the video application integrates a search function or a shopping function, a push information acquisition request is sent to the backend server of the video application based on the identification information of the item of interest, and the push information of the item of interest returned by that backend server is received. For yet another example, a push information acquisition request is sent to the backend server of a search application or a shopping application based on the identification information of the item of interest, and the push information of the item of interest returned by that backend server is received.
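One possible ordering of these query paths is sketched below; every helper function is a hypothetical stub (the real lookups would be a local store query and HTTP requests to the respective backends), and the fallback order shown is only an example.

    # Illustrative sketch of the query paths described above; all helpers are hypothetical stubs.
    from typing import Optional

    def local_store_lookup(name: str) -> Optional[dict]:
        return None  # pretend nothing is cached locally

    def request_video_backend(identification: dict) -> Optional[dict]:
        return None  # stand-in for a request to the video application's backend server

    def request_shopping_backend(identification: dict) -> Optional[dict]:
        return {"name": identification["name"], "link": "https://example.com/buy"}  # stub response

    def query_push_info(identification: dict) -> Optional[dict]:
        # Try the local store first, then the video application's backend,
        # and finally a search/shopping application's backend.
        return (local_store_lookup(identification["name"])
                or request_video_backend(identification)
                or request_shopping_backend(identification))

    print(query_push_info({"name": "brand A watch"}))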
In the information pushing method provided by the embodiments of the present application, code stream conversion is first performed on the video data to obtain a video stream and identification information of the items appearing in the video stream; the video stream is then played on a playback device; next, in response to determining that an item of interest of the user exists in the current video frame of the video stream, identification information of the item of interest is determined; finally, based on the identification information of the item of interest, push information of the item of interest is queried and presented. Items that interest the user are discovered among the many items appearing in the video stream, and their push information is presented automatically, which satisfies the user's need to learn about the items of interest in detail, achieves fast pushing of the items of interest, and saves the user operating cost.
With further reference to Fig. 3, a flow 300 of another embodiment of the information pushing method according to the present application is shown. The information pushing method includes the following steps:
Step 301: perform code stream conversion on video data to obtain a video stream and identification information of the items appearing in the video stream.
Step 302: play the video stream on the playback device.
In this embodiment, the specific operations of steps 301-302 have been described in detail in steps 201-202 of the embodiment shown in Fig. 2, and are not repeated here.
Step 303: set a trigger area in the video frame in which an item appearing in the video stream is located.
In this embodiment, the execution body of the information pushing method (for example, the device 101 shown in Fig. 1) may set a trigger area in the video frame in which an item appearing in the video stream is located.
Generally, the trigger area may be set near the appearing item in the video frame. For example, in the case where the identification information includes coordinate information, the area corresponding to the coordinate information is set as the trigger area. It should be understood that when multiple items appear in a video frame, multiple trigger areas may be set, with one trigger area corresponding to one appearing item.
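A small sketch of how such a trigger area might be registered and hit-tested is given below; the fixed-size rectangle centred on the item's coordinates, and its half-size of 40 pixels, are assumptions made only for the example.

    # Illustrative sketch: register a rectangular trigger area around an appearing item
    # and test whether a point (for example a touch point) falls inside it.
    from dataclasses import dataclass

    @dataclass
    class TriggerArea:
        item_name: str
        left: int
        top: int
        right: int
        bottom: int

        def contains(self, x: int, y: int) -> bool:
            # True when the point lies inside this trigger area.
            return self.left <= x <= self.right and self.top <= y <= self.bottom

    def make_trigger_area(item_name: str, cx: int, cy: int, half_size: int = 40) -> TriggerArea:
        # half_size of 40 pixels is an arbitrary example value.
        return TriggerArea(item_name, cx - half_size, cy - half_size, cx + half_size, cy + half_size)

    area = make_trigger_area("brand A watch", cx=500, cy=300)
    print(area.contains(520, 310))  # True: a touch here would confirm the watch's trigger area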
Step 304: in response to detecting that the user confirms a trigger area of the current video frame, determine the appearing item corresponding to the confirmed trigger area as the item of interest.
In this embodiment, the above execution body may detect whether the user confirms a trigger area of the current video frame. If it is detected that the user confirms a trigger area of the current video frame, the appearing item corresponding to the confirmed trigger area is determined as the item of interest; if it is not detected that the user confirms a trigger area of the current video frame, the video stream continues to be played and detection continues.
When the user operates on a trigger area, the trigger area may be considered confirmed. The playback device needs to have corresponding hardware or plug-ins to detect the user's operation on the trigger area, since the video stream itself has no monitoring or network connection capability.
In some embodiments, in the case where the playback device has a touch screen, if the user touches a trigger area of the current video frame, it is determined that the user confirms the trigger area.
In some embodiments, in the case where the playback device has a camera, if it is captured that the focus of the user's eyes is on a trigger area of the current video frame, it is determined that the user confirms the trigger area. For example, the above execution body may analyze the viewing angle of the user's eyes in the user image collected by the camera to determine whether the focus of the user's eyes falls on the trigger area. For another example, in the case where the screen of the playback device is covered with a photosensitive material, the above execution body may first emit a light beam toward the user's eyes using the camera; then sense the intensity of the light beam reflected from the eyes using the photosensitive material on the screen of the playback device; and finally determine a dark spot on the screen based on the beam intensity as the focus of the user's eyes. When the light beam irradiates the pupil of the eye, the pupil absorbs most of the beam, so that the intensity of the beam reflected onto the screen is relatively low and a dark spot appears. When the light beam irradiates a part other than the pupil, most of the beam is reflected onto the screen, the reflected beam intensity is relatively high, and a bright spot appears.
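As a rough sketch of the dark-spot idea only, the readout of the photosensitive screen is modelled here as a plain two-dimensional intensity grid; this representation, and treating the minimum-intensity cell as the focus, are assumptions for illustration and not a description of the actual hardware.

    # Illustrative sketch: find the darkest point of a reflected-beam intensity grid and
    # treat it as the focus of the user's eyes.
    from typing import List, Tuple

    def find_focus(intensity_grid: List[List[float]]) -> Tuple[int, int]:
        best = (0, 0)
        best_value = float("inf")
        for row_index, row in enumerate(intensity_grid):
            for col_index, value in enumerate(row):
                if value < best_value:   # lower reflected intensity -> darker spot
                    best_value = value
                    best = (col_index, row_index)
        return best  # (x, y) position of the dark spot on the screen

    grid = [[0.9, 0.8, 0.9],
            [0.7, 0.1, 0.8],   # the 0.1 cell is where the pupil absorbed most of the beam
            [0.9, 0.8, 0.9]]
    print(find_focus(grid))  # -> (1, 1)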
Step 305: determine the identification information of the item of interest.
Step 306: query push information of the item of interest based on the identification information of the item of interest, and present the push information.
In this embodiment, the specific operations of steps 305-306 have been described in detail in steps 203-204 of the embodiment shown in Fig. 2, and are not repeated here.
As can be seen from Fig. 3, compared with the embodiment corresponding to Fig. 2, the flow 300 of the information pushing method in this embodiment highlights the step of determining the user's item of interest. Thus, in the solution described in this embodiment, a trigger area is set in the video frame in which the appearing item is located, and the item of interest is determined based on the user's operation on the trigger area, thereby improving the accuracy of determining the item of interest.
With further reference to Fig. 4, a flow 400 of a further embodiment of the information pushing method according to the present application is shown. The information pushing method includes the following steps:
Step 401: perform code stream conversion on video data to obtain a video stream and identification information of the items appearing in the video stream.
Step 402: play the video stream on the playback device.
In this embodiment, the specific operations of steps 401-402 have been described in detail in steps 301-302 of the embodiment shown in Fig. 3, and are not repeated here.
Step 403: calculate the lattice coordinates of the appearing item in the current video frame based on the resolution of the playback device and the percentage coordinates of the appearing item in the current video frame.
In this embodiment, in the case where the coordinate information in the identification information is percentage coordinates, the execution body of the information pushing method (for example, the device 101 shown in Fig. 1) may calculate the lattice coordinates of the appearing item in the current video frame based on the resolution of the playback device and the percentage coordinates of the appearing item in the current video frame.
Since different playback devices have different screen resolutions, the coordinate information in the identification information is percentage coordinates in order to adapt to different screen resolutions. However, lattice coordinates are needed to determine the trigger area, so the percentage coordinates need to be converted into the corresponding lattice coordinates. Specifically, the above execution body may multiply the horizontal and vertical pixel values of the resolution of the playback device by the corresponding horizontal and vertical coordinate values of the percentage coordinates of the appearing item in the current video frame to obtain the lattice coordinates of the appearing item in the current video frame.
For example, a playback device with a resolution of A*B is used to play the video stream. If the percentage coordinates of the appearing item are (x/a, y/b), the lattice coordinates of the appearing item are (x*A/a, y*B/b), where a, b, A and B are positive integers, x is a positive integer not greater than a, y is a positive integer not greater than b, x/a and y/b are positive numbers not greater than 1, and x*A/a and y*B/b are positive integers.
Generally, the coordinate system of the percentage coordinates is the same as the screen coordinate system of the playback device: the upper left corner is the origin, rightward is the positive direction of the horizontal axis, and downward is the positive direction of the vertical axis. In this case, the horizontal and vertical pixel values of the resolution of the playback device are directly multiplied by the corresponding horizontal and vertical coordinate values of the percentage coordinates of the appearing item in the current video frame to obtain the lattice coordinates of the appearing item in the current video frame. In special cases, if the coordinate system of the percentage coordinates is different from the screen coordinate system of the playback device, the coordinate system of the percentage coordinates needs to be converted first to obtain converted percentage coordinates in the screen coordinate system; the horizontal and vertical pixel values of the resolution of the playback device are then multiplied by the corresponding horizontal and vertical coordinate values of the converted percentage coordinates of the appearing item in the current video frame to obtain the lattice coordinates of the appearing item in the current video frame.
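The conversion performed in step 403 can be sketched minimally as follows; the coordinate-system conversion branch is shown only as a vertical flip, which is an assumed example of what such a conversion might look like rather than a prescribed one.

    # Illustrative sketch of step 403: convert percentage coordinates to lattice coordinates
    # for the playback device's resolution.
    from typing import Tuple

    def to_lattice(percent_x: float, percent_y: float,
                   width_px: int, height_px: int,
                   same_coordinate_system: bool = True) -> Tuple[int, int]:
        if not same_coordinate_system:
            # Example conversion only: a coordinate system with its origin at the bottom-left
            # is flipped vertically into the screen coordinate system (origin at top-left).
            percent_y = 1.0 - percent_y
        return round(percent_x * width_px), round(percent_y * height_px)

    # A 1920*1080 playback device and an item at percentage coordinates (0.42, 0.37):
    print(to_lattice(0.42, 0.37, 1920, 1080))  # -> (806, 400)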
Step 404: set the area corresponding to the lattice coordinates as the trigger area.
In this embodiment, the above execution body may set the area corresponding to the lattice coordinates as the trigger area.
Step 405: in response to detecting that the user confirms a trigger area of the current video frame, determine the appearing item corresponding to the confirmed trigger area as the item of interest.
Step 406: determine the identification information of the item of interest.
Step 407: query push information of the item of interest based on the identification information of the item of interest, and present the push information.
In this embodiment, the specific operations of steps 405-407 have been described in detail in steps 304-306 of the embodiment shown in Fig. 3, and are not repeated here.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 3, the flow 400 of the information pushing method in this embodiment highlights the step of setting the trigger area. Thus, in the solution described in this embodiment, the coordinate information in the identification information is percentage coordinates, and the corresponding lattice coordinates are obtained through coordinate conversion, so as to adapt to the different screen resolutions of different playback devices.
With continued reference to Fig. 5, a flow 500 of an embodiment of the video processing method according to the present application is shown. The video processing method includes the following steps:
Step 501: perform item recognition on a video stream to determine the items appearing in the video stream.
In this embodiment, the execution body of the video processing method (for example, the device 102 shown in Fig. 1) may perform item recognition on the video stream to determine the items appearing in the video stream.
Generally, the above execution body may determine the items appearing in the video stream in various ways. In some embodiments, a person skilled in the art may perform item recognition on the video stream and input the recognition results to the above execution body. In some embodiments, the above execution body may split the video stream into a series of video frames and perform item recognition on each video frame to determine the items appearing in the video stream.
Step 502: acquire identification information of the appearing items.
In this embodiment, the above execution body may acquire the identification information of the appearing items. The identification information of an appearing item is non-playable data used to identify the item appearing in the video stream.
In some embodiments, the identification information may include coordinate information. Specifically, the above execution body may perform position recognition on the video stream to determine the coordinate information of the appearing item, and add the coordinate information of the appearing item to the identification information of the appearing item. The coordinate information may be determined by simulating trial playback of the video stream on a pilot device. Specifically, trial playback of the video stream is first simulated on the pilot device; then position recognition is performed on the video stream to obtain the lattice coordinates of the appearing item; finally, the coordinate information of the appearing item is determined based on the lattice coordinates of the appearing item.
Generally, in the case where the screen resolutions of most playback devices and of the pilot device are unified, the coordinate information may be lattice coordinates. In practical applications, however, different playback devices have different screen resolutions; in order to adapt to different screen resolutions, the coordinate information in the identification information is percentage coordinates. Specifically, the horizontal and vertical coordinate values of the lattice coordinates of the appearing item are divided by the corresponding horizontal and vertical pixel values of the resolution of the pilot device to obtain the percentage coordinates of the appearing item.
For example, a standard device with a resolution of a*b is used for trial playback of the video stream. If the lattice coordinates of the appearing item captured on the pilot device are (x, y), the percentage coordinates of the appearing item are (x/a, y/b), where a and b are positive integers, x is a positive integer not greater than a, y is a positive integer not greater than b, and x/a and y/b are positive numbers not greater than 1.
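Conversely to the playback-side conversion of step 403, the pilot-device side can produce the percentage coordinates as in the following sketch; the specific numbers are example values only.

    # Illustrative sketch: derive percentage coordinates from the lattice coordinates captured
    # on the pilot device, so that any playback resolution can later recover pixel positions.
    from typing import Tuple

    def to_percentage(x: int, y: int, pilot_width: int, pilot_height: int) -> Tuple[float, float]:
        return x / pilot_width, y / pilot_height

    # A 1280*720 (16:9) pilot device and an item captured at lattice coordinates (538, 266):
    print(to_percentage(538, 266, 1280, 720))  # -> roughly (0.4203, 0.3694)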
It should be noted that the resolution of the pilot device needs to be selected to match the video resolution, for example 16:9 for 720p and above, and 4:3 below that. In this way, the error can be reduced as much as possible.
Step 503: add the identification information of the appearing items to the corresponding video frame protocols to generate video data.
In this embodiment, the above execution body may add the identification information of the appearing items to the corresponding video frame protocols to generate the video data.
Generally, by performing code stream encoding on the video frames in which items with identification information are located and modifying the video frame protocol, the identification information can be added to the original video frame protocol. When the video frame protocol is modified, the modification differs for different protocol formats. Taking H.264 as an example, the NAL information of the corresponding video frame protocol is extended based on the identification information of the appearing item to support adding the identification information. The NAL may include an NAL Header, an NAL Extension and an NAL payload. The NAL Header may be used to store basic information of the video frame. The NAL payload may be used to store the binary stream of the video frame. The NAL Extension may be used to store the identification information. It should also be noted that, since the video frame itself is a highly compressed data body, the NAL Extension likewise needs to be highly compressed.
In practical applications, the same item may appear in multiple consecutive video frames. For an item appearing in consecutive video frames of the video stream, the identification information added to the video frame protocol of the frame in which the item first appears may be detailed information, including the item name, coordinate information, brief information and/or a web page link, and the corresponding video frame is called a detailed frame; the identification information added to the video frame protocols of the frames in which the item does not appear for the first time may be abbreviated information, including the item name and coordinate information, and the corresponding video frame is called an abbreviated frame. In this way, space can be saved. While the playback device plays the video stream, the detailed information can be decoded and cached when a detailed frame is played; when an abbreviated frame is played later, if it is detected that an item of interest of the user exists in the current video frame, the detailed information of the item of interest can be obtained by querying the cache based on the abbreviated information of the item of interest.
In the video processing method provided by the embodiments of the present application, item recognition is first performed on the video stream to determine the items appearing in the video stream; the identification information of the appearing items is then acquired; finally, the identification information of the appearing items is added to the corresponding video frame protocols to generate video data, thereby adding non-playable data to the video stream.
下面参考图6,其示出了适于用来实现本申请实施例的计算机设备(例如图1所示的设备101)的计算机系统600的结构示意图。图6示出的计算机设备仅仅是一个示例,不应对本申请实施例的功能和使用范围带来任何限制。Referring to FIG. 6 below, it shows a schematic structural diagram of a computer system 600 suitable for implementing a computer device (eg, the device 101 shown in FIG. 1 ) according to an embodiment of the present application. The computer device shown in FIG. 6 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.
如图6所示,计算机系统600包括中央处理单元(CPU)601,其可以根据存储在只读存储器(ROM)602中的程序或者从存储部分608加载到随机访问存储器(RAM)603中的程序而执行各种适当的动作和处理。在RAM 603中, 还存储有系统600操作所需的各种程序和数据。CPU 601、ROM 602以及RAM603通过总线604彼此相连。输入/输出(I/O)接口605也连接至总线604。As shown in FIG. 6, a computer system 600 includes a central processing unit (CPU) 601, which can be loaded into a random access memory (RAM) 603 according to a program stored in a read only memory (ROM) 602 or a program from a storage section 608 Instead, various appropriate actions and processes are performed. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604 .
以下部件连接至I/O接口605:包括键盘、鼠标等的输入部分606;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分607;包括硬盘等的存储部分608;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分609。通信部分609经由诸如因特网的网络执行通信处理。驱动器610也根据需要连接至I/O接口605。可拆卸介质611,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器610上,以便于从其上读出的计算机程序根据需要被安装入存储部分608。The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, etc.; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 608 including a hard disk, etc. ; and a communication section 609 including a network interface card such as a LAN card, a modem, and the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 610 as needed so that a computer program read therefrom is installed into the storage section 608 as needed.
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分609从网络上被下载和安装,和/或从可拆卸介质611被安装。在该计算机程序被中央处理单元(CPU)601执行时,执行本申请的方法中限定的上述功能。In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 609 and/or installed from the removable medium 611 . When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present application are performed.
需要说明的是,本申请所述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本申请中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本申请中,计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输 用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:无线、电线、光缆、RF等等,或者上述的任意合适的组合。It should be noted that the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two. The computer readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above. In this application, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device . Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
可以以一种或多种程序设计语言或其组合来编写用于执行本申请的操作的计算机程序代码,所述程序设计语言包括面向目标的程序设计语言-诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言-诸如”C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或电子设备上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)-连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。Computer program code for performing the operations of the present application may be written in one or more programming languages, including object-oriented programming languages - such as Java, Smalltalk, C++, but also conventional Procedural programming language - such as "C" language or similar programming language. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or electronic device. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (eg, using an Internet service provider through Internet connection).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order shown in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of this application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a conversion unit, a playing unit, a determination unit, and a presentation unit. The names of these units do not, in some cases, limit the units themselves; for example, the conversion unit may also be described as "a unit that performs bitstream conversion on video data to obtain a video stream and identification information of items appearing in the video stream". As another example, a processor may be described as including a determination unit, an acquisition unit, and an addition unit, where the determination unit may also be described as "a unit that performs item recognition on a video stream and determines the items appearing in the video stream".
As another aspect, this application further provides a computer-readable medium, which may be included in the computer device described in the above embodiments or may exist separately without being assembled into that computer device. The computer-readable medium carries one or more programs which, when executed by the computer device, cause the computer device to: perform bitstream conversion on video data to obtain a video stream and identification information of items appearing in the video stream; play the video stream on a playback device; in response to determining that an item of interest of the user exists in the current video frame of the video stream, determine identification information of the item of interest; and query push information of the item of interest based on the identification information of the item of interest and present the push information. Alternatively, the programs cause the computer device to: perform item recognition on a video stream to determine the items appearing in the video stream; obtain identification information of the appearing items; and add the identification information of the appearing items to the corresponding video frame protocol to generate video data.
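For illustration only, the first of the two program flows just described can be sketched in Python as follows. All names here (ItemRecord, Frame, PUSH_INFO, find_item_of_interest) are hypothetical helpers introduced for the sketch, not structures defined by this application; it is a minimal outline under the assumption that the frame protocol has already been decoded into per-frame item records.

```python
# Minimal sketch of the playback-side flow described above.
# All names (ItemRecord, Frame, PUSH_INFO, find_item_of_interest) are
# illustrative assumptions, not APIs defined by this application.
from dataclasses import dataclass

@dataclass
class ItemRecord:
    name: str       # item name carried in the frame's identification info
    region: tuple   # percentage coordinates (x0, y0, x1, y1), each in 0.0-1.0
    link: str = ""  # optional web link (first occurrence only)

@dataclass
class Frame:
    index: int
    items: list     # identification info decoded from the frame protocol

# Hypothetical push-information store keyed by item name.
PUSH_INFO = {"coat": {"price": "$59", "link": "https://example.com/coat"}}

def find_item_of_interest(frame: Frame, spoken_name: str):
    """Step 3: decide whether the current frame contains an item the user cares about."""
    for item in frame.items:
        if item.name == spoken_name:
            return item
    return None

def present_push_info(item: ItemRecord):
    """Step 4: query and present push information for the item of interest."""
    info = PUSH_INFO.get(item.name)
    if info:
        print(f"[push] {item.name}: {info['price']} -> {info['link']}")

def play(frames: list, spoken_name: str):
    for frame in frames:  # Step 2: play the stream frame by frame
        item = find_item_of_interest(frame, spoken_name)
        if item:
            present_push_info(item)

if __name__ == "__main__":
    frames = [Frame(0, []), Frame(1, [ItemRecord("coat", (0.1, 0.2, 0.4, 0.6))])]
    play(frames, "coat")
```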
The above description is only a preferred embodiment of this application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) this application.

Claims (18)

  1. An information push method, comprising:
    performing bitstream conversion on video data to obtain a video stream and identification information of items appearing in the video stream;
    playing the video stream on a playback device;
    in response to determining that an item of interest of a user exists in a current video frame of the video stream, determining identification information of the item of interest; and
    querying push information of the item of interest based on the identification information of the item of interest, and presenting the push information.
  2. The method according to claim 1, wherein the determining that an item of interest of the user exists in the current video frame of the video stream comprises:
    collecting voice information of the user;
    recognizing the voice information and determining an item name contained in the voice information;
    if the item name matches an item appearing in the current video frame, determining the matched appearing item as the item of interest.
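As an illustration of the matching step in claim 2 only: the sketch below assumes speech capture and recognition are provided by an external recognizer whose output is plain text, and that the current frame's item names are already available; both assumptions and all names are hypothetical.

```python
# Sketch of claim 2's matching step. Speech capture and recognition are assumed
# to be handled elsewhere; `recognized_text` is their text output.
def item_of_interest_from_speech(recognized_text: str, frame_item_names: list):
    """Return the first appearing item whose name occurs in the recognized speech."""
    text = recognized_text.lower()
    for name in frame_item_names:
        if name.lower() in text:  # item name contained in the voice information
            return name
    return None

print(item_of_interest_from_speech("how much is that red coat", ["lamp", "coat"]))  # -> "coat"
```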
  3. The method according to claim 1, wherein the determining that an item of interest of the user exists in the current video frame of the video stream comprises:
    setting a trigger area in the video frame in which an appearing item of the video stream is located;
    in response to detecting that the user confirms a trigger area of the current video frame, determining the appearing item corresponding to the confirmed trigger area as the item of interest.
  4. The method according to claim 3, wherein the identification information comprises coordinate information; and
    the setting a trigger area in the video frame in which the appearing item of the video stream is located comprises:
    setting the area corresponding to the coordinate information as the trigger area.
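For illustration of claims 3-4: once the trigger area is taken to be the region given by the item's coordinate information, confirming it reduces to a point-in-region test. A minimal sketch, assuming the confirmation point and the region are expressed in the same coordinate space:

```python
# Sketch of claims 3-4: the trigger area is simply the region given by the item's
# coordinate information; a user confirmation is a point falling inside it.
def in_trigger_area(point, region):
    """point = (x, y); region = (x0, y0, x1, y1) in the same coordinate space."""
    x, y = point
    x0, y0, x1, y1 = region
    return x0 <= x <= x1 and y0 <= y <= y1

print(in_trigger_area((120, 300), (100, 250, 400, 600)))  # True -> item becomes the item of interest
```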
  5. The method according to claim 4, wherein the coordinate information is percentage coordinates; and
    the setting the area corresponding to the coordinate information as the trigger area comprises:
    calculating pixel coordinates of the appearing item in the current video frame based on the resolution of the playback device and the percentage coordinates of the appearing item in the current video frame;
    setting the area corresponding to the pixel coordinates as the trigger area.
  6. The method according to claim 5, wherein the calculating pixel coordinates of the appearing item in the current video frame based on the resolution of the playback device and the percentage coordinates of the appearing item in the current video frame comprises:
    if the coordinate system of the percentage coordinates is the same as the screen coordinate system of the playback device, multiplying the horizontal and vertical pixel values of the resolution of the playback device by the corresponding horizontal and vertical coordinate values of the percentage coordinates of the appearing item in the current video frame to obtain the pixel coordinates of the appearing item in the current video frame.
  7. The method according to claim 6, wherein the calculating pixel coordinates of the appearing item in the current video frame based on the resolution of the playback device and the percentage coordinates of the appearing item in the current video frame further comprises:
    if the coordinate system of the percentage coordinates is different from the screen coordinate system of the playback device, converting the coordinate system of the percentage coordinates to obtain converted percentage coordinates in the screen coordinate system;
    multiplying the horizontal and vertical pixel values of the resolution of the playback device by the corresponding horizontal and vertical coordinate values of the converted percentage coordinates of the appearing item in the current video frame to obtain the pixel coordinates of the appearing item in the current video frame.
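For illustration of the arithmetic in claims 5-7: percentage coordinates are scaled by the playback device's resolution; if the two coordinate systems differ, the percentages are converted first. The sketch below assumes, purely as an example, that the mismatch is a vertically flipped origin; the actual conversion would depend on the coordinate systems involved.

```python
# Sketch of the arithmetic in claims 5-7: percentage coordinates scaled by the
# playback resolution, with an example coordinate-system conversion when needed.
def to_pixel_coords(percent_xy, resolution, same_coordinate_system=True):
    px, py = percent_xy         # values in [0.0, 1.0]
    width, height = resolution  # e.g. (1920, 1080)
    if not same_coordinate_system:
        # Example conversion only: flip the vertical axis so the origin matches
        # the screen coordinate system; the real conversion is system-dependent.
        py = 1.0 - py
    return round(width * px), round(height * py)

print(to_pixel_coords((0.25, 0.4), (1920, 1080)))         # (480, 432)
print(to_pixel_coords((0.25, 0.4), (1920, 1080), False))  # (480, 648)
```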
  8. The method according to claim 3, wherein the detecting that the user confirms the trigger area of the current video frame comprises:
    if the user touches the trigger area of the current video frame, determining that the user confirms the trigger area.
  9. The method according to claim 3, wherein the detecting that the user confirms the trigger area of the current video frame comprises:
    capturing the focus of the user's eyes;
    in response to determining that the focus is on the trigger area of the current video frame, determining that the user confirms the trigger area.
  10. The method according to claim 9, wherein the capturing the focus of the user's eyes comprises:
    emitting a light beam toward the eyes using a camera of the playback device;
    sensing the intensity of the light beam reflected from the eyes using a photosensitive material on the screen of the playback device;
    determining a dark spot on the screen based on the beam intensity as the focus.
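For illustration of the last step of claim 10 only: given a grid of sensed reflected-beam intensities across the screen, the darkest cell can be taken as the gaze focus. The optical capture itself is hardware-specific and not modelled; the grid and its values below are purely hypothetical.

```python
# Sketch of picking the dark spot (gaze focus) from a grid of intensity readings.
def focus_from_intensity(grid):
    """grid: 2D list of intensity readings; returns (row, col) of the darkest cell."""
    best = None
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if best is None or value < best[0]:
                best = (value, (r, c))
    return best[1]

print(focus_from_intensity([[9, 8, 9], [7, 2, 8], [9, 9, 9]]))  # (1, 1)
```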
  11. A video processing method, comprising:
    performing item recognition on a video stream to determine items appearing in the video stream;
    obtaining identification information of the appearing items;
    adding the identification information of the appearing items to the corresponding video frame protocol to generate video data.
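A minimal sketch of the pipeline in claim 11, under two stated assumptions: `detect_items` is a placeholder standing in for any object-recognition model, and the "video frame protocol" is represented as a plain per-frame dictionary rather than an actual bitstream structure.

```python
# Sketch of claim 11's pipeline with a placeholder item recognizer.
def detect_items(frame_pixels):
    """Placeholder recognizer; a real system would run an object-detection model."""
    return [{"name": "coat"}] if frame_pixels else []

def process_stream(frames):
    video_data = []
    for index, pixels in enumerate(frames):
        items = detect_items(pixels)                              # item recognition
        identification = [{"name": it["name"]} for it in items]  # identification info
        video_data.append({"frame": index, "items": identification})  # added per frame
    return video_data

print(process_stream([None, [0, 1, 2]]))
```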
  12. The method according to claim 11, wherein the obtaining identification information of the appearing items comprises:
    performing position recognition on the video stream to determine coordinate information of the appearing items;
    adding the coordinate information of the appearing items to the identification information of the appearing items.
  13. The method according to claim 12, wherein the performing position recognition on the video stream to determine the coordinate information of the appearing items comprises:
    simulating trial playback of the video stream on a trial-playback device;
    performing position recognition on the video stream to obtain pixel coordinates of the appearing items;
    determining the coordinate information of the appearing items based on the pixel coordinates of the appearing items.
  14. The method according to claim 13, wherein the determining the coordinate information of the appearing items based on the pixel coordinates of the appearing items comprises:
    dividing the horizontal and vertical coordinate values of the pixel coordinates of the appearing items by the corresponding horizontal and vertical pixel values of the resolution of the trial-playback device to obtain percentage coordinates of the appearing items.
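For illustration of claim 14's arithmetic: pixel coordinates found on the trial-playback device are divided by that device's resolution to give resolution-independent percentage coordinates (the inverse of the scaling sketched after claim 7). The resolution values are example assumptions.

```python
# Sketch of claim 14: pixel coordinates -> percentage coordinates.
def to_percentage_coords(pixel_xy, trial_resolution):
    x, y = pixel_xy
    width, height = trial_resolution
    return x / width, y / height

print(to_percentage_coords((480, 432), (1920, 1080)))  # (0.25, 0.4)
```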
  15. The method according to claim 12, wherein, for an item appearing in consecutive video frames of the video stream, the identification information added to the video frame protocol of the frame in which the item first appears includes the item name, coordinate information, brief information and/or a web link, and the identification information added to the video frame protocols of subsequent frames includes the item name and coordinate information.
  16. The method according to any one of claims 11-15, wherein the adding the identification information of the appearing items to the corresponding video frame protocol comprises:
    extending network abstraction layer information of the corresponding video frame protocol based on the identification information of the appearing items.
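For illustration of claims 15-16 only: a full identification record is attached to the frame in which an item first appears, while later frames of the same appearance carry only the name and coordinates. The record layout is an illustrative assumption, not a codec specification, and how such records would be packed into extended network abstraction layer units is not modelled here; continuity tracking is also simplified (any later occurrence is treated as non-first).

```python
# Sketch of claims 15-16: full record on first appearance, name + coordinates afterwards.
def build_frame_records(frames_with_items):
    seen = set()
    annotated = []
    for index, items in enumerate(frames_with_items):
        records = []
        for item in items:
            if item["name"] not in seen:  # first occurrence: full record
                seen.add(item["name"])
                records.append({"name": item["name"], "coords": item["coords"],
                                "brief": item.get("brief", ""), "link": item.get("link", "")})
            else:                          # subsequent occurrence: name + coordinates only
                records.append({"name": item["name"], "coords": item["coords"]})
        annotated.append({"frame": index, "records": records})
    return annotated

frames = [[{"name": "coat", "coords": (0.1, 0.2, 0.4, 0.6), "link": "https://example.com/coat"}],
          [{"name": "coat", "coords": (0.12, 0.2, 0.42, 0.6)}]]
print(build_frame_records(frames))
```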
  17. A computer device, comprising:
    one or more processors;
    a storage apparatus on which one or more programs are stored;
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-10 or the method according to any one of claims 11-16.
  18. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-10 or the method according to any one of claims 11-16.
PCT/CN2021/104450 2020-08-05 2021-07-05 Information pushing method, video processing method, and device WO2022028177A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010777098.7 2020-08-05
CN202010777098.7A CN111859158A (en) 2020-08-05 2020-08-05 Information pushing method, video processing method and equipment

Publications (1)

Publication Number Publication Date
WO2022028177A1 true WO2022028177A1 (en) 2022-02-10

Family

ID=72971071

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/104450 WO2022028177A1 (en) 2020-08-05 2021-07-05 Information pushing method, video processing method, and device

Country Status (2)

Country Link
CN (1) CN111859158A (en)
WO (1) WO2022028177A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859158A (en) * 2020-08-05 2020-10-30 上海连尚网络科技有限公司 Information pushing method, video processing method and equipment
CN115334346A (en) * 2022-08-08 2022-11-11 北京达佳互联信息技术有限公司 Interface display method, video publishing method, video editing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013040904A1 (en) * 2011-09-22 2013-03-28 中兴通讯股份有限公司 Advertisement processing method and terminal
CN105100944A (en) * 2014-04-30 2015-11-25 广州市动景计算机科技有限公司 Article information outputting method and device
CN107704076A (en) * 2017-09-01 2018-02-16 广景视睿科技(深圳)有限公司 A kind of trend projected objects display systems and its method
CN110288400A (en) * 2019-06-25 2019-09-27 联想(北京)有限公司 Information processing method, information processing unit and information processing system
CN111859158A (en) * 2020-08-05 2020-10-30 上海连尚网络科技有限公司 Information pushing method, video processing method and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102591553A (en) * 2011-01-13 2012-07-18 京宏科技股份有限公司 Method, system and device for video interaction and device and method for generating video related volume labels
US20150193446A1 (en) * 2014-01-07 2015-07-09 Microsoft Corporation Point(s) of interest exposure through visual interface
CN109120954B (en) * 2018-09-30 2021-09-07 武汉斗鱼网络科技有限公司 Video message pushing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111859158A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN112261424B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
US20210136455A1 (en) Communication apparatus, communication control method, and computer program
CN107846561B (en) Method and system for determining and displaying contextually targeted content
WO2022028177A1 (en) Information pushing method, video processing method, and device
US9854232B2 (en) Systems and methods for picture quality monitoring
CN111523566A (en) Target video clip positioning method and device
CN111095939B (en) Identifying previously streamed portions of media items to avoid repeated playback
US20090129755A1 (en) Method and Apparatus for Generation, Distribution and Display of Interactive Video Content
US11321946B2 (en) Content entity recognition within digital video data for dynamic content generation
KR20190050864A (en) System and method for recognition of items in media data and delivery of information related thereto
US10999640B2 (en) Automatic embedding of information associated with video content
CN108235004B (en) Video playing performance test method, device and system
US20040250297A1 (en) Method, apparatus and system for providing access to product data
US20230291772A1 (en) Filtering video content items
JP2006285654A (en) Article information retrieval system
WO2022012273A1 (en) Method for item price comparison, and device
CN109241344B (en) Method and apparatus for processing information
CN114374853A (en) Content display method and device, computer equipment and storage medium
US11531700B2 (en) Tagging an image with audio-related metadata
JP2023522092A (en) INTERACTION RECORD GENERATING METHOD, APPARATUS, DEVICE AND MEDIUM
CN113298589A (en) Commodity information processing method and device, and information acquisition method and device
CN114143568B (en) Method and device for determining augmented reality live image
US11700285B2 (en) Filtering video content items
WO2023098576A1 (en) Image processing method and apparatus, device, and medium
CN111859159A (en) Information pushing method, video processing method and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21852709

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.07.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21852709

Country of ref document: EP

Kind code of ref document: A1