CN111292280B - Method and device for outputting information

Method and device for outputting information

Info

Publication number: CN111292280B
Authority: CN (China)
Prior art keywords: region, information, recommended, video, target
Prior art date: 2020-01-20
Legal status: Active (granted)
Application number: CN202010064579.3A
Other languages: Chinese (zh)
Other versions: CN111292280A (en)
Inventors: 李立强, 包英泽, 亢乐
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date / Filing date: 2020-01-20
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010064579.3A
Publication of CN111292280A: 2020-06-16
Publication of CN111292280B (grant): 2023-08-29


Classifications

    • G06T5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction (G06T: image data processing or generation, in general)
    • G06T15/205: Image-based rendering (G06T15/00: 3D image rendering; G06T15/10: geometric effects; G06T15/20: perspective computation)
    • G06T2207/10016: Video; image sequence (G06T2207/10: image acquisition modality)
    • G06T2207/20081: Training; learning (G06T2207/20: special algorithmic details)
    • G06T2207/20084: Artificial neural networks [ANN] (G06T2207/20: special algorithmic details)

Abstract

Embodiments of the present disclosure disclose a method and apparatus for outputting information. One embodiment of the method comprises the following steps: detecting, in a video, at least one region for displaying information to be recommended and calculating a perspective matrix for each region; determining the duration and size of each region; filtering out regions whose duration is less than a predetermined time and/or whose size is less than a predetermined value; for each region, matching and scoring each piece of information to be recommended in a set of information to be recommended and selecting the highest-scoring piece as the region's target information; and, for each region, superimposing the region's target information onto the region according to the region's perspective matrix, post-processing the image superimposition effect, and outputting the result. This embodiment organically fuses the advertisement to be inserted with a short video, thereby increasing advertisement views and reducing user complaints.

Description

Method and device for outputting information
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for outputting information.
Background
Image analysis builds on visual analysis technology to solve common image-based problems such as face recognition and license plate recognition. As internet technology has kept developing, traffic costs have fallen continuously and high-traffic media such as video have become widespread.

Meanwhile, advertising has been changing as well, mainly taking image and video forms, with advertisements rigidly inserted into the content a user is browsing at fixed times. The prior art offers three schemes for inserting advertisements:

1. After the user browses several content videos, a complete advertisement video is inserted, such as a promotional video for a game. The user may choose to watch it or close it directly without watching.

2. Users are forced to watch: paid users see no advertisements, and after the complete advertisement finishes playing, the video content the user was watching resumes. As with traditional television-series advertising, many advertisements must be watched before the content, and the viewer has no way to skip them except through a paid channel.

3. Soft advertising (placement), commonly used in movies. The experience is better than schemes 1 and 2, but the production cost is high.

The three advertising schemes above have, respectively, the following drawbacks:

1. Low advertisement views: the probability that the user watches the whole advertisement video is low.

2. Poor experience: if there are many advertisements, users feel bored and tired and complain about there being too many.

3. Good experience but too high a cost, unsuitable for short videos or television productions with low production budgets.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatus for outputting information.
In a first aspect, embodiments of the present disclosure provide a method for outputting information, comprising: detecting, in a video, at least one region for displaying information to be recommended and calculating a perspective matrix for each region; determining the duration and size of each region; filtering out regions whose duration is less than a predetermined time and/or whose size is less than a predetermined value; for each region, matching and scoring each piece of information to be recommended in a set of information to be recommended and selecting the highest-scoring piece as the region's target information; and, for each region, superimposing the region's target information onto the region according to the region's perspective matrix, post-processing the image superimposition effect, and outputting the result.
In some embodiments, detecting at least one region for displaying information to be recommended in a video includes: extracting video frames from the video at predetermined time intervals; and detecting, from the extracted video frames via a pre-trained target detection model, at least one region for displaying information to be recommended, where the detected target includes at least one of: a traffic sign, an advertising board, a car front, a car side.

In some embodiments, after the perspective matrix of each region is calculated, the method further comprises: performing noise evaluation and processing on the calculated perspective matrices.

In some embodiments, the matching and scoring against each piece of information to be recommended in the set of information to be recommended includes performing the following operations for each piece of information to be recommended in the set: matching the region's duration, its size, and the topic of the video frames it belongs to against the information's duration, picture size, and content, respectively, to obtain matching degrees of different types; and taking the weighted sum of the different types of matching degrees as the information's score.

In some embodiments, superimposing the region's target information onto the region according to the region's perspective matrix and post-processing the image superimposition effect includes: multiplying the region's target information by the perspective matrix to obtain rotated information; and superimposing the rotated information onto the region after removing burrs.

In some embodiments, the method further comprises: rendering the video frames on which the target information has been superimposed.
In a second aspect, embodiments of the present disclosure provide an apparatus for outputting information, comprising: a detection unit configured to detect at least one region for displaying information to be recommended in a video and calculate a perspective matrix for each region; a determining unit configured to determine the duration and size of each region; a filtering unit configured to filter out regions whose duration is less than a predetermined time and/or whose size is less than a predetermined value; a scoring unit configured to, for each region, match and score each piece of information to be recommended in a set of information to be recommended and select the highest-scoring piece as the region's target information; and a superimposing unit configured to, for each region, superimpose the region's target information onto the region according to the region's perspective matrix, post-process the image superimposition effect, and output the result.

In some embodiments, the detection unit is further configured to: extract video frames from the video at predetermined time intervals; and detect, from the extracted video frames via a pre-trained target detection model, at least one region for displaying information to be recommended, where the detected target includes at least one of: a traffic sign, an advertising board, a car front, a car side.

In some embodiments, the apparatus further comprises a noise processing unit configured to perform noise evaluation and processing on the perspective matrices after they are calculated.

In some embodiments, the scoring unit is further configured to perform the following operations for each piece of information to be recommended in the set: match the region's duration, its size, and the topic of the video frames it belongs to against the information's duration, picture size, and content, respectively, to obtain matching degrees of different types; and take the weighted sum of the different types of matching degrees as the information's score.

In some embodiments, the superimposing unit is further configured to: multiply the region's target information by the perspective matrix to obtain rotated information; and superimpose the rotated information onto the region after removing burrs.

In some embodiments, the apparatus further comprises a rendering unit configured to render the video frames on which the target information has been superimposed.
In a third aspect, embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage device having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method according to any implementation of the first aspect.

In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any implementation of the first aspect.
The method and apparatus for outputting information provided by embodiments of the present disclosure detect, through video analysis, the spatial regions and time spans in a video into which advertisements can be inserted, and then organically fuse the advertisement to be inserted with the short video through image processing, achieving a better advertisement-browsing experience.

For example, for a van driving through a short video, the plane of its cargo box is detected by a deep network model and a perspective matrix is calculated; the advertisement is transformed onto the side of the cargo box according to the perspective matrix, and image-superimposition post-processing is applied, so that a short video containing no advertisement automatically gains a soft-advertisement-style placement. The experience is much better than traditional advertising, without a significant increase in cost.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a method for outputting information according to the present disclosure;
FIGS. 3a-3c are schematic diagrams of an application scenario of a method for outputting information according to the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of a method for outputting information according to the present disclosure;
FIG. 5 is a schematic structural diagram of one embodiment of an apparatus for outputting information according to the present disclosure;
FIG. 6 is a schematic diagram of a computer system suitable for implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein merely illustrate the relevant invention and do not limit it. It should also be noted that, for convenience of description, only the portions related to the relevant invention are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the methods of the present disclosure for outputting information or apparatuses for outputting information may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a video player, a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen and supporting video playback, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.

The server 105 may be a server providing various services, for example a background recommendation server that inserts advertisements into the videos displayed on the terminal devices 101, 102, 103. The background recommendation server may analyze and otherwise process the received video-play request and the advertisement data provided by advertisers, and feed the advertisement-inserted video back to the terminal device.
The server may be hardware or software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (e.g., a plurality of software or software modules for providing distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should be noted that, the method for outputting information provided by the embodiments of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for outputting information is generally provided in the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for outputting information according to the present disclosure is shown. The method for outputting information comprises the following steps:
step 201, detecting at least one region in the video for displaying information to be recommended and calculating perspective matrixes of the regions.
In this embodiment, the execution subject of the method for outputting information (e.g., the server shown in FIG. 1) may insert information to be recommended directly into an authorized video according to an advertiser's needs. The information to be recommended may be, for example, advertisements in picture or video form. After receiving a user's video-play request, the server may also set an advertisement mode according to the user's privileges and then select regions for inserting advertisements according to that mode: for example, no advertisement is inserted for a super member, while 5 minutes of advertisements are inserted for an ordinary member.

A region for displaying information to be recommended is an advertisement slot. Besides background locations, which a general background-detection algorithm can identify as advertisement slots, specific targets may also be detected as advertisement slots by a pre-trained target detection model, the targets including but not limited to at least one of: a traffic sign, an advertising board, a car front, a car side. The target detection model may be a neural network model commonly used in the prior art, such as Faster R-CNN, SSD, or YOLO, and is not described in detail here.

To improve detection efficiency, video frames may be extracted from the video at predetermined time intervals, and at least one region for displaying information to be recommended is then detected from the extracted frames by the pre-trained target detection model. For example, frames are extracted at 3 s intervals and trucks are detected in them; once a truck is found, a finer-grained pass detects the start time of its first appearance from the frames preceding its continuous appearance and the end time of its last appearance from the frames following it. A minimal sketch of this interval-based sampling is given below.
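The sketch below illustrates the interval-based sampling just described, assuming OpenCV is available; the 3-second interval and the 25 fps fallback are illustrative, and the coarse-then-fine boundary refinement is left as a comment.

```python
import cv2

def sample_frames(video_path, interval_s=3.0):
    """Yield (timestamp_s, frame) pairs spaced roughly interval_s apart."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unreadable
    step = max(1, int(round(fps * interval_s)))
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield idx / fps, frame
        idx += 1
    cap.release()

# Coarse pass: run the detection model on the sampled frames to find where
# the target (e.g., a truck) appears; then re-scan the neighborhood of the
# first and last hits at full frame rate to pin down start and end times.
```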
Since the detected object is originally a rectangular plane, it appears as a trapezoid because of the shooting angle: trapezoid = rectangle × perspective matrix. By detecting each corner of the trapezoid, the four corners of the pre-transform rectangle can be predicted and the perspective matrix calculated. The perspective matrix is in fact a 3×3 matrix; an image transformed by it (i.e., the image pixel coordinates multiplied by the perspective matrix) can exhibit various perspective and affine effects. A perspective transform can enlarge, shrink, or rotate an image, or turn it into a parallelogram or trapezoid. The target detection model detects the quadrilateral, its 4 corner points are obtained, and the perspective matrix with the smallest error is estimated. A valid trapezoid is not an arbitrary quadrilateral but one that obeys the rules of perspective transformation; a quadrilateral that does not is considered noisy, and the error between the theoretical and actual values is calculated under the perspective-transformation rules. Repeating these steps to detect the same target in different frames lets the multi-frame perspective matrices cancel part of the error. A sketch of the matrix estimation and the error measure follows.
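A minimal sketch of the estimation, assuming OpenCV and NumPy; the corner ordering (top-left, top-right, bottom-right, bottom-left) and the use of a reference matrix taken from neighboring frames are assumptions made for illustration.

```python
import numpy as np
import cv2

def estimate_perspective(corners_px, rect_w, rect_h):
    """3x3 matrix mapping the canonical pre-transform rectangle onto the
    detected quadrilateral; corners ordered TL, TR, BR, BL."""
    src = np.float32([[0, 0], [rect_w, 0], [rect_w, rect_h], [0, rect_h]])
    return cv2.getPerspectiveTransform(src, np.float32(corners_px))

def quad_error(H_ref, corners_px, rect_w, rect_h):
    """Mean distance between the corners detected in one frame and the
    corners predicted by a reference matrix H_ref (e.g., estimated from
    neighboring frames of the same target): the per-frame noise."""
    src = np.float32([[0, 0], [rect_w, 0], [rect_w, rect_h], [0, rect_h]])
    pred = cv2.perspectiveTransform(src.reshape(-1, 1, 2), H_ref).reshape(-1, 2)
    return float(np.linalg.norm(pred - np.float32(corners_px), axis=1).mean())
```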
Over a period of time, say 7 seconds, each frame can be run through the target detection model and the error between the detected region and the theoretical rectangle under perspective transformation can be calculated; this error is the noise.

Noise can be corrected through inter-frame detection: for example, if a point takes the values 1, 3, 1 across three frames, the value of the second frame can be corrected to 1. A sketch of this temporal smoothing follows.
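A sketch of that inter-frame correction as a temporal median filter over each corner trajectory, assuming NumPy; the window size is illustrative.

```python
import numpy as np

def smooth_corner_tracks(tracks, window=3):
    """tracks: (num_frames, 4, 2) array of corner points per frame.
    Replaces each frame's corners with the median over a small temporal
    window, so the 1, 3, 1 run from the text becomes 1 at the middle
    frame. Edge frames use a truncated window."""
    tracks = np.asarray(tracks, dtype=np.float32)
    out = tracks.copy()
    half = window // 2
    for t in range(len(tracks)):
        lo, hi = max(0, t - half), min(len(tracks), t + half + 1)
        out[t] = np.median(tracks[lo:hi], axis=0)
    return out
```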
Step 202, determining the duration of each region and the size of each region.
In this embodiment, the start time and end time between which the same object (e.g., a car) is continuously detected in step 201 are taken as the duration of the advertisement slot. The size of each region can be determined during target detection; since the advertisement slot of the same object may differ in size across consecutive frames, the minimum area and minimum side length are used as the slot's area and side length. Besides area, side length is also measured, so "size" here refers not only to the area but also to the side length. A sketch of this aggregation follows.
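A sketch of aggregating the per-frame detections of one tracked object into a slot record, assuming NumPy; the (timestamp, corners) input format is an assumption.

```python
import numpy as np

def summarize_slot(detections):
    """detections: list of (timestamp_s, corners_4x2) for one tracked object.
    Returns the slot's duration bounds and its conservative (minimum) size."""
    times = [t for t, _ in detections]
    areas, sides = [], []
    for _, quad in detections:
        quad = np.float32(quad)
        x, y = quad[:, 0], quad[:, 1]
        # quadrilateral area via the shoelace formula
        areas.append(0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1))))
        # shortest of the four side lengths
        sides.append(float(np.linalg.norm(quad - np.roll(quad, 1, axis=0), axis=1).min()))
    return {
        "start_s": min(times),
        "end_s": max(times),
        "min_area_px": float(min(areas)),
        "min_side_px": min(sides),
    }
```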
In step 203, regions having a duration less than a predetermined time and/or a size less than a predetermined value are filtered out.
In this embodiment, an advertisement slot whose duration is too short is meaningless, because human visual perception cannot register an advertisement that is too brief. Regions whose duration is less than the predetermined time are therefore filtered out, i.e., not used as advertisement slots. Regions whose area is too small will likewise go unnoticed and are filtered out; besides area, regions whose side length is too short can also be filtered out. A minimal sketch of this filter, reusing the slot record from the previous sketch, follows.
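A sketch of the duration/size filter; the thresholds are illustrative, not values fixed by the patent.

```python
def keep_slot(slot, min_duration_s=2.0, min_area_px=64 * 64, min_side_px=48):
    """slot: dict as produced by summarize_slot(); True if the slot survives."""
    duration_ok = (slot["end_s"] - slot["start_s"]) >= min_duration_s
    size_ok = (slot["min_area_px"] >= min_area_px
               and slot["min_side_px"] >= min_side_px)
    return duration_ok and size_ok

# slots = [s for s in slots if keep_slot(s)]
```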
And 204, for each region, performing matching scoring on the region and each piece of information to be recommended in the information set to be recommended, and selecting the information to be recommended with the highest score as target information of the region.
In this embodiment, different advertisements may be inserted into one video segment. Each advertisement slot can be matched and scored against every piece of information to be recommended in the set. One possible score is how well the information's content matches the topic of the video frames the slot belongs to: for example, if the video's topic is child education, the matching degree with a beer advertisement is set to the minimum, 0. The video's topic may be determined from information such as its synopsis or tags.

Scoring can also be done from several angles and then combined: the slot's duration, its size, and the topic of the video frames it belongs to are matched against the information's duration, picture size, and content respectively to obtain matching degrees of different types, and the weighted sum of these matching degrees is taken as the information's score. The picture-size matching degree is inversely related to the scale by which the picture must be resized when inserted into the slot, e.g., 1 - scale, so that with no scaling the matching degree is 100%. If the information's duration is no greater than the slot's, the ratio of the two can serve as the duration matching degree. For an advertisement video, the slot's duration must be at least the video's duration, otherwise the video cannot be inserted into that slot. For a picture advertisement, if the picture's best display size is much smaller than the slot, it should not be placed there even stretched; conversely, if the slot is much smaller than the picture, there is no point shrinking the picture until its details are invisible just to insert it. A sketch of the weighted score follows.
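A sketch of the weighted score and the highest-score selection. The field names, the size-mismatch formula, the weights, and topic_similarity() are illustrative assumptions; the patent fixes only the overall shape (per-type matching degrees combined by a weighted sum).

```python
from dataclasses import dataclass

@dataclass
class Slot:
    duration_s: float
    width_px: int
    topic: str

@dataclass
class Ad:
    duration_s: float
    width_px: int
    content: str

def topic_similarity(topic, content):
    # stand-in; a real system would use a semantic matcher
    return 1.0 if topic in content else 0.0

def match_score(slot, ad, w_dur=0.3, w_size=0.3, w_topic=0.4):
    if ad.duration_s > slot.duration_s:
        return 0.0  # an ad video longer than the slot cannot be inserted
    dur_match = ad.duration_s / slot.duration_s
    # penalize the amount of rescaling needed; 100% when none is needed
    scale = abs(slot.width_px - ad.width_px) / max(slot.width_px, ad.width_px)
    size_match = 1.0 - scale
    return (w_dur * dur_match + w_size * size_match
            + w_topic * topic_similarity(slot.topic, ad.content))

def pick_target(slot, ads):
    """Select the highest-scoring piece of information for the slot."""
    return max(ads, key=lambda ad: match_score(slot, ad), default=None)
```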
And 205, for each region, overlapping the target information of the region according to the perspective matrix of the region, and outputting after performing post-processing of the image overlapping effect.
In this embodiment, the advertiser generally provides cut-out (background-removed) information to be recommended, which, after transformation by the perspective matrix, can be superimposed directly onto the advertisement slot selected in step 204. Optionally, scaling can also be applied so that the target information matches the slot's size. If the advertiser does not provide cut-out material, the server can itself perform image processing that keeps the effective lettering and cuts out the background to generate it. The target information may also be an advertisement video, which can be decomposed into advertisement pictures and stacked frame by frame onto the advertisement slots of the video. A sketch of the warp-and-composite step follows.
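A sketch of the warp-and-composite, assuming OpenCV and NumPy, a cut-out ad loaded with an alpha channel (e.g., via cv2.IMREAD_UNCHANGED, giving BGRA), and the matrix H from the earlier estimation sketch.

```python
import numpy as np
import cv2

def overlay_ad(frame_bgr, ad_bgra, H):
    """Warp the cut-out ad into frame coordinates with the slot's
    perspective matrix, then alpha-blend it onto the frame."""
    h, w = frame_bgr.shape[:2]
    warped = cv2.warpPerspective(ad_bgra, H, (w, h))  # outside pixels stay 0
    alpha = warped[:, :, 3:4].astype(np.float32) / 255.0
    blended = (frame_bgr.astype(np.float32) * (1.0 - alpha)
               + warped[:, :, :3].astype(np.float32) * alpha)
    return blended.astype(np.uint8)
```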
The video frames with advertisements inserted can be assembled into a video and stored in a video library, to be sent to the user terminal when the user requests playback; alternatively, advertisements can be inserted on the fly as the video is played. Manual review can also be carried out, with reviewed videos stored in the library and the advertisement-bearing video sent to the terminal upon a play request. The predetermined time and predetermined value can be adjusted to generate different versions of the advertising video, and different versions sent according to the user's identity: for example, a super member is sent a video containing only 1 minute of advertisements, while a non-member is sent a video containing 10 minutes of advertisements.
With continued reference to FIGS. 3a-3c, these figures are schematic diagrams of an application scenario of the method for outputting information according to this embodiment. FIG. 3a is the cut-out advertising picture to be inserted. Video frames containing advertisement slots are detected from the video by the target detection model (here detecting a car), and after filtering on duration, size, and similar conditions, the frame containing the advertisement slot shown in FIG. 3b is obtained. A perspective matrix is then calculated from the four corners of the detected vehicle. The advertising picture of FIG. 3a is transformed into a trapezoid by the perspective matrix and superimposed onto the slot, forming the video frame shown in FIG. 3c. Inserting a suitable advertising picture into every advertisement slot yields the advertisement-inserted video.

The method provided by this embodiment of the disclosure offers advertisers and users an advertising scheme with a better experience, using the deep-model detection capability of artificial intelligence to bring soft-advertising placement into short videos and images.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for outputting information is shown. The flow 400 of the method for outputting information comprises the steps of:
step 401, detecting at least one region in the video for displaying information to be recommended and calculating perspective matrixes of the regions.
Step 402, determining a duration of each region and a size of each region.
In step 403, regions having a duration of less than a predetermined time and/or a size of less than a predetermined value are filtered out.
And step 404, for each region, matching and scoring with each piece of information to be recommended in the information set to be recommended, and selecting the information to be recommended with the highest score as the target information of the region.
Steps 401-404 are substantially identical to steps 201-204 and are therefore not described in detail.
Step 405, for each region, multiplying the region's target information by the perspective matrix to obtain rotated information, removing burrs from the rotated information, and superimposing it onto the region.

In this embodiment, many patterns have smooth edges when first created but show jaggies (burrs) once slightly deformed or shrunk, as happens when pictures are processed or patterns are made. The burrs can be removed by prior-art means such as morphological processing (on the binary image) or filtering; a sketch follows.
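A sketch of the burr removal on the warped ad's mask, assuming OpenCV: a morphological open/close pass cleans speckles and pinholes, and a small Gaussian blur feathers the edge before compositing. Kernel and feather sizes are illustrative.

```python
import cv2

def clean_mask(mask_u8, kernel_size=3, feather=2):
    """mask_u8: single-channel uint8 mask of the warped ad."""
    k = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    m = cv2.morphologyEx(mask_u8, cv2.MORPH_OPEN, k)    # drop speckles
    m = cv2.morphologyEx(m, cv2.MORPH_CLOSE, k)         # fill pinholes
    m = cv2.GaussianBlur(m, (2 * feather + 1, 2 * feather + 1), 0)  # soften edge
    return m
```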
Step 406, rendering the video frames with the target information superimposed and outputting them.

In this embodiment, image rendering is the process of converting the three-dimensional light-energy transfer process into a two-dimensional image. Scenes and entities are represented in three-dimensional form, closer to the real world and easier to manipulate and transform. The rendered video frames can be assembled into a video and then manually reviewed; reviewed videos are stored in the video library, and the advertisement-bearing video is sent to the user terminal when the user requests playback. The predetermined time and predetermined value can be adjusted to generate different versions of the advertising video, and different versions sent according to the user's identity: for example, a super member is sent a video containing only 1 minute of advertisements, while a non-member is sent a video containing 10 minutes of advertisements.
With further reference to fig. 5, as an implementation of the method shown in the foregoing figures, the present disclosure provides an embodiment of an apparatus for outputting information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in FIG. 5, the apparatus 500 for outputting information of this embodiment includes: a detection unit 501, a determination unit 502, a filtering unit 503, a scoring unit 504, and a superimposing unit 505. The detection unit 501 is configured to detect at least one region for displaying information to be recommended in a video and calculate a perspective matrix for each region; the determining unit 502 is configured to determine the duration and size of each region; the filtering unit 503 is configured to filter out regions whose duration is less than a predetermined time and/or whose size is less than a predetermined value; the scoring unit 504 is configured to, for each region, match and score each piece of information to be recommended in the set of information to be recommended and select the highest-scoring piece as the region's target information; and the superimposing unit 505 is configured to, for each region, superimpose the region's target information onto the region according to the region's perspective matrix, post-process the image superimposition effect, and output the result.

In this embodiment, for the specific processing of the detection unit 501, determination unit 502, filtering unit 503, scoring unit 504, and superimposing unit 505 of the apparatus 500, reference may be made to steps 201, 202, 203, 204, and 205 in the embodiment corresponding to FIG. 2.
In some optional implementations of the present embodiment, the detection unit 501 is further configured to: extracting video frames in the video at predetermined time intervals; detecting at least one area for displaying information to be recommended from the extracted video frames through a pre-trained target detection model, wherein the detected target comprises at least one of the following: traffic sign, advertising board, car front, car side.
In some optional implementations of the present embodiment, after calculating the perspective matrix of each region, the apparatus 500 further includes a noise processing unit (not shown in the drawings) configured to: and carrying out noise evaluation and processing on the calculated perspective matrix.
In some optional implementations of this embodiment, the scoring unit 504 is further configured to perform the following operations for each piece of information to be recommended in the set of information to be recommended: match the region's duration, its size, and the topic of the video frames it belongs to against the information's duration, picture size, and content, respectively, to obtain matching degrees of different types; and take the weighted sum of the different types of matching degrees as the information's score.
In some optional implementations of the present embodiment, the superposition unit 505 is further configured to: multiplying the target information of the region by a perspective matrix to obtain rotated information; the information after rotation is superimposed on the area after the burr is removed.
In some optional implementations of the present embodiment, the apparatus 500 further includes a rendering unit (not shown in the drawings) configured to: and rendering the video frame with the target information superimposed.
Referring now to FIG. 6, a schematic diagram of an electronic device (e.g., the server in FIG. 1) 600 suitable for implementing embodiments of the present disclosure is shown. The server illustrated in FIG. 6 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present disclosure.

As shown in FIG. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random-access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing means 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 6 may represent one device or a plurality of devices as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or installed from the storage means 608, or installed from the ROM 602. When the computer program is executed by the processing means 601, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.

It should be noted that the computer-readable medium of the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.

In the embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code carried therein. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code carried on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: detecting at least one region for displaying information to be recommended in the video and calculating perspective matrixes of the regions; determining the duration time of each region and the size of each region; filtering out regions having a duration of less than a predetermined time and/or a size of less than a predetermined value; for each region, matching and scoring are carried out on each piece of information to be recommended in the information set to be recommended, and the information to be recommended with the highest score is selected as target information of the region; and for each region, overlapping the region with the target information of the region according to the perspective matrix of the region, and outputting after performing post-processing on the image overlapping effect.
Computer program code for carrying out the operations of embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example, described as: a processor includes a detection unit, a determination unit, a filtering unit, a scoring unit, and a superposition unit. The names of these units do not constitute a limitation on the unit itself in some cases, and for example, the detection unit may also be described as "a unit for detecting at least one region in a video for presenting information to be recommended and calculating a perspective matrix of each region".
The foregoing description covers only the preferred embodiments of the present disclosure and the technical principles employed. Those skilled in the art will appreciate that the scope of the invention involved in this disclosure is not limited to technical solutions formed by the specific combination of the features above, but also covers other technical solutions formed by any combination of those features or their equivalents without departing from the inventive concept, for example, technical solutions formed by substituting the features above with technical features of similar functions disclosed in (but not limited to) the present disclosure.

Claims (14)

1. A method for outputting information, comprising:
after receiving a video playing request of a user, setting an advertisement mode according to the authority of the user, and then selecting whether to insert advertisements or not according to the advertisement mode;
if advertisements are to be inserted, extracting video frames from the video at predetermined time intervals, detecting at least one region for displaying information to be recommended in the video, and calculating a perspective matrix for each region, wherein, if a detected quadrilateral does not satisfy the perspective-matrix transformation rules, the same target is detected in different frames to calculate the error between the detected region and the theoretical rectangle under perspective transformation, and error correction is performed through inter-frame region detection;

detecting a region for displaying the information to be recommended, then detecting, at a finer granularity, a start time of the region from the video frames preceding its continuous appearance and an end time of its last appearance from the video frames following its continuous appearance, and determining a duration of each region and a size of each region;
filtering out regions having a duration of less than a predetermined time and/or a size of less than a predetermined value;
for each region, matching and scoring are carried out on each piece of information to be recommended in the information set to be recommended, and the information to be recommended with the highest score is selected as target information of the region;
and for each region, overlapping the region with the target information of the region according to the perspective matrix of the region, and outputting after performing post-processing on the image overlapping effect.
2. The method of claim 1, the detecting at least one region in a video for showing information to be recommended, comprising:
detecting at least one area for displaying information to be recommended from the extracted video frames through a pre-trained target detection model, wherein the detected target comprises at least one of the following: traffic sign, advertising board, car front, car side.
3. The method of claim 1, after said computing the perspective matrix for each region, the method further comprising:
and carrying out noise evaluation and processing on the calculated perspective matrix.
4. The method of claim 1, wherein the scoring for matching with each information to be recommended in the set of information to be recommended comprises:
performing the following operations for each piece of information to be recommended in the set of information to be recommended:
respectively matching the duration time period, the region size and the theme of the video frame to which the region belongs with the duration time, the picture size and the information content of the information to be recommended to obtain different types of matching degrees;
and taking the weighted sum of the matching degrees of different types as the score of the information to be recommended.
5. The method according to claim 1, wherein the superimposing the target information of the region onto the region according to the perspective matrix of the region and performing the post-processing of the image superimposition effect comprises:
multiplying the target information of the region by a perspective matrix to obtain rotated information;
and removing burrs of the rotated information and then superposing the information on the area.
6. The method according to one of claims 1-5, the method further comprising:
and rendering the video frame with the target information superimposed.
7. An apparatus for outputting information, comprising:
a detection unit configured to set an advertisement mode according to a user's authority after receiving the user's video-play request, and then select whether to insert advertisements according to the advertisement mode; and, if advertisements are to be inserted, extract video frames from the video at predetermined time intervals, detect at least one region for displaying information to be recommended in the video, and calculate a perspective matrix for each region, wherein, if a detected quadrilateral does not satisfy the perspective-matrix transformation rules, the same target is detected in different frames to calculate the error between the detected region and the theoretical rectangle under perspective transformation, and error correction is performed through inter-frame region detection;

a determining unit configured to detect a region for displaying the information to be recommended, then detect, at a finer granularity, a start time of the region from the video frames preceding its continuous appearance and an end time of its last appearance from the video frames following its continuous appearance, and determine a duration of each region and a size of each region;
a filtering unit configured to filter out areas having a duration of less than a predetermined time and/or a size of less than a predetermined value;
the scoring unit is configured to perform matching scoring on each region and each piece of information to be recommended in the information set to be recommended, and select the information to be recommended with the highest score as target information of the region;
and the superposition unit is configured to superpose the region with the target information of the region according to the perspective matrix of the region for each region and output the region after post-processing of the image superposition effect.
8. The apparatus of claim 7, the detection unit further configured to:
detecting at least one area for displaying information to be recommended from the extracted video frames through a pre-trained target detection model, wherein the detected target comprises at least one of the following: traffic sign, advertising board, car front, car side.
9. The apparatus of claim 7, after the computing of the perspective matrix for each region, the apparatus further comprising a noise processing unit configured to:
and carrying out noise evaluation and processing on the calculated perspective matrix.
10. The apparatus of claim 7, the scoring unit further configured to:
performing the following operations for each piece of information to be recommended in the set of information to be recommended:
respectively matching the duration time period, the region size and the theme of the video frame to which the region belongs with the duration time, the picture size and the information content of the information to be recommended to obtain different types of matching degrees;
and taking the weighted sum of the matching degrees of different types as the score of the information to be recommended.
11. The apparatus of claim 7, the superposition unit further configured to:
multiplying the target information of the region by a perspective matrix to obtain rotated information;
and removing burrs of the rotated information and then superposing the information on the area.
12. The apparatus according to one of claims 7-11, the apparatus further comprising a rendering unit configured to:
and rendering the video frame with the target information superimposed.
13. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
14. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-6.
CN202010064579.3A 2020-01-20 2020-01-20 Method and device for outputting information Active CN111292280B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010064579.3A CN111292280B (en) 2020-01-20 2020-01-20 Method and device for outputting information


Publications (2)

Publication Number, Publication Date
CN111292280A (en), 2020-06-16
CN111292280B (en), 2023-08-29

Family

ID=71021302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010064579.3A Active CN111292280B (en) 2020-01-20 2020-01-20 Method and device for outputting information

Country Status (1)

Country Link
CN (1) CN111292280B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111988657A (en) * 2020-08-05 2020-11-24 网宿科技股份有限公司 Advertisement insertion method and device
CN112004116B (en) 2020-08-26 2021-08-24 北京字节跳动网络技术有限公司 Method, device, electronic equipment and medium for determining object adding mode

Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1199530A * 1995-09-08 1998-11-18 奥瑞德高技术系统有限公司 Method and apparatus for automatic electronic replacement of billboards in a video image
CN101461235A * 2006-03-07 2009-06-17 美国索尼电脑娱乐公司 Dynamic replacement and insertion of cinematic stage props in program content
WO2009111699A2 * 2008-03-06 2009-09-11 Armin Moehrle Automated process for segmenting and classifying video objects and auctioning rights to interactive video objects
CN101641873A * 2007-03-22 2010-02-03 美国索尼电脑娱乐公司 Scheme for determining the placement and timing of advertisements and other insertions in media
KR20100116412A * 2009-04-22 2010-11-01 삼성전자주식회사 Apparatus and method for providing advertisement information based on video scene
CN103299610A * 2011-01-12 2013-09-11 华为技术有限公司 Method and apparatus for video insertion
CN104066003A * 2014-06-16 2014-09-24 百度在线网络技术(北京)有限公司 Method and device for playing advertisements in a video
CN104735465A * 2015-03-31 2015-06-24 北京奇艺世纪科技有限公司 Method and device for implanting planar pattern advertisements into video frames
CN104735467A * 2015-03-31 2015-06-24 北京奇艺世纪科技有限公司 Video picture-in-picture advertisement generating method and device
CN105141987A * 2015-08-14 2015-12-09 京东方科技集团股份有限公司 Advertisement implanting method and advertisement implanting system
JP2016061885A * 2014-09-17 2016-04-25 ヤフー株式会社 Advertisement display device, advertisement display method and advertisement display program
CN106792100A * 2016-12-30 2017-05-31 北京奇艺世纪科技有限公司 Video bullet-screen display method and device
CN106982380A * 2017-04-20 2017-07-25 上海极链网络科技有限公司 Implantation method for virtual interactive advertisements in internet video
CN108040267A * 2017-12-07 2018-05-15 北京奇虎科技有限公司 Method and apparatus for merging recommendations into a video
CN108228835A * 2018-01-04 2018-06-29 百度在线网络技术(北京)有限公司 Method and apparatus for processing video
CN108632640A * 2017-03-24 2018-10-09 米利雅得广告公开股份有限公司 Predicting future insertion-region metadata
CN109982141A * 2019-03-22 2019-07-05 李宗明 Method for video image region analysis and product placement using AI technology
CN109996107A * 2017-12-29 2019-07-09 百度在线网络技术(北京)有限公司 Video generation method, device and system
CN110163640A * 2018-02-12 2019-08-23 华为技术有限公司 Method and computer device for product placement in a video
CN110225389A * 2019-06-20 2019-09-10 北京小度互娱科技有限公司 Method, device and medium for inserting advertisements into a video
CN110225366A * 2019-06-26 2019-09-10 腾讯科技(深圳)有限公司 Video data processing and advertisement-slot determination method, apparatus, medium and electronic device
CN110381369A * 2019-07-19 2019-10-25 腾讯科技(深圳)有限公司 Method, apparatus, device and storage medium for determining the implantation position of recommendation information
WO2019205945A1 * 2018-04-27 2019-10-31 华为技术有限公司 Method and computer apparatus for determining the insertion position of an advertisement, and storage medium
CN110708593A * 2019-09-06 2020-01-17 深圳平安通信科技有限公司 Method, device and storage medium for embedding advertisements in video content

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG119229A1 (en) * 2004-07-30 2006-02-28 Agency Science Tech & Res Method and apparatus for insertion of additional content into video
US10575033B2 (en) * 2017-09-05 2020-02-25 Adobe Inc. Injecting targeted ads into videos

Patent Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1199530A (en) * 1995-09-08 1998-11-18 Orad Hi-Tec Systems Ltd Method and apparatus for automatic electronic replacement of billboards in a video image
CN101461235A (en) * 2006-03-07 2009-06-17 Sony Computer Entertainment America Dynamic replacement and insertion of cinematic stage props in program content
CN101641873A (en) * 2007-03-22 2010-02-03 Sony Computer Entertainment America Scheme for determining the position and timing of advertisements and other inserts in media
WO2009111699A2 (en) * 2008-03-06 2009-09-11 Armin Moehrle Automated process for segmenting and classifying video objects and auctioning rights to interactive video objects
KR20100116412A (en) * 2009-04-22 2010-11-01 Samsung Electronics Co., Ltd. Apparatus and method for providing advertisement information based on video scene
CN103299610A (en) * 2011-01-12 2013-09-11 Huawei Technologies Co., Ltd. Method and apparatus for video insertion
CN104066003A (en) * 2014-06-16 2014-09-24 Baidu Online Network Technology (Beijing) Co., Ltd. Method and device for playing advertisements in video
JP2016061885A (en) * 2014-09-17 2016-04-25 Yahoo Japan Corporation Advertisement display device, advertisement display method and advertisement display program
CN104735465A (en) * 2015-03-31 2015-06-24 Beijing QIYI Century Science & Technology Co., Ltd. Method and device for implanting planar pattern advertisements into video frames
CN104735467A (en) * 2015-03-31 2015-06-24 Beijing QIYI Century Science & Technology Co., Ltd. Video picture-in-picture advertisement generation method and device
CN105141987A (en) * 2015-08-14 2015-12-09 BOE Technology Group Co., Ltd. Advertisement implantation method and system
CN106792100A (en) * 2016-12-30 2017-05-31 Beijing QIYI Century Science & Technology Co., Ltd. Video barrage (bullet-screen comment) display method and device
CN108632640A (en) * 2017-03-24 2018-10-09 Mirriad Advertising PLC Predicting future insertion zone metadata
CN106982380A (en) * 2017-04-20 2017-07-25 Shanghai Jilian Network Technology Co., Ltd. Method for implanting virtual interactive advertisements in internet video
CN108040267A (en) * 2017-12-07 2018-05-15 Beijing Qihoo Technology Co., Ltd. Method and apparatus for merging recommendations into video
CN109996107A (en) * 2017-12-29 2019-07-09 Baidu Online Network Technology (Beijing) Co., Ltd. Video generation method, device and system
CN108228835A (en) * 2018-01-04 2018-06-29 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for processing video
CN110163640A (en) * 2018-02-12 2019-08-23 Huawei Technologies Co., Ltd. Method and computer device for product placement in video
WO2019205945A1 (en) * 2018-04-27 2019-10-31 Huawei Technologies Co., Ltd. Method and computer apparatus for determining insertion position of advertisement, and storage medium
CN109982141A (en) * 2019-03-22 2019-07-05 Li Zongming Method for video image region analysis and product placement using AI technology
CN110225389A (en) * 2019-06-20 2019-09-10 Beijing Xiaodu Huyu Technology Co., Ltd. Method, device and medium for inserting advertisements into video
CN110225366A (en) * 2019-06-26 2019-09-10 Tencent Technology (Shenzhen) Co., Ltd. Method, apparatus, medium and electronic device for video data processing and advertisement position determination
CN110381369A (en) * 2019-07-19 2019-10-25 Tencent Technology (Shenzhen) Co., Ltd. Method, apparatus, device and storage medium for determining recommendation information implantation position
CN110708593A (en) * 2019-09-06 2020-01-17 Shenzhen Ping An Communication Technology Co., Ltd. Method, device and storage medium for embedding advertisements in video content

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on TV advertisement unit segmentation technology based on video features; Deng Haisheng; Video Engineering; Vol. 42, No. 12; pp. 75-78 *

Also Published As

Publication number Publication date
CN111292280A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
US11102525B2 (en) Injecting customized content into 360-degree videos
US20180192044A1 (en) Method and System for Providing A Viewport Division Scheme for Virtual Reality (VR) Video Streaming
CN108322788B (en) Advertisement display method and device in live video
WO2017193576A1 (en) Video resolution adaptation method and apparatus, and virtual reality terminal
EP3238213B1 (en) Method and apparatus for generating an extrapolated image based on object detection
DE102014008038A1 (en) 2014-12-18 Arranging unobtrusive overlays in video content
CN112101305B (en) Multi-path image processing method and device and electronic equipment
CN111292280B (en) Method and device for outputting information
CN110663044A (en) Method and apparatus for providing product placement
US11836887B2 (en) Video generation method and apparatus, and readable medium and electronic device
US11785195B2 (en) Method and apparatus for processing three-dimensional video, readable storage medium and electronic device
CN110781823B (en) Screen recording detection method and device, readable medium and electronic equipment
US20210352343A1 (en) Information insertion method, apparatus, and device, and computer storage medium
CN111246196B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN110519645B (en) Video content playing method and device, electronic equipment and computer readable medium
CN111818265B (en) Interaction method and device based on augmented reality model, electronic equipment and medium
CN107317960A (en) Video image acquisition methods and acquisition device
CN111667313A (en) Advertisement display method and device, client device and storage medium
US9460544B2 (en) Device, method and computer program for generating a synthesized image from input images representing differing views
WO2022042398A1 (en) Method and apparatus for determining object addition mode, electronic device, and medium
CN113905177B (en) Video generation method, device, equipment and storage medium
CN114071028B (en) Video generation and playing method and device, electronic equipment and storage medium
CN100498840C (en) Method of and scaling unit for scaling a three-dimensional model
CN117319736A (en) Video processing method, device, electronic equipment and storage medium
CN111696041B (en) Image processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant