CN114913443A - Target object detection method and system, image processing method and product detection method - Google Patents

Target object detection method and system, image processing method and product detection method

Info

Publication number
CN114913443A
CN114913443A (application CN202110176706.3A)
Authority
CN
China
Prior art keywords
image
image information
target object
feature
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110176706.3A
Other languages
Chinese (zh)
Inventor
王强
郑赟
潘攀
徐盈辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202110176706.3A
Publication of CN114913443A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Geophysics And Detection Of Objects (AREA)

Abstract

The application discloses a target object detection method and system, an image processing method, and a product detection method. The method comprises: acquiring a current image and a reference image; fusing image information of the current image and image information of the reference image to obtain fused image information; and performing detection based on the fused image information to obtain a detection result of a target object in the current image. The method and system address the technical problem of low accuracy in detecting a target object in the related art.

Description

Target object detection method and system, image processing method and product detection method
Technical Field
The present application relates to the field of image processing, and in particular, to a method and a system for detecting a target object, an image processing method, and a product detection method.
Background
On C2C (consumer-to-consumer) live-streaming e-commerce platforms, goods in the video usually need to be located and identified for subsequent recommendation algorithms. When the seller partially occludes a commodity while displaying it, a common target tracking algorithm suffers a reduced detection rate, which has a strongly negative effect on selling efficiency and on the seller's advertisement placement; however, the related-art algorithms that raise the detection rate occupy a large amount of video memory and also require heavy computation.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the present application provide a target object detection method and system, an image processing method, and a product detection method, so as to at least solve the technical problem of low accuracy in detecting a target object in the related art.
According to an aspect of an embodiment of the present application, there is provided a target object detection method, including: acquiring a current image and a reference image; fusing image information of a current image and image information of a reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the target object in the current image.
According to another aspect of the embodiments of the present application, there is also provided a method for detecting a target object, including: receiving a current image and a reference image; fusing image information of a current image and image information of a reference image to obtain fused image information; detecting based on the fused image information to obtain a detection result of a target object in the current image; and outputting a detection result.
According to another aspect of the embodiments of the present application, there is also provided a method for detecting a target object, including: acquiring video data of a target object; intercepting video data to obtain a current image and a reference image; fusing image information of a current image and image information of a reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the target object in the current image.
According to another aspect of the embodiments of the present application, there is also provided a method for detecting a target object, including: in the video live broadcast process, displaying video data in a playing interface, wherein the video data comprises a target object; fusing image information of a current image and image information of a reference image to obtain fused image information; and marking a target object in the video playing interface, wherein the target object is obtained by detecting based on the fused image information.
According to another aspect of the embodiments of the present application, there is also provided an image processing method, including: acquiring a current image and a reference image; and fusing the image information of the current image and the image information of the reference image to obtain fused image information, wherein the fused image information is used for determining a target object in the current image.
According to another aspect of the embodiments of the present application, there is also provided an image processing method, including: receiving a current image and a reference image, wherein the current image and the reference image both comprise a target object; fusing image information of a current image and image information of a reference image to obtain fused image information, wherein the fused image information is used for determining a target object in the current image; and outputting the fused image information.
According to another aspect of the embodiments of the present application, there is also provided a product detection method, including: acquiring a current image and a reference image of a product; fusing image information of a current image and image information of a reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the product.
According to another aspect of the embodiments of the present application, a computer-readable storage medium is further provided, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the above-mentioned target object detection method or the above-mentioned image processing method.
According to another aspect of the embodiments of the present application, there is also provided a computer terminal, including a memory and a processor, where the processor is configured to execute a program stored in the memory, where the program executes the method for detecting a target object described above, or the method for processing an image described above.
According to another aspect of the embodiments of the present application, there is also provided a target object detection system, including: a processor; and a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring a current image and a reference image; fusing image information of a current image and image information of a reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the target object in the current image.
In the embodiment of the present application, after the current image and the reference image are obtained, the image information of the current image and the image information of the reference image can be fused to obtain fused image information, and detection is performed based on the fused image information to obtain a detection result of the target object in the current image. Compared with the prior art, fusing the image information of the current image and the reference image allows the information about the target object in the current image to be completed using the image information of the reference image. Detecting on the fused image information therefore improves the accuracy of the detection result for the target object in the current image, avoids the low accuracy caused by incomplete target object information in the current image, and solves the technical problem of low target object detection accuracy in the related art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing a target object detection method according to an embodiment of the present application;
fig. 2 is a flowchart of a target object detection method according to embodiment 1 of the present invention;
FIG. 3a is a schematic view of an interactive interface according to embodiment 1 of the present invention;
fig. 3b is a flowchart of an alternative target object detection method according to embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of a method in which a deep learning network learns the apparent features of a target according to embodiment 1 of the present invention;
FIG. 5 is a schematic diagram of feature enhancement with a non-local model according to embodiment 1 of the present invention;
fig. 6 is a flowchart of a target object detection method according to embodiment 2 of the present invention;
fig. 7 is a flowchart of a target object detection method according to embodiment 3 of the present invention;
fig. 8 is a flowchart of a target object detection method according to embodiment 4 of the present invention;
fig. 9 is a flowchart of an image processing method according to embodiment 5 of the present invention;
fig. 10 is a flowchart of an image processing method according to embodiment 6 of the present invention;
FIG. 11 is a flowchart of a product inspection method according to embodiment 7 of the present invention;
fig. 12 is a schematic view of a target object detection apparatus according to embodiment 8 of the present invention;
fig. 13 is a schematic view of a target object detection apparatus according to embodiment 9 of the present invention;
fig. 14 is a schematic view of a target object detection apparatus according to embodiment 10 of the present invention;
fig. 15 is a schematic view of a target object detection apparatus according to embodiment 11 of the present invention;
fig. 16 is a schematic diagram of an image processing apparatus according to embodiment 12 of the present invention;
fig. 17 is a schematic diagram of an image processing apparatus according to embodiment 13 of the present invention;
FIG. 18 is a schematic view of a product inspection device according to embodiment 14 of the present invention;
fig. 19 is a block diagram of a computer terminal according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms appearing in the description of the embodiments of the present application are explained below:
Image features: mainly include the color features, texture features, shape features, and spatial relationship features of an image.
At present, commodities appearing in a scene need to be structurally analyzed, for example located and classified. A deep learning network can be used to learn the apparent features of a target: the input end of the network receives an image of the commodity, and after processing by the deep convolutional network, the feature expression of the target object is output; judgment is then made by comparing the similarity between different objects. Other correlation-modeling models can also be used, which take a group of images as input and perform feature enhancement by calculating the correlation among all pixel points.
However, the above schemes use global association, which incurs very high overhead in video memory and computation. To solve the above problems, the present application provides the following solutions.
Example 1
There is also provided, in accordance with an embodiment of the present application, an embodiment of a method for detecting a target object. It is noted that the steps illustrated in the flowcharts of the drawings may be executed in a computer system (for example, as a set of computer-executable instructions), and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
The method provided by the embodiment of the present application can be executed on a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a hardware block diagram of a computer terminal (or mobile device) for implementing the target object detection method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, …, 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computer terminal may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and does not limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module or incorporated, in whole or in part, into any of the other components in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the target object detection method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, so as to implement the above-mentioned target object detection method. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
It should be noted that in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 may include hardware components (including circuitry), software components (including computer code stored on a computer-readable medium), or a combination of both hardware and software components. It should be noted that fig. 1 is only one example of a particular specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
In the operating environment shown in fig. 1, the present application provides a method for detecting a target object as shown in fig. 2. Fig. 2 is a flowchart of a target object detection method according to an embodiment of the present invention. As shown in fig. 2, the method may include the steps of:
in step S202, a current image and a reference image are acquired.
The current image and the reference image in the above steps may both include the target object.
The current image in the above steps may be an image obtained by shooting a target object by a shooting device (for example, a camera, a video camera, etc.), may be an image selected from a storage space (for example, a memory, a hard disk, etc.), may be an image currently displayed on a display interface, and may be an image obtained from another device.
The reference image in the above step may include at least one of: current image, historical image. The historical image may be an image whose acquisition time is within a preset time period before the current image, and the acquisition time of the historical image may be limited by setting the preset time period, so that the target object in the obtained reference image is the same as the target object in the current image. The historical image may also be an image of any time before the current time, that is, when there is no image in the preset time period, the image may be acquired in other time periods.
The target object in the above steps may be a person, an object, or the like to be detected. For example, in a live e-commerce scenario, the target object may be a good recommended by the anchor; the target object in the above steps may also be a blackboard book of a speaker teacher in live video teaching, identity information of a specific character in a video conference, an english word in a reading scene, and the like.
The current image and the reference image in this embodiment may be pictures, or may be captured images or image frames in a video, and are not limited in this respect.
In an alternative embodiment, an image frame containing the target object may be acquired from a video being live-streamed and used as the current image, and image frames within a preset time period before that frame may be taken as reference images. The preset time period can be set according to the needs of the user.
Illustratively, the target object may be a product to be sold during a merchant's live broadcast. When the merchant sells products in a live video, each product is generally introduced for a period of time. One or more live images can be captured within the first three minutes of introducing a commodity and used as reference images; when the introduction time reaches three minutes, a live image can be captured again and used as the current image.
It should be noted that when a plurality of live broadcast images are acquired, the live broadcast images can be screened through a preset rule, and a clear live broadcast image is selected as a reference image, so that the accuracy of target object detection is improved.
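The following is a minimal sketch of how the current image and reference images might be sampled from a recorded live video in the scenario above; it assumes OpenCV for frame decoding, and the timestamps and function name are illustrative assumptions only, not part of the claimed method:

```python
import cv2  # assumption: OpenCV is available for frame decoding

def sample_frames(video_path, ref_times_s=(30.0, 90.0, 150.0), current_time_s=180.0):
    """Grab reference frames early in a product introduction and the
    current frame at the three-minute mark (illustrative timestamps)."""
    cap = cv2.VideoCapture(video_path)

    def grab(t):
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000.0)  # seek to timestamp t (seconds)
        ok, frame = cap.read()
        return frame if ok else None

    refs = [f for t in ref_times_s if (f := grab(t)) is not None]
    current = grab(current_time_s)
    cap.release()
    return current, refs
```

In practice the candidate reference frames would then be screened by the preset rule mentioned above (for example, a sharpness measure) before being used.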
In another optional embodiment, the target object detection method may be provided externally and invoked through a cloud service. A video to be detected may be acquired first and transmitted to a corresponding processing device for processing, for example, transmitted directly to the user's computer terminal (e.g., a notebook computer or personal computer), or transmitted via the user's computer terminal to a cloud server. It should be noted that, since processing the video to be detected requires a large amount of computing resources, the processing device is described as a cloud server in the embodiments of the present application.
For example, in order to facilitate the user to upload the video to be detected, an interactive interface may be provided for the user, as shown in fig. 3a, the user may select a video to be detected at a time from a large number of stored videos by using a "select video" button, or select a plurality of videos in batch, and upload the video selected by the user to the cloud server for processing by clicking the "upload" button. In addition, in order to facilitate the user to confirm whether the selected video is the video to be detected, the video selected by the user can be displayed in the video display area, and after the user confirms that the video is correct, data uploading is carried out by clicking the upload button.
Further, the cloud server may acquire a current image and a reference image of the target object in the video from the uploaded video, detect the current image and the reference image, and acquire a detection result of the target object.
And step S204, fusing the image information of the current image and the image information of the reference image to obtain fused image information.
In an optional embodiment, the image information in the current image and the image information related to the target object in the reference image may be fused to obtain fused image information, so as to complete the information about the target object in the current image and thereby improve the accuracy of detecting the target object in the current image.
In another alternative embodiment, the image information related to the target object in the reference image and the current image may be determined according to the similarity between the current image and the reference image, and the two image information may be fused to obtain the fused image information.
It should be noted that the image information of the current image and the image information of the reference image may be fused in the cloud server to obtain fused image information.
And step S206, detecting based on the fused image information to obtain a detection result of the target object in the current image.
The detection result of the target object in the above step may be a display position of the target object in the current image. For example, when the target object is a commodity, the detection result of the target object may be a position of the commodity in the live video.
For example, when a merchant sells a commodity in a live video, the display position of the commodity can be obtained based on the second image feature of the commodity, the commodity image is identified, and information such as the name and type of the commodity is determined. The link of the commodity can then be obtained by querying a database and automatically pushed to the live video page, which improves commodity selling efficiency.
In an optional embodiment, the fused image information can be detected through a target detection model, so that a detection result of a target object in a current image is obtained; the target detection model can be a multilayer perceptron, and can also be a neural network model.
In another optional embodiment, taking live teaching as an example, a video of the live teaching may be uploaded to a cloud server. In order to more accurately identify the blackboard-writing content of a teacher in a live-teaching scene, the cloud server can take the teacher's blackboard writing as the target object. When the teacher produces a new blackboard writing, a current image and a reference image of the blackboard writing can be obtained, the image information related to the blackboard writing in the current image and the reference image is fused to obtain fused image information, and detection is performed based on the fused image information to obtain a detection result of the blackboard writing in the current image, improving the accuracy of blackboard-writing detection.
Further, the cloud server can return the detection result to a display interface of live broadcast teaching, and specifically, the specific content of the blackboard writing can be displayed on the display interface in a bullet screen or text box mode; the content of the blackboard writing can be played in a voice broadcasting mode.
In another optional embodiment, taking a video conference scene as an example, video data of a video conference may be uploaded to a cloud server. The aim is to identify the identity of a specific person in the video conference scene, where the specific person can be a host, a leader, a special guest, or another important participant of the video conference. The cloud server can take the specific person as the target object: when the specific person appears in the video conference scene, a current image and a reference image corresponding to the person can be obtained, the image information related to the person in the current image and the reference image is fused to obtain fused image information, and detection is performed based on the fused image information to obtain a detection result of the specific person in the current image, improving the accuracy of detecting the specific person.
Further, the cloud server may return the detection result to a display interface of the video conference, specifically, a text box may be set at a position where the specific person is located on the display interface, and the detection result is displayed in the text box, or the detection result of the specific person is displayed on the display interface in a pop-up manner.
In yet another alternative embodiment, taking a reading scene as an example, pictures shot in the reading scene may be uploaded to the cloud server. When a user needs to identify an English word in a book, the English word can be taken as the target object. A shooting device is used to photograph the current page containing the English word to obtain a current image, a reference image of the English word is obtained, and both are uploaded to the cloud server. In the cloud server, the image information related to the English word in the current image and the reference image is fused to obtain fused image information, and detection is performed based on the fused image information to obtain a detection result of the English word in the current image, improving the accuracy of English word detection.
Furthermore, the cloud server can return the detection result to a specified display interface, specifically, the Chinese paraphrase of the English word can be displayed on the display interface, and the English word and the Chinese paraphrase corresponding to the English word are played in a voice broadcasting mode, so that a user can quickly obtain the Chinese paraphrase and English pronunciation of the English word, and the reading and learning efficiency of the user can be improved.
According to the scheme provided by the embodiment of the present application, after the current image and the reference image are obtained, the image information of the current image and the image information of the reference image can be fused to obtain fused image information, and detection is performed based on the fused image information to obtain a detection result of the target object in the current image. Compared with the prior art, fusing the image information of the current image and the reference image allows the information about the target object in the current image to be completed using the image information of the reference image. Detecting on the fused image information therefore improves the accuracy of the detection result for the target object in the current image, avoids the low accuracy caused by incomplete target object information in the current image, and solves the technical problem of low target object detection accuracy in the related art.
In the embodiment of the present application, the reference image comprises the target object, and fusing the image information of the current image and the image information of the reference image to obtain the fused image information comprises: extracting features from the current image and the reference image to obtain a first image feature of the current image and a reference image feature of the reference image; obtaining a second image feature of the current image based on the feature of a first pixel in the first image feature and the feature of a second pixel in the reference image feature, wherein the first pixel is any pixel in the first image feature, and the second pixel is a pixel in a target region of the reference image feature corresponding to the first pixel; and determining the second image feature as the fused image information.
The image features in the above steps may be self features that can be distinguished from other types of images, such as brightness, edges, texture and color, spatial relationship, etc.
In an alternative embodiment, a neural network model may be trained in advance as the backbone network for feature extraction; for example, a convolutional neural network may be used. The current image and the reference image are respectively input into the convolutional neural network, and the corresponding outputs are the first image feature and the reference image feature, respectively.
It should be noted that the specific type and network structure of the backbone network can be implemented by using the existing scheme, and the present application is not limited to this.
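As an illustration only, the following sketch shows one possible backbone. The patent does not fix a particular network, so the ResNet-50 trunk, the input size, and the use of PyTorch here are assumptions:

```python
import torch
import torchvision

# Assumed backbone: a ResNet-50 trunk truncated before pooling; any CNN would do.
resnet = torchvision.models.resnet50(weights=None)
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool + fc

current = torch.randn(1, 3, 224, 224)    # current image
reference = torch.randn(1, 3, 224, 224)  # reference image

with torch.no_grad():
    feat_q = backbone(current)    # first image feature F_q, shape (1, 2048, 7, 7)
    feat_r = backbone(reference)  # reference image feature F_r, same shape
```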
A pixel in the above step refers to the minimum unit of an image represented as a number sequence. For example, a resolution of 72 ppi means that the image contains 72 pixels per inch, and a resolution of 300 ppi means that it contains 300 pixels per inch.
The central pixel of the target area in the above steps may be the pixel determined by mapping the first pixel to the reference image, the size of the target area may be set by the user according to the actual detection and recognition accuracy requirements, and may be adjusted according to the size of the target object in the image. For example, when the target object occupies one third of the image, the target area may be one third of the size of the image.
The target region may be adjusted according to the shape of the target object: its area may be the same as or larger than the area of the target object, and its shape may be the same as the shape of the target object.
For example, when the target object is a square, the target area may be a square that is twice as large as the square, so as to ensure that the target object is completely located in the target area; the shape of the target region may also be a square having the same area as the square, which can improve the accuracy of target object recognition.
In an optional embodiment, for each pixel in the first image feature, the pixel it maps to in the reference image feature is first determined, and a target region centered on that mapped pixel is then determined. Local association information between the current image and the reference image is obtained by calculating the similarity between the feature of each pixel and the features of all pixels in its corresponding target region. Combined with the first image feature, this yields an enhanced feature of the current image, that is, the second image feature described above.
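A minimal sketch of this local association step is given below. It assumes same-sized feature maps so that a first pixel maps to the reference feature at the same coordinates; the function name and the use of PyTorch are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def local_correlation(feat_q, feat_r, R):
    """For each first pixel of F_q, compute inner-product similarities with
    all pixels of the (2R+1)x(2R+1) target region of F_r centered on the
    mapped pixel (identity mapping assumed for same-sized features)."""
    B, C, H, W = feat_q.shape
    K = 2 * R + 1
    # Collect the (2R+1)^2 reference features around every position.
    patches = F.unfold(feat_r, kernel_size=K, padding=R)  # (B, C*K*K, H*W)
    patches = patches.view(B, C, K * K, H * W)
    q = feat_q.view(B, C, 1, H * W)
    sim = (q * patches).sum(dim=1)                        # (B, K*K, H*W)
    return sim.view(B, K * K, H, W)
```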
In another alternative embodiment, after obtaining the second image feature, the second image feature may be input to a cascade detection branch and a recognition branch trained in advance for detection and recognition, so as to obtain a detection result and a recognition result of the target object, where the recognition result may be name information, type information, and the like of the target object, but is not limited thereto.
Because the second pixels are only the pixels of the target region corresponding to the first pixel in the reference image feature, rather than all pixels in the reference image feature, the amount of feature calculation, both here and in subsequent processing, is reduced. This avoids the excessive computation caused in the related art by calculating over all features in the image, thereby solving the technical problem that the target object detection method in the related art incurs large video memory and computation overhead, and achieving the technical effects of reducing feature calculation, reducing video memory overhead, and improving target object detection efficiency.
In the above embodiments of the present application, obtaining the second image feature of the current image based on the feature of the first pixel in the first image feature and the feature of the second pixel in the reference image feature comprises: acquiring the similarity between the feature of the first pixel and the feature of the second pixel; encoding the similarity to obtain a correlation feature; and acquiring the sum of the correlation feature and the first image feature to obtain the second image feature.
In an alternative embodiment, the similarity may be encoded using a multilayer perceptron (MLP), a feedforward artificial neural network having an input layer, an output layer, and several hidden layers.
In another alternative embodiment, the similarity may be encoded by using other neural networks such as a self-encoding neural network (AutoEncoder).
In another optional embodiment, the similarity between each pixel in the first image feature and the pixels in its target region may first be calculated. The similarity is then encoded by a multilayer perceptron to obtain a correlation feature, that is, local correlation information between the target objects of different images. The correlation feature and the original feature are further summed to obtain the second image feature, that is, the enhanced feature of the current image. In this way, the features related to the target object in the first image feature are enhanced, so the obtained second image feature identifies the target object more accurately, improving the accuracy of the target object detection result.
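A sketch of this encoding-and-summation step, under the assumption that the multilayer perceptron is applied per position (realized here as 1x1 convolutions, with illustrative layer widths):

```python
import torch
import torch.nn as nn

class LocalAssociationEnhance(nn.Module):
    """Encode the (2R+1)^2 similarity vector at each position with a small
    MLP and add the result onto the first image feature (residual sum)."""
    def __init__(self, R, channels):
        super().__init__()
        k2 = (2 * R + 1) ** 2
        self.mlp = nn.Sequential(
            nn.Conv2d(k2, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, feat_q, sim):
        # sim: local similarities of shape (B, (2R+1)^2, H, W)
        return feat_q + self.mlp(sim)  # second (enhanced) image feature
```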
In yet another alternative embodiment, the inner product of the feature of the first pixel and the feature of the second pixel may be obtained to obtain the similarity.
For example, a dot product operation may be performed on the feature of the first pixel and the feature of the second pixel, that is, the corresponding elements of the two features are multiplied one by one and then summed, and the result is the similarity between the two features. Specifically, the similarity $C_l$ can be calculated by the following formula:

$$C_l(p, \Delta p) = F_q(p)^{T} F_r(p + \Delta p), \quad \Delta p \in [-R, R]^2$$

wherein $F_q$ is the feature of the first pixel (at position $p$ in the first image feature), $F_r$ is the feature of the second pixel, $R$ is the distance from the edge of the target region to its center point, and $T$ denotes transposition.
It should be noted that the similarity between the feature of the first pixel and the feature of the second pixel may also be determined by: the Euclidean distance, Manhattan distance, Minkowski distance, cosine similarity, Jaccard similarity, Pearson correlation coefficient, and the like. In the present application, the similarity between features is determined mainly by taking inner products between them.
In the above embodiments of the present application, in the case where the reference image includes a plurality of historical images, obtaining the second image feature of the current image based on the feature of the first pixel in the first image feature and the feature of the second pixel in the reference image feature comprises: acquiring the similarity between the feature of the first pixel and the feature of the second pixel in the image feature of each historical image, to obtain the similarity corresponding to each historical image; and obtaining the second image feature based on the similarity corresponding to each historical image and the feature of the second pixel in the image feature of each historical image.
In an alternative embodiment, where the reference image comprises multiple historical images, the feature of the first pixel may be obtained first, and the feature of the second pixel in the image feature of each historical image is determined from it. The similarity between the feature of the first pixel and the feature of the second pixel in each historical image is then computed, giving the similarity corresponding to each historical image. Next, the product of each historical image's similarity and the feature of its second pixel is obtained, and the sum of all these products gives the second image feature. In this way, the historical information available to the current image is significantly increased, improving detection precision in occlusion and motion-blur scenes.
In the above embodiment of the present application, obtaining a second image feature based on the similarity corresponding to each historical image and the feature of a second pixel in the image feature of each historical image includes: acquiring the ratio of the similarity corresponding to each historical image to a preset value to obtain the ratio corresponding to each historical image; obtaining a product of a ratio corresponding to each historical image and the feature of a second pixel in the image feature of each historical image to obtain a third image feature corresponding to each historical image; and acquiring the sum of the third image characteristics corresponding to the plurality of historical images to obtain a second image characteristic.
The preset value in the above steps can be set according to the requirements of the user, or can be a value obtained through multiple target object detection experiments. In the embodiment of the present application, the preset value may be $(2R+1)^2$.
In an optional embodiment, a ratio of the similarity corresponding to each historical image to a preset value (that is, similarity/preset value) may be obtained; a third image feature corresponding to each historical image is obtained by point-multiplying the ratio corresponding to each historical image with the feature of the second pixel in the image feature of that historical image; and the third image features corresponding to the historical images are then added to obtain the second image feature. Since the third image feature in the above step is obtained based on the similarity of each historical image, the accuracy of the second image feature is higher. Specifically, the second image feature $F_e$ can be calculated by the following formulas:

$$F^{k}(p) = \sum_{\Delta p \in [-R, R]^2} \frac{C_l^{k}(p, \Delta p)}{(2R+1)^2} \, F_r^{k}(p + \Delta p)$$

$$F_e(p) = \sum_{k} F^{k}(p)$$

wherein $C_l^{k}$ and $F_r^{k}$ are the similarity and the reference image feature of the $k$-th historical image, and $F^{k}$ is the corresponding third image feature.
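The aggregation over several historical images can be sketched as follows; it reuses the unfold-based neighborhood gathering shown earlier, and the helper name is an assumption:

```python
import torch
import torch.nn.functional as F

def aggregate_history(feat_q, history_feats, R):
    """Divide each local similarity by the preset value (2R+1)^2, multiply
    with the corresponding second-pixel feature (the 'third image feature'),
    and sum over positions and historical images."""
    B, C, H, W = feat_q.shape
    K = 2 * R + 1
    out = torch.zeros_like(feat_q)
    for feat_r in history_feats:  # one historical reference image per pass
        patches = F.unfold(feat_r, kernel_size=K, padding=R).view(B, C, K * K, H * W)
        q = feat_q.view(B, C, 1, H * W)
        sim = (q * patches).sum(dim=1, keepdim=True)   # (B, 1, K*K, H*W)
        third = (sim / K ** 2 * patches).sum(dim=2)    # third image feature
        out = out + third.view(B, C, H, W)
    return out
```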
In the above embodiment of the present application, after performing detection based on the fused image information to obtain a detection result of the target object in the current image, the method further includes: outputting a detection result; and receiving first response data corresponding to the detection result, wherein the first response data is obtained by modifying the detection result.
In an optional embodiment, in order to further improve the accuracy of the target object detection result, the user may be allowed to modify the detection result after the cloud server outputs the detection result, so that a more accurate detection result may be output when the target object of the current image is detected next time; specifically, after the detection result is output and the user modifies the detection result, the modified detection result is used as first response data; after the first response data are obtained, the first response data can be sent to the cloud server, and the cloud server can adjust the target detection model by using the first response data, so that the accuracy of the target detection model is improved, and the accuracy of a target object detection result is improved.
In another optional embodiment, in order to improve the accuracy of the detection result more intelligently, the detection result of the target object may be automatically modified according to the user's feedback after the cloud server outputs it, so as to obtain the first response data. Specifically, after the detection result is output, the user indicates via a prompt box whether the detection result is correct. If the user indicates it is correct, the cloud server does not need to receive first response data corresponding to the detection result; if the user indicates it is incorrect, the cloud server can detect the target object again and take the new detection result as the first response data, or the detection result can be modified using a preset modification rule to obtain the first response data.
In the above embodiment of the present application, after the image information of the current image and the image information of the reference image are fused to obtain the fused image information, the method further includes: outputting the fused image information; receiving second response data corresponding to the fused image information, wherein the second response data is obtained by modifying the fused image information; and detecting based on the second response data to obtain a detection result of the target object.
In an optional embodiment, in order to enable the fused image information to be more accurate, the fused image information can be sent to the user after the cloud server obtains the fused image information, and the user is allowed to modify the fused image information, so that the fused image information can be more accurate when the current image and the reference image are fused next time; specifically, when the fused image information is output, after the user modifies the fused image information, the modified image information is used as second response data, and the second response data is sent to the cloud server, so that the cloud server adjusts the feature fusion algorithm based on the second response data, the accuracy of the fused image information is improved, and the accuracy of target object detection is improved.
In another optional embodiment, in order to improve the accuracy of the fused image information more intelligently, the fused image information may be automatically modified according to the user's feedback after the cloud server outputs it, so as to obtain the second response data. Specifically, after the fused image information is output, the user indicates via a prompt box whether it is accurate. If the user indicates it is accurate, the cloud server does not need to receive second response data; if the user indicates it is inaccurate, the cloud server can fuse the current image and the reference image again and take the newly fused image information as the second response data, or the fused image information can be modified using a preset modification rule to obtain the second response data. The second response data is sent to the cloud server, which adjusts the feature fusion algorithm based on it, improving the accuracy of the fused image information and hence of target object detection.
Referring to fig. 3b, a preferred embodiment of the present application will be described in detail. As shown in fig. 3b, the method may be executed by a front-end client or a back-end server. The left part of fig. 3b is a schematic diagram of the whole processing flow, and the right part is the detailed processing flow of the portion indicated by the dashed box on the left.
The main processing flow comprises the following steps:
Step S31: acquire the input images, including image 1, image 2, image 3, and image 4.
Alternatively, as shown in fig. 3b, image 1 may be a reference image for images 2 and 3, image 2 may be a reference image for images 3 and 4, and image 3 may be a reference image for image 4.
Step S32, inputting the input image into the backbone network, obtaining the first image characteristic F through the convolution neural network q
In step S33, the local association learning module may perform local association learning between the first image feature of the current image $I_t$ (i.e., each input image) and the reference image features of the reference images in the temporal reference list $I_{t-1}, I_{t-2}, \ldots, I_{t-k}$ (i.e., the reference objects of each image), to obtain the second image feature of each input image.
Alternatively, the association learning may be performed by comparing the similarity between each pixel in the first image feature of the current image and the corresponding local pixel in the reference image feature of the reference image. At the same time, the correlation can also be learned within the current image.
And step S34, detecting and identifying the second image characteristic through the cascade detection branch and the identification branch to obtain detection output and identification output.
Optionally, the target object may be continuously tracked based on the detection output and the recognition output, so as to obtain the track segment.
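The main flow above can be wired together roughly as follows. The helper names come from the earlier sketches, `frames` stands for preprocessed video tensors, and the pool length and R value are illustrative assumptions:

```python
from collections import deque

k = 3                           # length of the temporal reference list
feature_pool = deque(maxlen=k)  # features of I_{t-1}, ..., I_{t-k}
enhance = LocalAssociationEnhance(R=4, channels=2048)

for frame in frames:            # frames: preprocessed video tensors (assumed)
    feat_q = backbone(frame)    # step S32: first image feature
    for feat_r in list(feature_pool):     # step S33: local association
        sim = local_correlation(feat_q, feat_r, R=4)
        feat_q = enhance(feat_q, sim)     # enhanced (second) image feature
    feature_pool.append(feat_q.detach())  # update the feature pool
```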
The detailed processing flow comprises the following steps:
in step S331, the input of the local association learning module is a first image feature F with a size h × w × d q And the same size of reference image feature F r . Taking the retrieved pixel in the first image feature as an example, the correlation learning is a corresponding small region between the feature of the retrieved pixel in the retrieved feature and the feature of the reference image (as shown by the line between the retrieved pixel and the time series reference in FIG. 3)Shown) of the same. The similarity of the small regions can be determined by the following formula:
Figure BDA0002940909440000141
wherein $R$ is the distance from the edge of the local region to its center point. A multilayer perceptron is then used to encode the local correlation information, and this correlation information is summed with the first image feature to obtain the enhanced feature. The enhanced feature $F_e$ may be obtained by the following formulas:

$$G(p) = \mathrm{MLP}\big(C_l(p, \cdot)\big)$$

$$F_e(p) = F_q(p) + G(p)$$
For historical information, the relevance of local similarities between images can also be modeled to convey features. Specifically, this can be realized by the following formula:

$$F_e(p) = F_q(p) + \sum_{k=1}^{K} \sum_{\Delta p \in [-R, R]^2} \frac{C_l^{k}(p, \Delta p)}{(2R+1)^2} \, F_r^{k}(p + \Delta p)$$

wherein $k$ indexes the $K$ historical reference images.
in step S332, for each image, the related features within the image may be further calculated.
Optionally, the second image feature of each image may be obtained by performing associated learning by modeling local similarities between images and within the images.
In the related art, a deep learning network is used to learn the apparent features of a target as follows: as shown in fig. 4, an image of the commodity is fed to the input end and processed by a deep convolutional network, the feature expression of the target is finally output, and judgment is made by comparing the similarity between different objects. There are also other correlation-modeling models, such as the non-local model shown in fig. 5, which takes a group of images as input and performs feature enhancement by calculating the correlation among all pixel points.
Current video target tracking algorithms mainly have the following three problems: 1) a lack of learning of the associated information within an image: existing algorithms mainly rely on the visual attribute features of individual targets and do not reason with the relevance between target objects, which reduces recognition accuracy in partially occluded scenes; 2) an inability to use historical information: existing tracking algorithms generally detect on independent image frames, which limits the detection recall rate in motion-blur and occlusion scenes; and 3) the use of global association, which incurs extremely high video memory and computation overhead.
In the present application, aiming at the above problems in the related art, the described method significantly increases the historical information available to the current frame, thereby improving detection precision in occlusion and motion-blur scenes. First, a local correlation module is designed: it densely calculates the similarity between each feature point and its neighboring feature points, and enhances the original image features through a lightweight mapping. The module can be applied to a single feature image to learn the associated expression of the target and its context, and this expression enhances the discrimination capability of the network. Through this module, associations between different frames are learned, and historical frames form a feature pool that enhances the features of the current frame. Through local correlation learning, the motion of the target object over time is used as a prior to narrow the search range, greatly reducing the amount of computation. Meanwhile, because correlation learning is performed both within and between images, the detection precision of the algorithm in complex scenes is improved.
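The saving from restricting association to a local window can be made concrete with a rough cost comparison (the feature-map size below is an assumption chosen only for the arithmetic):

$$\frac{\text{cost}_{\text{global}}}{\text{cost}_{\text{local}}} = \frac{(hw)^2\, d}{hw\,(2R+1)^2\, d} = \frac{hw}{(2R+1)^2}, \qquad \text{e.g. } \frac{64 \times 64}{(2 \cdot 4 + 1)^2} = \frac{4096}{81} \approx 51$$

That is, for a $64 \times 64$ feature map and $R = 4$, local association needs roughly one fiftieth of the similarity computations of global (non-local) association, and proportionally less video memory for the similarity maps.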
It should be noted that for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method of the embodiments of the present application.
Example 2
There is also provided, in accordance with an embodiment of the present application, an embodiment of a method for detecting a target object, where it is noted that the steps illustrated in the flowchart of the drawings may be implemented in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 6 is a flowchart of a target object detection method according to an embodiment of the present invention. As shown in fig. 6, the method may include the steps of:
in step S602, a current image and a reference image are received.
The current image and the reference image in the above steps may each be a picture, a moving picture, or an image frame in a video; no specific limitation is imposed here.
The target object in the above steps may be a person, an object, or the like to be detected.
It should be noted that the subject receiving the current image and the reference image may be a background server, such as a cloud server, but is not limited thereto. The subject uploading the current image and the reference image may be a front-end client, that is, an application installed on a mobile terminal such as a mobile phone or tablet computer used by a user, or an application installed on a computer terminal such as a notebook or desktop computer, but is not limited thereto. The front-end client may communicate with the background server via the Internet, WIFI, 3G, 4G, 5G, or the like.
And step S604, fusing the image information of the current image and the image information of the reference image to obtain fused image information.
And step S606, detecting based on the fused image information to obtain a detection result of the target object in the current image.
In step S608, the detection result is output.
In an optional embodiment, the detection result may be output through a display interface, specifically as a bullet-screen comment or as a picture link; the detection result may also be output by voice, among other modes.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 3
There is also provided, in accordance with an embodiment of the present application, an embodiment of a method for detecting a target object, where it is noted that the steps illustrated in the flowchart of the drawings may be implemented in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 7 is a flowchart of a target object detection method according to an embodiment of the present invention. As shown in fig. 7, the method may include the steps of:
in step S702, video data of the target object is acquired.
The video data in the above steps may be a plurality of consecutive image frames.
In an alternative embodiment, when a merchant sells goods through a live video, the target object may be the goods, and the video data may be video data corresponding to the goods, i.e., a plurality of consecutive image frames.
Step S704, intercepting the video data to obtain a current image and a reference image.
In an alternative embodiment, each image frame in the video data may be identified, and the image frames containing the target object may be cut out to obtain the current image and the reference image. The current image may be the most recent video frame in which the target object appears among the consecutive image frames, and the reference image may be any one of these video frames.
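As a rough sketch of this interception step (OpenCV assumed; `has_target` is a hypothetical per-frame detector callback, not an API defined by the patent):

```python
import cv2

def intercept(video_path, has_target):
    """Keep frames containing the target; the last such frame becomes the
    current image and the earlier ones serve as reference images."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if has_target(frame):            # hypothetical detector callback
            frames.append(frame)
    cap.release()
    if not frames:
        raise ValueError('no frame contains the target object')
    return frames[-1], frames[:-1]       # current image, reference images
```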
Furthermore, the image frames may be preprocessed and high-definition frames selected, so as to improve the accuracy of detecting the target object.
Step S706, fusing the image information of the current image and the image information of the reference image to obtain fused image information.
Step S708, performing detection based on the fused image information to obtain a detection result of the target object in the current image.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 4
There is also provided, in accordance with an embodiment of the present application, an embodiment of a method for detecting a target object, where it is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 8 is a flowchart of a target object detection method according to an embodiment of the present invention. As shown in fig. 8, the method may include the steps of:
step S802, in the process of live video, displaying video data in a playing interface.
The video data includes a target object.
The playing interface in the above steps is the live interface of the live broadcast software. The video data displayed in the play interface may be a plurality of image frames in succession.
Step S804, fusing the image information of the current image and the image information of the reference image to obtain fused image information.
Step S806, marking the target object in the video playing interface.
Wherein the target object is obtained by performing detection based on the fused image information.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 5
There is also provided, in accordance with an embodiment of the present application, an image processing method embodiment, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 9 is a flowchart of an image processing method according to an embodiment of the present invention. As shown in fig. 9, the method may include the steps of:
step S902, a current image and a reference image are acquired.
Step S904, fusing the image information of the current image and the image information of the reference image to obtain fused image information.
The fused image information is used for determining a target object in the current image.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 6
There is also provided, in accordance with an embodiment of the present application, an image processing method embodiment, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be carried out in a computer system such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be carried out in an order different than here.
Fig. 10 is a flowchart of an image processing method according to an embodiment of the present invention. As shown in fig. 10, the method may include the steps of:
in step S1002, a current image and a reference image are received.
Wherein, the current image and the reference image both contain the target object.
Step S1004, fusing the image information of the current image and the image information of the reference image to obtain fused image information.
The fused image information is used for determining a target object in the current image.
Step S1006, the fused image information is output.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 7
There is also provided, in accordance with an embodiment of the present application, an embodiment of a method for product inspection, it being noted that the steps illustrated in the flowchart of the drawings may be carried out in a computer system such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be carried out in an order different than that presented herein.
FIG. 11 is a flow chart of a method of product inspection according to an embodiment of the present invention. As shown in fig. 11, the method may include the steps of:
step S1102, a current image and a reference image of the product are acquired.
The product in the above steps may be a commodity being recommended by an anchor in a live video.
And step S1104, fusing the image information of the current image and the image information of the reference image to obtain fused image information.
And step S1106, detecting based on the fused image information to obtain a detection result of the product.
The detection result of the product in the above steps may be information such as a purchase link, a name, a picture, a function, etc. of the product being recommended by the anchor.
In the above embodiment of the present application, fusing the image information of the current image and the image information of the reference image to obtain fused image information includes: performing feature extraction on the current image and the reference image to obtain a first image feature of the current image and a reference image feature of the reference image; obtaining a second image feature of the current image based on the feature of a first pixel in the first image feature and the feature of a second pixel in the reference image feature, wherein the first pixel is any one pixel in the first image feature, and the second pixel is a pixel in a target area corresponding to the first pixel in the reference image feature; and determining the second image feature as the fused image information.
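Read end to end, these steps amount to the small pipeline below (a sketch only; `backbone`, `local_corr`, and `head` are placeholders, with `local_corr` standing for a fusion module such as the local correlation sketch given earlier):

```python
def fuse_and_detect(current, reference, backbone, local_corr, head):
    """Fuse current- and reference-image information, then detect."""
    feat_cur = backbone(current)             # first image feature
    feat_ref = backbone(reference)           # reference image feature
    fused = local_corr(feat_cur, feat_ref)   # second image feature (fusion)
    return head(fused)                       # detection result of the product
```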
In the above embodiments of the present application, the reference image includes at least one of: current image, historical image.
It should be noted that the preferred embodiments described in the foregoing examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 8
According to an embodiment of the present application, there is also provided a target object detection apparatus for implementing the target object detection method, as shown in fig. 12, the apparatus 1200 includes: an acquisition module 1202, a fusion module 1204, and a detection module 1208.
The acquisition module is used for acquiring a current image and a reference image; the fusion module is used for fusing the image information of the current image and the image information of the reference image to obtain fused image information; the detection module is used for detecting based on the fused image information to obtain a detection result of the target object in the current image.
It should be noted here that the acquiring module 1202, the fusing module 1204, and the detecting module 1208 correspond to steps S202 to S206 in embodiment 1, and the three modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the fusion module includes: the device comprises an extracting unit, a generating unit and a determining unit.
The extraction unit is used for extracting the features of the current image and the reference image to obtain a first image feature of the current image and a reference image feature of the reference image; the generating unit is used for obtaining a second image characteristic of the current image based on the characteristic of a first pixel in the first image characteristic and the characteristic of a second pixel in the reference image characteristic, wherein the first pixel is any one pixel in the first image characteristic, and the second pixel is a pixel in a target area corresponding to the first pixel in the reference image characteristic; the determining unit is used for determining the second image characteristic as fused image information.
In the above embodiment of the present application, the generating unit includes: the device comprises a first acquisition subunit, a coding subunit and a second acquisition subunit.
The first acquiring subunit is used for acquiring the similarity of the characteristic of the first pixel and the characteristic of the second pixel; the coding subunit is used for coding the similarity to obtain the correlation characteristics; and the second acquisition subunit is used for acquiring the sum of the correlation characteristic and the first image characteristic to obtain a second image characteristic.
In the above embodiments of the present application, the reference image includes at least one of: current image, historical image.
In the above embodiments of the present application, the reference image includes: in the case of a plurality of history images, the generation unit includes: a third acquisition subunit and a generation subunit.
The third obtaining subunit is configured to obtain similarity between the feature of the first pixel and a feature of a second pixel in the image features of each historical image, and obtain similarity corresponding to each historical image; the generating subunit is configured to obtain a second image feature based on the similarity corresponding to each historical image and a feature of a second pixel in the image feature of each historical image.
In the above embodiment of the present application, the third obtaining subunit is further configured to obtain a ratio of the similarity corresponding to each historical image to a preset value, so as to obtain a ratio corresponding to each historical image; the third obtaining subunit is further configured to obtain a product of the ratio corresponding to each historical image and the feature of the second pixel in the image feature of each historical image, so as to obtain a third image feature corresponding to each historical image; the third obtaining subunit is further configured to obtain a sum of the third image features corresponding to the plurality of historical images, so as to obtain the second image feature.
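One plausible reading of this subunit, assuming the "preset value" is a softmax-style normalising denominator (the patent leaves it unspecified), is sketched below (PyTorch assumed, all names illustrative):

```python
import torch

def fuse_history(sim_list, feat_list):
    """sim_list:  per-history similarity maps, each of shape (B, 1, H, W).
    feat_list: per-history second-pixel features, each (B, C, H, W).
    Returns the second image feature of shape (B, C, H, W)."""
    sims = torch.stack(sim_list, dim=0)    # (T, B, 1, H, W)
    ratios = torch.softmax(sims, dim=0)    # ratio of each similarity to the
                                           # preset value (softmax denominator)
    feats = torch.stack(feat_list, dim=0)  # (T, B, C, H, W)
    third = ratios * feats                 # third image feature per history
    return third.sum(dim=0)                # sum -> second image feature
```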
In the above embodiment of the present application, the apparatus further includes: the device comprises a first output module and a first receiving module.
The first output module is used for outputting a detection result; the first receiving module is used for receiving first response data corresponding to the detection result, wherein the first response data is obtained by modifying the detection result.
In the above embodiment of the present application, the apparatus further includes: the device comprises a second output module, a second receiving module and a generating module.
The second output module is used for outputting the fused image information; the second receiving module is used for receiving second response data corresponding to the fused image information, wherein the second response data is obtained by modifying the fused image information; the generating module is used for detecting based on the second response data to obtain a detection result of the target object.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 9
According to an embodiment of the present application, there is also provided an apparatus for detecting a target object, which is used for implementing the method for detecting a target object, as shown in fig. 13, the apparatus 1300 includes: a receiving module 1302, a fusing module 1304, a detecting module 1306, and an outputting module 1308.
The receiving module is used for receiving a current image and a reference image; the fusion module is used for fusing the image information of the current image and the image information of the reference image to obtain fused image information; the detection module is used for detecting based on the fused image information to obtain a detection result of the target object in the current image; the output module is used for outputting the detection result.
It should be noted here that the receiving module 1302, the fusing module 1304, the detecting module 1306, and the outputting module 1308 correspond to steps S602 to S608 in embodiment 2, and the four modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 10
According to an embodiment of the present application, there is also provided an apparatus for detecting a target object, where the apparatus 1400 is configured to implement the method for detecting a target object, as shown in fig. 14, and includes: an acquisition module 1402, an intercepting module 1404, a fusion module 1406, and a detection module 1408.
The acquisition module is used for acquiring video data of a target object; the intercepting module is used for intercepting the video data to obtain a current image and a reference image; the fusion module is used for fusing the image information of the current image and the image information of the reference image to obtain fused image information; the detection module is used for detecting based on the fused image information to obtain a detection result of the target object in the current image.
It should be noted here that the acquiring module 1402, the intercepting module 1404, the fusing module 1406, and the detecting module 1408 correspond to steps S702 to S708 in embodiment 3, and the four modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules as a part of the apparatus may operate in the computer terminal 10 provided in embodiment 1.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 11
According to an embodiment of the present application, there is also provided an apparatus for detecting a target object, where the apparatus 1500 is configured to implement the method for detecting a target object, as shown in fig. 15, and includes: a display module 1502, a fusion module 1504, and a labeling module 1506.
The display module is used for displaying video data in a playing interface in the process of live video, wherein the video data comprises a target object; the fusion module is used for fusing the image information of the current image and the image information of the reference image to obtain fused image information; the marking module is used for marking a target object in the video playing interface, wherein the target object is obtained by detecting based on the fused image information.
It should be noted here that the display module 1502, the fusion module 1504, and the marking module 1506 correspond to steps S802 to S806 in embodiment 4, and the three modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules as a part of the apparatus may operate in the computer terminal 10 provided in embodiment 1.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 12
According to an embodiment of the present application, there is also provided an apparatus for implementing the above-described image processing, as shown in fig. 16, the apparatus 1600 including: an obtaining module 1602 and a fusing module 1604.
The acquisition module is used for acquiring a current image and a reference image; the fusion module is used for fusing the image information of the current image with the image information of the reference image to obtain fused image information, wherein the fused image information is used for determining a target object in the current image.
It should be noted here that the acquiring module 1602 and the fusing module 1604 correspond to steps S902 to S904 in embodiment 5, and the two modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 13
According to an embodiment of the present application, there is also provided an apparatus for implementing the above-described image processing, as shown in fig. 17, the apparatus 1700 includes: a receiving module 1702, a fusing module 1704, and an output module 1706.
The receiving module is used for receiving a current image and a reference image, wherein the current image and the reference image both comprise a target object; the fusion module is used for fusing the image information of the current image with the image information of the reference image to obtain fused image information, wherein the fused image information is used for determining a target object in the current image; the output module is used for outputting the fused image information.
It should be noted here that the receiving module 1702, the fusing module 1704, and the outputting module 1706 correspond to steps S1002 to S1006 in embodiment 6, and the three modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 14
According to an embodiment of the present application, there is also provided an apparatus for implementing the above product inspection, as shown in fig. 18, the apparatus 1800 includes: an obtaining module 1802, a fusing module 1804, and a detecting module 1806.
The acquisition module is used for acquiring a current image and a reference image of a product; the fusion module is used for fusing the image information of the current image and the image information of the reference image to obtain fused image information; the detection module is used for detecting based on the fused image information to obtain a detection result of the product.
It should be noted here that the obtaining module 1802, the fusing module 1804, and the detecting module 1806 correspond to steps S1102 to S1106 in embodiment 7, and the three modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the fusion module includes: the device comprises an extraction unit, a fusion unit and a determination unit.
The extraction unit is used for extracting the characteristics of the current image and the reference image to obtain a first image characteristic of the current image and a reference image characteristic of the reference image; the fusion unit is used for obtaining a second image characteristic of the current image based on the characteristic of a first pixel in the first image characteristic and the characteristic of a second pixel in the reference image characteristic, wherein the first pixel is any one pixel in the first image characteristic, and the second pixel is a pixel in a target area corresponding to the first pixel in the reference image characteristic; the determining unit is used for determining the second image characteristic as fused image information.
The reference image in the above embodiment of the present application includes at least one of: current image, historical image.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 15
Embodiments of the present invention also provide a computer-readable storage medium. Alternatively, in this embodiment, the storage medium may be configured to store the program code executed by the target object detection method, the image processing method, or the product detection method provided in the first embodiment.
Optionally, in this embodiment, the computer-readable storage medium may be located in any one of a group of computer terminals in a computer network, or in any one of a group of mobile terminals.
Optionally, in this embodiment, a computer-readable storage medium is configured to store program code for performing the steps of: acquiring a current image and a reference image; fusing image information of a current image and image information of a reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the target object in the current image.
Optionally, the storage medium is further arranged to store program code for performing the steps of: performing feature extraction on a current image and a reference image to obtain a first image feature of the current image and a reference image feature of the reference image; obtaining a second image characteristic of the current image based on the characteristic of a first pixel in the first image characteristic and the characteristic of a second pixel in the reference image characteristic, wherein the first pixel is any one pixel in the first image characteristic, and the second pixel is a pixel in a target area corresponding to the first pixel in the reference image characteristic; and determining the second image characteristic as fused image information.
Optionally, the storage medium is further arranged to store program code for performing the steps of: acquiring the similarity of the characteristic of the first pixel and the characteristic of the second pixel; coding the similarity to obtain the correlation characteristics; and acquiring the sum of the correlation characteristic and the first image characteristic to obtain a second image characteristic.
Optionally, the storage medium is further arranged to store program code for performing the steps of: the reference image includes at least one of: current image, historical image.
Optionally, the storage medium is further arranged to store program code for performing the steps of: the reference image includes: under the condition of a plurality of historical images, acquiring the similarity of the feature of a first pixel and the feature of a second pixel in the image feature of each historical image to obtain the corresponding similarity of each historical image; and obtaining a second image characteristic based on the corresponding similarity of each historical image and the characteristic of a second pixel in the image characteristic of each historical image.
Optionally, the storage medium is further arranged to store program code for performing the steps of: acquiring the ratio of the similarity corresponding to each historical image to a preset value to obtain the ratio corresponding to each historical image; obtaining a product of a ratio corresponding to each historical image and the feature of a second pixel in the image feature of each historical image to obtain a third image feature corresponding to each historical image; and acquiring the sum of the third image characteristics corresponding to the plurality of historical images to obtain a second image characteristic.
Optionally, the storage medium is further arranged to store program code for performing the steps of: outputting a detection result; and receiving first response data corresponding to the detection result, wherein the first response data is obtained by modifying the detection result.
Optionally, the storage medium is further arranged to store program code for performing the steps of: outputting the fused image information; receiving second response data corresponding to the fused image information, wherein the second response data is obtained by modifying the fused image information; and detecting based on the second response data to obtain a detection result of the target object.
As an alternative example, the storage medium is further arranged to store program code for performing the steps of: receiving a current image and a reference image; fusing image information of a current image and image information of a reference image to obtain fused image information; detecting based on the fused image information to obtain a detection result of the target object in the current image; and outputting a detection result.
As an alternative example, the storage medium is further arranged to store program code for performing the steps of: acquiring video data of a target object; intercepting video data to obtain a current image and a reference image; fusing image information of a current image and image information of a reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the target object in the current image.
As an alternative example, the storage medium is further arranged to store program code for performing the steps of: in the video live broadcast process, displaying video data in a playing interface, wherein the video data comprises a target object; fusing image information of a current image and image information of a reference image to obtain fused image information; and marking a target object in the video playing interface, wherein the target object is obtained by detecting based on the fused image information.
As an alternative example, the storage medium is further arranged to store program code for performing the steps of: acquiring a current image and a reference image; and fusing the image information of the current image and the image information of the reference image to obtain fused image information, wherein the fused image information is used for determining the target object in the current image.
As an alternative example, the storage medium is further arranged to store program code for performing the steps of: receiving a current image and a reference image, wherein the current image and the reference image both comprise a target object; fusing image information of a current image and image information of a reference image to obtain fused image information, wherein the fused image information is used for determining a target object in the current image; and outputting the fused image information.
As an alternative example, the storage medium is further arranged to store program code for performing the steps of: acquiring a current image and a reference image of a product; fusing image information of a current image and image information of a reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the product.
Optionally, the storage medium is further arranged to store program code for performing the steps of: performing feature extraction on the current image and the reference image to obtain a first image feature of the current image and a reference image feature of the reference image; obtaining a second image characteristic of the current image based on the characteristic of a first pixel in the first image characteristic and the characteristic of a second pixel in the reference image characteristic, wherein the first pixel is any one pixel in the first image characteristic, and the second pixel is a pixel in a target area corresponding to the first pixel in the reference image characteristic; and determining the second image characteristic as fused image information.
Optionally, the storage medium is further arranged to store program code for performing the steps of: the reference image includes at least one of: current image, historical image.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 16
An embodiment of the present invention may provide a computer terminal, where the computer terminal may be disposed in the object detection system according to the embodiment of the present invention, and the computer terminal may be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
Alternatively, fig. 19 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 19, the computer terminal a may include: one or more processors 1902 (only one of which is shown), and a memory 1904.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the target object detection method and apparatus in the embodiments of the present application, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the target object detection method, the image processing method, and the product detection method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, which may be connected to terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In this embodiment, the computer terminal may execute the program code of the following steps in the target object detection method: acquiring a current image and a reference image; fusing image information of a current image and image information of a reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the target object in the current image.
Optionally, the processor may further execute the program code of the following steps: performing feature extraction on a current image and a reference image to obtain a first image feature of the current image and a reference image feature of the reference image; obtaining a second image characteristic of the current image based on the characteristic of a first pixel in the first image characteristic and the characteristic of a second pixel in the reference image characteristic, wherein the first pixel is any one pixel in the first image characteristic, and the second pixel is a pixel in a target area corresponding to the first pixel in the reference image characteristic; and determining the second image characteristic as fused image information.
Optionally, the processor may further execute the program code of the following steps: acquiring the similarity of the characteristic of the first pixel and the characteristic of the second pixel; coding the similarity to obtain an associated characteristic; and acquiring the sum of the correlation characteristic and the first image characteristic to obtain a second image characteristic.
Optionally, the processor may further execute the program code of the following steps: the reference image includes at least one of: current image, historical image.
Optionally, the processor may further execute the program code of the following steps: the reference image includes: under the condition of a plurality of historical images, obtaining the similarity of the characteristic of a first pixel and the characteristic of a second pixel in the image characteristic of each historical image to obtain the corresponding similarity of each historical image; and obtaining a second image characteristic based on the corresponding similarity of each historical image and the characteristic of a second pixel in the image characteristic of each historical image.
Optionally, the processor may further execute the program code of the following steps: obtaining the ratio of the similarity corresponding to each historical image to a preset value to obtain the ratio corresponding to each historical image; obtaining the product of the ratio corresponding to each historical image and the characteristic of a second pixel in the image characteristic of each historical image to obtain a third image characteristic corresponding to each historical image; and acquiring the sum of the third image characteristics corresponding to the plurality of historical images to obtain a second image characteristic.
Optionally, the processor may further execute the program code of the following steps: outputting a detection result; and receiving first response data corresponding to the detection result, wherein the first response data is obtained by modifying the detection result.
Optionally, the processor may further execute the program code of the following steps: outputting the fused image information; receiving second response data corresponding to the fused image information, wherein the second response data is obtained by modifying the fused image information; and detecting based on the second response data to obtain a detection result of the target object.
As an alternative example, the processor may invoke the information stored in the memory and the application program via the transmission means to perform the following steps: receiving a current image and a reference image; fusing image information of a current image and image information of a reference image to obtain fused image information; detecting based on the fused image information to obtain a detection result of the target object in the current image; and outputting a detection result.
As an alternative example, the processor may invoke the information stored in the memory and the application program via the transmission means to perform the following steps: acquiring video data of a target object; intercepting video data to obtain a current image and a reference image; fusing image information of a current image and image information of a reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the target object in the current image.
As an alternative example, the processor may invoke the information stored in the memory and the application program via the transmission means to perform the following steps: in the video live broadcast process, displaying video data in a playing interface, wherein the video data comprises a target object; fusing image information of a current image and image information of a reference image to obtain fused image information; and marking a target object in the video playing interface, wherein the target object is obtained by detecting based on the fused image information.
As an alternative example, the processor may invoke the information stored in the memory and the application program via the transmission means to perform the following steps: acquiring a current image and a reference image; and fusing the image information of the current image and the image information of the reference image to obtain fused image information, wherein the fused image information is used for determining a target object in the current image.
As an alternative example, the processor may invoke the information stored in the memory and the application program via the transmission means to perform the following steps: receiving a current image and a reference image, wherein the current image and the reference image both comprise a target object; fusing image information of a current image and image information of a reference image to obtain fused image information, wherein the fused image information is used for determining a target object in the current image; and outputting the fused image information.
As an alternative example, the processor may invoke the information stored in the memory and the application program via the transmission means to perform the following steps: acquiring a current image and a reference image of a product; fusing image information of a current image and image information of a reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the product.
Optionally, the processor may further execute the program code of the following steps: performing feature extraction on the current image and the reference image to obtain a first image feature of the current image and a reference image feature of the reference image; obtaining a second image characteristic of the current image based on the characteristic of a first pixel in the first image characteristic and the characteristic of a second pixel in the reference image characteristic, wherein the first pixel is any one pixel in the first image characteristic, and the second pixel is a pixel in a target area corresponding to the first pixel in the reference image characteristic; and determining the second image characteristic as fused image information.
Optionally, the processor may further execute the program code of the following steps: the reference image includes at least one of: current image, historical image.
It should be noted that the preferred embodiments described in the foregoing examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
It can be understood by those skilled in the art that the structure shown in fig. 19 is only illustrative, and the computer terminal A may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. Fig. 19 does not limit the structure of the above electronic device. For example, the computer terminal A may also include more or fewer components (e.g., a network interface or a display device) than shown in fig. 19, or have a different configuration from that shown in fig. 19.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 17
According to an embodiment of the present application, there is also provided a target object detection system, including:
a processor; and
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring a current image and a reference image; fusing image information of a current image and image information of a reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the target object in the current image.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, or portions or all or portions of the technical solutions that contribute to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, which can store program codes.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (19)

1. A method of detecting a target object, comprising:
acquiring a current image and a reference image;
fusing the image information of the current image and the image information of the reference image to obtain fused image information;
and detecting based on the fused image information to obtain a detection result of the target object in the current image.
2. The method according to claim 1, wherein the reference image includes the target object, and wherein fusing the image information of the current image and the image information of the reference image to obtain fused image information comprises:
performing feature extraction on the current image and the reference image to obtain a first image feature of the current image and a reference image feature of the reference image;
obtaining a second image characteristic of the current image based on the characteristic of a first pixel in the first image characteristic and the characteristic of a second pixel in the reference image characteristic, wherein the first pixel is any one pixel in the first image characteristic, and the second pixel is a pixel in a target area corresponding to the first pixel in the reference image characteristic;
and determining the second image characteristic as the fused image information.
3. The method of claim 2, wherein obtaining the second image feature of the current image based on the feature of the first pixel in the first image feature and the feature of the second pixel in the reference image feature comprises:
acquiring the similarity of the characteristics of the first pixel and the characteristics of the second pixel;
coding the similarity to obtain an associated characteristic;
and acquiring the sum of the associated feature and the first image feature to obtain the second image feature.
4. The method of claim 2, wherein the reference image comprises at least one of: the current image and the historical image.
5. The method of claim 4, wherein the reference image comprises: under the condition of a plurality of historical images, obtaining a second image characteristic of the current image based on the characteristic of a first pixel in the first image characteristic and the characteristic of a second pixel in the reference image characteristic, wherein the method comprises the following steps:
obtaining the similarity of the feature of the first pixel and the feature of the second pixel in the image feature of each historical image to obtain the similarity corresponding to each historical image;
and obtaining the second image characteristic based on the corresponding similarity of each historical image and the characteristic of the second pixel in the image characteristic of each historical image.
6. The method according to claim 5, wherein obtaining the second image feature based on the corresponding similarity of each historical image and the feature of the second pixel in the image feature of each historical image comprises:
acquiring the ratio of the similarity corresponding to each historical image to a preset value to obtain the ratio corresponding to each historical image;
obtaining a product of a ratio corresponding to each historical image and the feature of the second pixel in the image feature of each historical image to obtain a third image feature corresponding to each historical image;
and acquiring the sum of third image characteristics corresponding to the plurality of historical images to obtain the second image characteristic.
7. The method according to claim 1, wherein after the detection based on the fused image information, obtaining a detection result of the target object in the current image, the method further comprises:
outputting the detection result;
and receiving first response data corresponding to the detection result, wherein the first response data is obtained by modifying the detection result.
8. The method according to claim 1, wherein after fusing the image information of the current image with the image information of the reference image to obtain fused image information, the method further comprises:
outputting the fused image information;
receiving second response data corresponding to the fused image information, wherein the second response data is obtained by modifying the fused image information;
and detecting based on the second response data to obtain a detection result of the target object.
9. A method of detecting a target object, comprising:
receiving a current image and a reference image;
fusing the image information of the current image and the image information of the reference image to obtain fused image information;
detecting based on the fused image information to obtain a detection result of the target object in the current image;
and outputting the detection result.
10. A method of detecting a target object, comprising:
acquiring video data of a target object;
intercepting the video data to obtain a current image and a reference image;
fusing the image information of the current image and the image information of the reference image to obtain fused image information;
and detecting based on the fused image information to obtain a detection result of the target object in the current image.
11. A method of detecting a target object, comprising:
in the video live broadcasting process, displaying video data in a playing interface, wherein the video data comprises a target object;
fusing image information of a current image and image information of a reference image to obtain fused image information;
and marking the target object in a video playing interface, wherein the target object is obtained by detecting based on the fused image information.
12. An image processing method, comprising:
acquiring a current image and a reference image;
and fusing the image information of the current image and the image information of the reference image to obtain fused image information, wherein the fused image information is used for determining a target object in the current image.
13. An image processing method, comprising:
receiving a current image and a reference image, wherein the current image and the reference image both contain a target object;
fusing the image information of the current image and the image information of the reference image to obtain fused image information, wherein the fused image information is used for determining a target object in the current image;
and outputting the fused image information.
14. A method of product inspection, comprising:
acquiring a current image and a reference image of a product;
fusing the image information of the current image and the image information of the reference image to obtain fused image information;
and detecting based on the fused image information to obtain a detection result of the product.
15. The method according to claim 14, wherein fusing the image information of the current image and the image information of the reference image to obtain the fused image information comprises:
performing feature extraction on the current image and the reference image to obtain a first image feature of the current image and a reference image feature of the reference image;
obtaining a second image feature of the current image based on a feature of a first pixel in the first image feature and a feature of a second pixel in the reference image feature, wherein the first pixel is any one pixel in the first image feature, and the second pixel is a pixel in a target area corresponding to the first pixel in the reference image feature;
and determining the second image feature as the fused image information.
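Claim 15's per-pixel fusion over a target area can be sketched as follows. Reading the target area as a small window centred on the first pixel's position is an assumption, as are the dot-product similarity and the preset divisor.

    import numpy as np

    def fuse_with_target_area(first_feat, ref_feat, radius=1, preset_value=8.0):
        # first_feat, ref_feat: (H, W, C) feature maps of the current image
        # and the reference image.
        H, W, C = first_feat.shape
        fused = np.zeros_like(first_feat)
        for y in range(H):
            for x in range(W):
                # Second pixels: the target area around (y, x) in the
                # reference image feature.
                y0, y1 = max(y - radius, 0), min(y + radius + 1, H)
                x0, x1 = max(x - radius, 0), min(x + radius + 1, W)
                window = ref_feat[y0:y1, x0:x1].reshape(-1, C)
                sims = window @ first_feat[y, x]
                fused[y, x] = ((sims / preset_value)[:, None] * window).sum(axis=0)
        # The second image feature serves as the fused image information.
        return fused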
16. The method according to claim 15, wherein the reference image comprises at least one of: the current image and a historical image.
17. A computer-readable storage medium comprising a stored program, wherein, when executed, the program controls a device on which the computer-readable storage medium resides to perform the target object detection method according to any one of claims 1 to 11, the image processing method according to any one of claims 12 and 13, or the product detection method according to any one of claims 14 to 16.
18. A computer terminal comprising a memory and a processor, wherein the processor is configured to execute a program stored in the memory, and the program, when executed, performs the target object detection method according to any one of claims 1 to 11, the image processing method according to any one of claims 12 and 13, or the product detection method according to any one of claims 14 to 16.
19. A target object detection system, comprising:
a processor; and
a memory coupled to the processor and configured to provide the processor with instructions for the following processing steps: acquiring a current image and a reference image; fusing the image information of the current image and the image information of the reference image to obtain fused image information; and detecting based on the fused image information to obtain a detection result of the target object in the current image.
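Tying the pieces together, the processing steps of claim 19 might be wired up as below, reusing the sketches above; feature_extractor and detection_head are stand-ins for networks the claims leave unspecified.

    import numpy as np

    def feature_extractor(image):
        # Stand-in backbone producing a coarse (H/8, W/8, 16) feature map;
        # a real system would use a trained convolutional network here.
        h, w = image.shape[:2]
        rng = np.random.default_rng(0)
        return rng.standard_normal((h // 8, w // 8, 16)).astype(np.float32)

    def detection_head(fused_feat):
        # Stand-in head: report the position of the strongest fused response.
        scores = np.linalg.norm(fused_feat, axis=-1)
        y, x = np.unravel_index(scores.argmax(), scores.shape)
        return {"y": int(y), "x": int(x), "score": float(scores[y, x])}

    current_image, reference_image = grab_frames("input.mp4", current_idx=120)
    fused = fuse_with_target_area(feature_extractor(current_image),
                                  feature_extractor(reference_image))
    print(detection_head(fused))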
Application: CN202110176706.3A, filed 2021-02-09 (priority date 2021-02-09), published as CN114913443A, legal status Pending.

Priority Applications (1)

Application Number: CN202110176706.3A
Priority Date / Filing Date: 2021-02-09
Title: Target object detection method and system, image processing method and product detection method

Publications (1)

Publication Number: CN114913443A
Publication Date: 2022-08-16

Family

ID: 82761823
Family Applications (1): CN202110176706.3A, Pending

Country Status (1)

CN: CN114913443A (en)
Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination