CN113297405A - Data processing method and system, computer readable storage medium and processing device - Google Patents


Info

Publication number
CN113297405A
Authority
CN
China
Prior art keywords
image
target
features
target object
feature
Prior art date
Legal status
Pending
Application number
CN202010694588.0A
Other languages
Chinese (zh)
Inventor
刘潇 (Liu Xiao)
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010694588.0A
Publication of CN113297405A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Abstract

The application discloses a data processing method and system, a computer-readable storage medium, and a processing device. The method comprises the following steps: acquiring an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; and matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object. The method and the device solve the technical problem of low matching precision when matching target commodities in the related art.

Description

Data processing method and system, computer readable storage medium and processing device
Technical Field
The present application relates to the field of image recognition, and in particular, to a data processing method and system, a computer-readable storage medium, and a processing device.
Background
In the field of e-commerce shopping, to make purchasing more convenient, a user can search for goods to be purchased on a shopping platform or in a live shopping broadcast by searching with a picture. However, the commodities purchased by users are of many types, and commodities of the same type are highly similar overall; existing image recognition methods therefore have low matching precision when matching a target commodity and cannot achieve the purpose of recognizing same-type commodities.
No effective solution to the above problems has yet been proposed.
Disclosure of Invention
The embodiments of the application provide a data processing method and system, a computer-readable storage medium, and a processing device, so as to at least solve the technical problem of low matching precision when matching a target commodity in the related art.
According to an aspect of an embodiment of the present application, there is provided a data processing method, including: acquiring an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; and matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
According to another aspect of the embodiments of the present application, there is also provided a data processing method, including: acquiring an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; intercepting the video data to obtain images of other objects contained in the video data; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and obtaining a video segment corresponding to the target object in the video data based on the matching result.
According to another aspect of the embodiments of the present application, there is also provided a data processing method, including: receiving an image searching instruction in a video live broadcasting process; acquiring an image of a target object in a live video based on an image searching instruction; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and displaying the matching result.
According to another aspect of the embodiments of the present application, there is also provided a data processing method, including: receiving an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and displaying the image of the matched object.
According to another aspect of the embodiments of the present application, there is also provided a data processing method, including: acquiring an uploaded short video with a first product displayed and an image of a second product; intercepting the short video to obtain a video frame containing the first product; processing the video frame to at least obtain the spatial features of the video frame, and processing the image of the second product to at least obtain the spatial features of the image; matching the video frame and the image at least based on the spatial features of the video frame and the spatial features of the image to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the second product are the same as the attribute parameters of the first product; and displaying the matching result.
According to another aspect of the embodiments of the present application, there is also provided a data processing method, including: acquiring an image of a target object, wherein the image of the target object is acquired based on a shooting device associated with an image processing system; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and displaying the matching result.
According to another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium, which includes a stored program, wherein when the program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the above-mentioned data processing method.
According to another aspect of the embodiments of the present application, there is also provided a processing apparatus, including: the device comprises a memory and a processor, wherein the processor is used for operating the program stored in the memory, and the program executes the data processing method when running.
According to another aspect of the embodiments of the present application, there is also provided a data processing system, including: a processor; and a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; and matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
In this embodiment of the application, after the image of the target object is acquired, the image of the target object may be processed to obtain at least a target spatial feature of the image of the target object, and the image of the target object is further matched with images of other objects based on the target spatial feature to obtain a matching result, so as to achieve the purpose of matching objects with the same attribute parameters. It is easy to notice that, because spatial features have a finer granularity and pay more attention to commodity details, performing commodity identification through spatial features achieves the technical effects of improving commodity identification precision and improving the user's shopping experience, and solves the technical problem of low matching precision when matching target commodities in the related art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a block diagram of a hardware structure of a computer terminal for implementing a data processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of a first data processing method according to an embodiment of the present application;
FIG. 3 is a flow chart of an alternative data processing method according to an embodiment of the present application;
FIG. 4 is a flow chart of a second data processing method according to an embodiment of the present application;
FIG. 5 is a schematic illustration of an interactive interface according to an embodiment of the present application;
FIG. 6 is a flow chart of a third method of data processing according to an embodiment of the present application;
FIG. 7 is a flow chart of a fourth method of data processing according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a first data processing apparatus according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a second data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a third data processing apparatus according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a fourth data processing apparatus according to an embodiment of the present application;
FIG. 12 is a flow chart of a fifth data processing method according to an embodiment of the present application;
fig. 13 is a flowchart of a sixth data processing method according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a fifth data processing apparatus according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a sixth data processing apparatus according to an embodiment of the present application; and
fig. 16 is a block diagram of a computer terminal according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms appearing in the description of the embodiments of the present application are explained as follows:
Spatial features: image features divided according to the coordinate positions of pixel points in the image.
Example 1
According to an embodiment of the present application, a data processing method is also provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from that shown here.
The method provided by the embodiment of the application can be executed in a mobile terminal, a computer terminal, or a similar operation device. Fig. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing a data processing method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more (shown as 102a, 102b, …, 102n) processors 102 (the processors 102 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computer terminal may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the data processing method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by executing the software programs and modules stored in the memory 104, that is, implementing the data processing method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
It should be noted here that in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 described above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
It should be noted here that, in some embodiments, the computer device (or mobile device) shown in fig. 1 has a touch display (also referred to as a "touch screen" or "touch display screen"). In some embodiments, the computer device (or mobile device) shown in fig. 1 has a Graphical User Interface (GUI) with which a user can interact through finger contacts and/or gestures on a touch-sensitive surface. The human-computer interaction functionality optionally includes interactions such as creating web pages, drawing, word processing, making electronic documents, games, video conferencing, instant messaging, emailing, call interfacing, playing digital video, playing digital music, and/or web browsing; the executable instructions for performing these human-computer interaction functions are configured/stored in one or more processor-executable computer program products or readable storage media.
Under the operating environment, the application provides a data processing method as shown in fig. 2. Fig. 2 is a flowchart of a first data processing method according to an embodiment of the present application. As shown in fig. 2, the method may include the steps of:
in step S202, an image of the target object is acquired.
The target object in the above steps may be a commodity in the e-commerce shopping field, may also be content that needs to be uploaded to the cloud for storage, and may also be a three-dimensional model, but is not limited thereto. For example, in a product search scenario in the e-commerce shopping field, the target object may be a product or a part of a product that the user needs to perform a homogeneous product search, but is not limited thereto. For another example, in a commodity search scene in the live broadcast field, the target object may be a model wearing a specific garment, that is, the model and the garment may be collectively used as one target object to perform a commodity search of the same type.
The image of the target object in the above steps may be an image including the target object, for example, in the field of e-commerce shopping, the image may be an image of a commodity that a user wants to purchase, an image of a same-type commodity or a similar commodity that the user searches on the network, or an image of a commodity that the user captures in a live broadcast, but is not limited thereto.
For example, in the case of searching for the same type of product in live broadcasting, the target object may be a product sold in the live broadcast, an image of the target object may be obtained by capturing a video frame of the live broadcast, and it is necessary to ensure that the captured video frame contains the product.
Step S204, the image of the target object is processed, and at least the target space characteristic of the image of the target object is obtained.
The target space features in the above steps may be defined according to user needs, for example, in a live scene, the target space features may be detail features of the products highlighted by the anchor, for example, clothing features.
It should be noted that the above steps may obtain not only the spatial features of the image of the target object but also attribute features of the target object, such as color, style, length, and pattern, but the present application is not limited thereto, and any feature used for identifying a commodity may fall within the above range. In a commodity image, the commodity is often located in the middle of the image, and for non-same-type commodities the detail attributes of specific parts of the commodity often differ; therefore, a specific coordinate area in the image can be determined by processing the image, and the image features in that area can determine whether two commodities are of the same type.
In an optional embodiment, in order to improve the accuracy of image recognition, feature extraction may be performed on an image of a target object, an image feature of the target object is extracted, and a spatial feature is added to the image feature, so that the finally obtained target feature focuses more on the detail content of the target object. In order to realize the purpose of feature extraction, a feature expression network can be constructed in advance, and the image of the target object is input into the feature expression network, so that the target feature of the image is obtained by using the feature expression network.
And step S206, matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
The images of the other objects in the above steps may be images obtained from different databases, for example, images of all goods on the whole e-commerce shopping platform. Still taking same-type commodity retrieval in live broadcasting as an example, the images of the other objects in the above steps may be images of the goods sold in the live broadcast, and these images may be the images provided in the shopping link of each product on the e-commerce shopping platform, but are not limited thereto.
The above attribute parameters may be attributes for determining the same type of goods, including but not limited to the color, style, material, length, etc. of the goods. The higher the similarity of the two commodities is, the more similar the attribute parameters of the two commodities are, and when the similarity of the two commodities reaches a certain condition, the two commodities can be considered as the same type of commodity.
In an optional embodiment, in order to match the image of the target object with the images of other objects, a target feature of the image of the target object may be extracted, and features of the images of the other objects (which also include the spatial features of those images) may be extracted. The similarity between the target object and each other object can then be calculated from these features. If the similarity between the target object and another object is greater than a set value, the attribute parameters of the target object and that object can be determined to be the same, achieving the purpose of searching for same-style commodities; if the similarity is less than the set value, it may be determined that their attributes are different.
For example, for image 1 of commodity a, attribute feature A1 and spatial feature B1 of commodity a can be obtained by processing image 1. Similarly, by acquiring image 2 of commodity b and image 3 of commodity c from the database and processing them respectively, attribute feature A2 and spatial feature B2 of commodity b, and attribute feature A3 and spatial feature B3 of commodity c, can be obtained. The similarity between attribute feature A1 and attribute feature A2 and the similarity between spatial feature B1 and spatial feature B2 are calculated respectively, and the similarity between commodity a and commodity b is then obtained as the weighted sum of the two feature similarities; the similarity between commodity a and commodity c is calculated in the same way. The commodity with the maximum similarity can then be selected as the matching object, that is, if the similarity between commodity a and commodity c is the largest, commodity c can be determined to be of the same type as commodity a.
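For illustration only (not part of the claimed method: the feature dimensions, the 0.5/0.5 weights, and the random placeholder features below are assumptions), a minimal Python sketch of this weighted similarity combination could look as follows:

```python
import numpy as np

def cosine_similarity(x, y):
    # Cosine similarity between two feature vectors.
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def combined_similarity(a, b, w_attr=0.5, w_spat=0.5):
    # a and b are (attribute_feature, spatial_feature) tuples for two commodities;
    # the overall similarity is the weighted sum of the two feature similarities.
    return w_attr * cosine_similarity(a[0], b[0]) + w_spat * cosine_similarity(a[1], b[1])

rng = np.random.default_rng(0)
make_feats = lambda: (rng.normal(size=128), rng.normal(size=512))  # placeholder features
commodity_a, commodity_b, commodity_c = make_feats(), make_feats(), make_feats()

# The candidate with the largest combined similarity is taken as the same-type commodity.
scores = {"b": combined_similarity(commodity_a, commodity_b),
          "c": combined_similarity(commodity_a, commodity_c)}
best_match = max(scores, key=scores.get)
```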
With the scheme provided by the embodiment of the application, after the image of the target object is acquired, the image of the target object can be processed to obtain at least the target spatial feature of the image of the target object, and the image of the target object is matched with images of other objects based on the target spatial feature to obtain the matching result, achieving the purpose of matching objects with the same attribute parameters. It is easy to notice that, because spatial features have a finer granularity and pay more attention to commodity details, performing commodity identification through spatial features achieves the technical effects of improving commodity identification precision and improving the user's shopping experience, and solves the technical problem of low matching precision when matching target commodities in the related art.
In the foregoing embodiment of the present application, processing the image of the target object, and obtaining at least a target spatial feature of the image of the target object includes: processing the image of the target object, and determining a target area where the target object is located; and processing the target area by using the feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises a target space feature.
In the embodiment of the application, detection of the target area can be realized using a Single Shot MultiBox Detector (SSD), which offers both high precision and high speed, and the backbone network can use the lightweight MobileNet (a network designed for embedded devices). The SSD algorithm combines the regression idea of YOLO (You Only Look Once) with the anchor-box mechanism of Faster R-CNN (a region-based convolutional neural network), and performs regression using multi-scale regional features at every position of the whole feature map, retaining the speed of YOLO while making window predictions as accurate as Faster R-CNN. Meanwhile, convolution kernels are applied to the feature maps to predict the categories and coordinate offsets of a series of default bounding boxes. To improve detection accuracy, the SSD uses multi-scale feature maps whose sizes decrease layer by layer, enabling multi-scale prediction.
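As a non-limiting sketch of this detection stage, the snippet below uses a pretrained SSDLite detector with a MobileNetV3 backbone from torchvision (assumed to be version 0.13 or later) as a stand-in for the MobileNet + SSD detector described above; the file name and the selection of the single highest-scoring box are illustrative assumptions:

```python
import torch
from PIL import Image
from torchvision.models.detection import ssdlite320_mobilenet_v3_large
from torchvision.transforms.functional import to_tensor

# SSD-style detector with a MobileNet backbone, standing in for the MobileNet + SSD detector.
detector = ssdlite320_mobilenet_v3_large(weights="DEFAULT")
detector.eval()

image = Image.open("frame.jpg").convert("RGB")  # hypothetical captured video frame
with torch.no_grad():
    prediction = detector([to_tensor(image)])[0]  # dict with "boxes", "labels", "scores"

# Keep the highest-scoring box as the commodity region (the "target area").
best = prediction["scores"].argmax()
x1, y1, x2, y2 = prediction["boxes"][best].tolist()
target_region = image.crop((x1, y1, x2, y2))
```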
The feature expression network in the above steps may use MobileNetV2 (the second version of the MobileNet family, designed for embedded devices) as its backbone, which is lighter than networks such as ResNet50 (a 50-layer residual network) while offering fast processing speed and a comparable effect; a deeper network, for example Inception-ResNet-v2, may also be used, but the choice is not limited thereto and may be determined according to actual needs.
The target feature in the above step may include not only a spatial feature of an image of the target object but also an attribute feature of the target object, but is not limited thereto, and any feature for identifying the commodity may be included in the above range.
In the foregoing embodiment of the present application, processing the target region by using the feature expression network to obtain the target feature of the image of the target object includes: inputting the target area into a feature expression network to obtain the feature of a preset dimension; dividing the characteristics of the preset dimensionality according to a preset dividing mode; performing pooling operation on the divided features to obtain block features; combining the block features to obtain a spatial feature with a preset dimension; and inputting the spatial features with preset dimensionality into the full connection layer to obtain target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
The preset dimension in the above steps may be 1280 dimensions, and the dimension of the target feature may be 512 dimensions, but is not limited thereto, and may also be determined according to actual needs. The preset division mode may be a nine-grid division mode, and the features are divided into 9 parts, but the preset division mode is not limited to this, and adaptive setting may be performed according to the detection category result, so as to better extract the spatial features.
In the embodiment of the present application, a Pooling operation may be performed in a Generalized-mean Pooling manner to enhance the salient region characteristics.
In an alternative embodiment, the prediction flow of the feature expression network is as follows: the target region is input into the feature expression network, and 1280x7x7-dimensional features are obtained through MobileNetV2; the features are then divided into 9 parts over the 7x7 spatial dimensions, and a GeM pooling operation is performed on each spatial block, in which the input is first raised to the power p, then averaged, and finally raised to the power 1/p, yielding block features that pay more attention to salient features; finally, all the block features are merged to obtain the 1280-dimensional spatial feature. After the 1280-dimensional spatial feature is obtained, feature dimension reduction is performed through the full connection layer to obtain a 512-dimensional feature, which is used as the target feature of the target object.
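For illustration only, a minimal PyTorch sketch of this prediction flow is given below; the value of p, the way the nine block features are merged back into a 1280-dimensional vector (element-wise summation here), and the use of an untrained backbone are assumptions rather than details prescribed by this embodiment:

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class SpatialFeatureNet(nn.Module):
    # MobileNetV2 backbone -> 1280x7x7 map -> 3x3 block split -> GeM pooling per block
    # -> merge -> full connection layer reducing 1280 dimensions to 512 dimensions.
    def __init__(self, out_dim=512, p=3.0, eps=1e-6):
        super().__init__()
        self.backbone = mobilenet_v2().features  # load pretrained weights as appropriate
        self.p, self.eps = p, eps
        self.fc = nn.Linear(1280, out_dim)

    def gem(self, x):
        # Generalized-mean pooling: power p, spatial average, then power 1/p.
        return x.clamp(min=self.eps).pow(self.p).mean(dim=(-2, -1)).pow(1.0 / self.p)

    def forward(self, x):
        fmap = self.backbone(x)                           # (B, 1280, 7, 7) for 224x224 input
        blocks = []
        for rows in torch.tensor_split(fmap, 3, dim=-2):  # nine-grid split over 7x7
            for block in torch.tensor_split(rows, 3, dim=-1):
                blocks.append(self.gem(block))            # (B, 1280) per block
        spatial = torch.stack(blocks, dim=0).sum(dim=0)   # merge block features -> (B, 1280)
        return self.fc(spatial)                           # (B, 512) target feature

net = SpatialFeatureNet().eval()
with torch.no_grad():
    target_feature = net(torch.randn(1, 3, 224, 224))     # cropped commodity region
```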
In the above embodiment of the present application, the method further includes: acquiring training data, wherein the training data comprises: a first sample and a second sample; processing the training data by using a feature expression network to obtain the features of a first sample and the features of a second sample, wherein the features of the first sample comprise the spatial features of the first sample, and the features of the second sample comprise the spatial features of the second sample; determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; and updating the network weight of the feature expression network based on the target loss value.
The training data in the above steps may be similar image matching pairs constructed in advance, that is, the first sample and the second sample are similar images.
It should be noted that the process of processing the training data by using the feature expression network is the same as the process of processing the target area by using the feature expression network, and details are not described herein.
In an alternative embodiment, the training process of the feature expression network is as follows: the training data are input into the feature expression network, 1280x7x7-dimensional features are obtained through MobileNetV2, the features are then divided into 9 parts over the 7x7 spatial dimensions, a GeM pooling operation is performed on each spatial block to obtain block features that pay more attention to salient features, and all the block features are finally merged to obtain the 1280-dimensional spatial feature; after the spatial feature is obtained, dimensionality reduction is performed through two full connection layers to obtain 512-dimensional features. The loss value of the loss function is then calculated based on the 512-dimensional features; if the loss value does not meet the training requirement, that is, the calculated loss value is greater than or equal to the target value, the network weights of the feature expression network are updated until the loss value meets the training requirement, that is, the calculated loss value is less than the target value, at which point the whole training process is completed.
In the above embodiments of the present application, determining the target loss value of the feature expression network based on the features of the first sample and the features of the second sample includes: determining a classification result of the first sample based on the characteristics of the first sample, and determining a classification result of the second sample based on the characteristics of the second sample; determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; obtaining a metric distance between the features of the first sample and the features of the second sample, and determining a metric loss value; and acquiring the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
In the embodiment of the application, the method can be realized by adopting a combined training mode of a classification loss function and a metric loss function, wherein the classification loss function is used for optimizing an image classification result to perform feature learning, and the metric loss function is used for optimizing a metric distance of a matching pair to perform feature learning. Optionally, the weights of the two loss functions may be set to 0.3 and 0.7, respectively, but not limited thereto, and may be adjusted according to actual use requirements.
In addition, existing distance measurement methods include, but are not limited to, the Euclidean distance and the cosine similarity distance. In this embodiment, the cosine similarity distance may be adopted: compared with the Euclidean distance, which focuses only on differences in magnitude, the cosine similarity distance focuses more on the similarity itself and can give a more reasonable similarity result when measuring multidimensional features.
In an alternative embodiment, after the 512-dimensional features are obtained, image classification may be performed based on the spatial features, and a classification loss (i.e., the above-mentioned classification loss value) is calculated from the two classification results of the similar images; meanwhile, the cosine similarity distance between the two spatial features can be calculated, from which a metric loss (i.e., the metric loss value) is obtained; finally, the final loss value is obtained as 0.3 x the classification loss + 0.7 x the metric loss.
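The following sketch illustrates one possible joint loss of this kind; the exact form of the metric loss is not specified above, so a simple cosine-distance loss on similar-image pairs, together with a linear classifier head and the 0.3/0.7 weighting, is assumed here for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointLoss(nn.Module):
    # final loss = 0.3 * classification loss + 0.7 * metric loss (cosine distance of the pair).
    def __init__(self, feat_dim=512, num_classes=10, w_cls=0.3, w_metric=0.7):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_classes)  # assumed classification head
        self.w_cls, self.w_metric = w_cls, w_metric

    def forward(self, feat_a, feat_b, label_a, label_b):
        # Classification loss on both samples of the similar-image pair.
        cls_loss = F.cross_entropy(self.classifier(feat_a), label_a) + \
                   F.cross_entropy(self.classifier(feat_b), label_b)
        # Metric loss: cosine distance between the pair's 512-dimensional features.
        metric_loss = (1.0 - F.cosine_similarity(feat_a, feat_b)).mean()
        return self.w_cls * cls_loss + self.w_metric * metric_loss

# Hypothetical 512-dimensional features of a similar-image pair from the feature network.
criterion = JointLoss()
feat_a = torch.randn(4, 512, requires_grad=True)
feat_b = torch.randn(4, 512, requires_grad=True)
label_a, label_b = torch.randint(0, 10, (4,)), torch.randint(0, 10, (4,))
loss = criterion(feat_a, feat_b, label_a, label_b)
loss.backward()  # in real training the gradients update the feature expression network
```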
In the above embodiments of the present application, matching the image of the target object with the images of other objects based on at least the target spatial feature, and obtaining a matching result includes: processing images of other objects to obtain the characteristics of the images of the other objects, wherein the characteristics comprise the spatial characteristics of the images of the other objects; and matching the target characteristics with the characteristics of the images of other objects to obtain a matching result.
In an optional embodiment, in order to match the image of the target object with the images of other objects, a target feature of the image of the target object may be extracted, and features of the images of other objects (the features also include spatial features of the images of other objects) may be extracted from the images of other objects.
In the above embodiments of the present application, matching the target feature with the features of the images of other objects, and obtaining a matching result includes: and obtaining the measurement distance of the target characteristic and the characteristic of the image of other objects to obtain a matching result.
In an alternative embodiment, the same processing manner may be adopted: the image of each other object is input into the feature expression network, and dimension reduction is performed through the full connection layer to obtain the feature of each image. After the respective features of the image of the target object and the images of the other objects have been obtained, distance measurement can be performed on them: the cosine similarity distance between the target feature and the feature of each other object's image is calculated to obtain a one-dimensional metric matrix, the other object with the minimum metric distance is then found as the object whose attribute parameters are the same as those of the target object, and the matching result is thereby obtained.
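As an illustrative sketch of this matching step (the feature dimensions and the random placeholder features are assumptions), the cosine distances between the target feature and the features of N other objects can be arranged as a one-dimensional metric matrix and the minimum-distance object selected:

```python
import torch
import torch.nn.functional as F

def cosine_distance(query, gallery):
    # query: (512,) target feature; gallery: (N, 512) features of the other objects' images.
    q = F.normalize(query.unsqueeze(0), dim=1)
    g = F.normalize(gallery, dim=1)
    return (1.0 - q @ g.t()).squeeze(0)   # (N,) one-dimensional metric matrix

target_feature = torch.randn(512)         # placeholder for the target object's feature
gallery_features = torch.randn(100, 512)  # placeholder features of 100 other objects

distances = cosine_distance(target_feature, gallery_features)
best_index = distances.argmin().item()    # other object with the minimum metric distance
```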
In the following, referring to fig. 3, the detailed flow of a preferred embodiment of the present application is described, again taking the retrieval of same-type goods in a live broadcast as an example. As shown in fig. 3, the method may include four stages: target detection, feature expression, feature measurement, and commodity retrieval, specifically as follows:
In the target detection stage, target commodity detection can be performed on the captured video frames and the commodity images using a MobileNet + SSD detection method to obtain the detection result, that is, the commodity bounding-box region.
In the feature expression stage, the commodity bounding-box region from target detection can be input into the MobileNetV2 network, and the 512-dimensional feature of the commodity is finally obtained through the full connection layer. During training, the 1280x7x7-dimensional features can be divided into a nine-square grid, a GeM pooling operation is applied to each spatial block to obtain the block features, and all the block features are finally merged to obtain the final 1280-dimensional spatial feature. Training can combine the classification loss and the metric loss, with the final loss obtained as their weighted sum. During inference, the 512-dimensional features of the video frames and the commodity images can be obtained directly.
In the feature measurement stage, the cosine similarity distances between all M captured video frames of a video and all N commodity images can be calculated.
In the commodity retrieval stage, the metric distances between all M captured video frames and all N commodity images can be arranged as a two-dimensional metric matrix; the commodity with the minimum distance for each video frame is then retrieved, giving a one-dimensional metric matrix of length M; intervals of consecutive frames matched to the same commodity are calculated and the noise is smoothed, so that a video clip is obtained for each matched commodity, that is, the video clips of same-type commodities found by searching the video based on the commodity images are finally obtained, and the commodity identification process is completed.
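For illustration only, the sketch below builds the two-dimensional metric matrix between M frame features and N commodity-image features, takes the per-frame minimum-distance commodity, and merges runs of consecutive frames matched to the same commodity into segments; the feature values are random placeholders and the noise-smoothing step is omitted:

```python
from itertools import groupby

import torch
import torch.nn.functional as F

frame_feats = F.normalize(torch.randn(120, 512), dim=1)    # M = 120 captured video frames
product_feats = F.normalize(torch.randn(30, 512), dim=1)   # N = 30 commodity images

dist = 1.0 - frame_feats @ product_feats.t()               # (M, N) two-dimensional metric matrix
best_product = dist.argmin(dim=1).tolist()                 # length-M, per-frame best commodity

# Merge consecutive frames matched to the same commodity into (commodity, start, end) intervals.
segments, start = [], 0
for product_id, run in groupby(best_product):
    length = len(list(run))
    segments.append((product_id, start, start + length - 1))
    start += length
```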
With this scheme, a commodity identification method based on spatial features is provided: the global features can be divided into spatial feature blocks to obtain detail features, and the salient-region features are further strengthened through GeM pooling; the classification loss and the metric loss complement each other through joint training, giving a finer-grained feature expression result; and the cosine similarity distance is adopted so that the similarity result on multi-dimensional feature measurement is more reasonable.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
Example 2
According to the embodiment of the application, a data processing method is further provided.
Fig. 4 is a flowchart of a second data processing method according to an embodiment of the present application. As shown in fig. 4, the method may include the steps of:
in step S402, an image of a target object is received.
In an alternative embodiment, to make it convenient for the user to upload an image of the target object, the user may be provided with an interactive interface as shown in fig. 5, and the user may upload the image by clicking an "upload image" button or by directly dragging the image file into the dashed box.
Step S404, the image of the target object is processed to obtain at least a target spatial feature of the image of the target object.
Step S406, matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object;
and step S408, displaying the matching result.
In an alternative embodiment, as shown in fig. 5, the matching result may be displayed in a display area of the interactive interface for easy viewing by the user.
In the above embodiment of the present application, the method further includes: receiving a selected target database; images of other objects stored in the target database are acquired.
The target database in the above steps may be a database that a user needs to perform commodity matching, may be a database of all commodities of an e-commerce shopping platform, may also be a database of a certain store, may also be a database of all commodities in a live broadcast platform, may also be a database of a certain live broadcast room, but is not limited thereto, and may be set according to an actual application scenario.
In an alternative embodiment, after the user uploads the image of the target object, all databases in the scene may be provided to the user for selection, thereby further improving the efficiency and accuracy of commodity retrieval.
In the foregoing embodiment of the present application, processing the image of the target object, and obtaining at least a target spatial feature of the image of the target object includes: processing the image of the target object, and determining a target area where the target object is located; and processing the target area by using the feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises a target space feature.
In the foregoing embodiment of the present application, processing the target area by using the feature expression network to obtain the target feature includes: inputting the target area into a feature expression network to obtain the feature of a preset dimension; dividing the characteristics of the preset dimensionality according to a preset dividing mode; performing pooling operation on the divided features to obtain block features; combining the block features to obtain a spatial feature with a preset dimension; and inputting the spatial features with preset dimensionality into the full connection layer to obtain target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
In the above embodiment of the present application, the method further includes: acquiring training data, wherein the training data comprises: a first sample and a second sample; processing the training data by using a feature expression network to obtain the features of a first sample and the features of a second sample, wherein the features of the first sample comprise the spatial features of the first sample, and the features of the second sample comprise the spatial features of the second sample; determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; and updating the network weight of the feature expression network based on the target loss value.
In the above embodiments of the present application, determining the target loss value of the feature expression network based on the features of the first sample and the features of the second sample includes: determining a classification result of the first sample based on the characteristics of the first sample, and determining a classification result of the second sample based on the characteristics of the second sample; determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; obtaining a metric distance between the features of the first sample and the features of the second sample, and determining a metric loss value; and acquiring the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
In the above embodiments of the present application, matching the image of the target object with the images of other objects based on at least the target spatial feature, and obtaining a matching result includes: processing images of other objects to obtain the characteristics of the images of the other objects, wherein the characteristics comprise the spatial characteristics of the images of the other objects; and matching the target characteristics with the characteristics of the images of other objects to obtain a matching result.
In the above embodiments of the present application, matching the target feature with the features of the images of other objects, and obtaining a matching result includes: and obtaining the measurement distance of the target characteristic and the characteristic of the image of other objects to obtain a matching result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 3
According to the embodiment of the application, a data processing method is further provided.
Fig. 6 is a flowchart of a third data processing method according to an embodiment of the present application. As shown in fig. 6, the method may include the steps of:
step S602, acquiring the uploaded short video with the first product displayed thereon and the image of the second product.
The short video may be a live video on a live broadcast platform, or may be short video data shot by a merchant, but is not limited thereto. The first product can be a commodity displayed to a user during a live broadcast; the second product can be a commodity sold to the user during the live broadcast, or all commodities on the whole e-commerce shopping platform, but neither is limited thereto, and both can be determined according to actual needs.
Step S604, capturing the short video to obtain a video frame including the first product.
In an optional embodiment, since the first product does not appear in the short video at all times, the short video may be intercepted for commodity identification and only the video frames containing commodities extracted; for example, M video frames may be extracted, and the commodities contained in these M frames may be the same or different.
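A minimal sketch of this interception step is shown below, assuming OpenCV is available; the sampling interval and the detect_product callback (any detector that reports whether a frame contains a commodity, e.g. the MobileNet + SSD detector described earlier) are illustrative assumptions:

```python
import cv2  # OpenCV video reading

def extract_product_frames(video_path, detect_product, every_n=30):
    # Sample every_n-th frame of the short video and keep frames in which the
    # supplied detector finds a product region.
    capture = cv2.VideoCapture(video_path)
    kept, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % every_n == 0 and detect_product(frame):
            kept.append((index, frame))
        index += 1
    capture.release()
    return kept
```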
Step S606, processing the video frame to at least obtain the spatial feature of the video frame, and processing the image of the second product to obtain the spatial feature of the image;
Step S608, matching the video frame and the image at least based on the spatial feature of the video frame and the spatial feature of the image to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the second product are the same as the attribute parameters of the first product;
in an alternative embodiment, the similarity between the first product and the second product can be characterized by calculating a metric distance between the spatial feature of the video frame and the spatial feature of the image, and further determining whether the attribute parameter of the first product is the same as the attribute parameter of the second product by comparing the calculated metric distance with a preset value, wherein the metric distance is less than or equal to the preset value, and the attribute parameter of the first product can be determined to be the same as the attribute parameter of the second product; if the metric distance is greater than the predetermined value, it may be determined that the attribute parameters of the first product are different from the attribute parameters of the second product.
And step S610, displaying a matching result.
In the above embodiment of the present application, when the matching result is that the attribute parameter of the first product is the same as the attribute parameter of the second product, the method further includes: acquiring continuous video frames containing a first product to obtain a video clip corresponding to the first product; and displaying the video clip.
In an optional embodiment, after the comparison result of each frame of video frame is obtained, the commodity type in each frame of video frame can be determined, and the same commodity is often displayed within a period of time, so that the video frames of the same commodity and other video frames between the video frames can be combined based on the comparison result, and the video clip of the commodity can be obtained, thereby realizing the purpose of dividing real-time content into offline video clips with commodity semantics.
In the above embodiments of the present application, processing a video frame to obtain at least a spatial feature of the video frame, and processing an image of a second product to obtain the spatial feature of the image includes: processing the video frame and the image, and determining the area where the first product is located and the area where the second product is located; and respectively processing the area where the first product is located and the area where the second product is located by utilizing the feature expression network to obtain the features of the video frame and the features of the image, wherein the features of the video frame comprise the spatial features of the video frame, and the features of the image comprise the spatial features of the image.
In the above embodiment of the present application, the processing, by using the feature expression network, the region where the first product is located and the region where the second product is located, respectively, to obtain the features of the video frame and the features of the image includes: inputting the area where the first product is located and the area where the second product is located into a feature expression network to obtain a video frame feature with a preset dimension and an image feature with the preset dimension; dividing the video frame characteristics of the preset dimensionality and the image characteristics of the preset dimensionality according to a preset dividing mode; performing pooling operation on the divided video frame characteristics and image characteristics to obtain video frame block characteristics and image block characteristics; merging the video frame blocking features to obtain video frame spatial features with preset dimensionality, and merging the image blocking features to obtain image spatial features with preset dimensionality; and inputting the video frame spatial features and the image spatial features into the full-connection layer to obtain the features of the video frames and the features of the images, wherein the dimension of the features of the video frames and the dimension of the features of the images are smaller than the preset dimension.
In the above embodiment of the present application, the method further includes: acquiring training data, wherein the training data comprises: a first sample and a second sample; processing the training data by using a feature expression network to obtain the features of a first sample and the features of a second sample, wherein the features of the first sample comprise the spatial features of the first sample, and the features of the second sample comprise the spatial features of the second sample; determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; and updating the network weight of the feature expression network based on the target loss value.
In the above embodiments of the present application, determining the target loss value of the feature expression network based on the features of the first sample and the features of the second sample includes: determining a classification result of the first sample based on the characteristics of the first sample, and determining a classification result of the second sample based on the characteristics of the second sample; determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; obtaining a metric distance between the features of the first sample and the features of the second sample, and determining a metric loss value; and acquiring the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
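A sketch of this target loss is shown below; the use of cross entropy for the classification loss, a contrastive form for the metric loss, and the particular margin and weight values are illustrative assumptions only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetLoss(nn.Module):
    """Weighted sum of a classification loss (from the two samples'
    classification results) and a metric loss (from the metric distance
    between the two samples' features)."""
    def __init__(self, metric_weight: float = 0.5, margin: float = 1.0):
        super().__init__()
        self.metric_weight = metric_weight
        self.margin = margin
        self.ce = nn.CrossEntropyLoss()

    def forward(self, logits1, logits2, feat1, feat2, labels1, labels2):
        # Classification loss value from both samples' classification results.
        classification_loss = self.ce(logits1, labels1) + self.ce(logits2, labels2)
        # Metric distance between the features of the first and second samples.
        distance = F.pairwise_distance(feat1, feat2)
        same = (labels1 == labels2).float()
        metric_loss = (same * distance.pow(2) +
                       (1.0 - same) * F.relu(self.margin - distance).pow(2)).mean()
        # Target loss value as a weighted sum of the two loss values.
        return classification_loss + self.metric_weight * metric_loss
```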
In the above embodiments of the present application, matching the video frame and the image based on at least the spatial feature of the video frame and the spatial feature of the image to obtain a matching result includes: obtaining the metric distance between the spatial feature of the video frame and the spatial feature of the image to obtain the matching result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 4
According to the embodiment of the application, a data processing method is further provided.
Fig. 7 is a flowchart of a fourth data processing method according to an embodiment of the present application. As shown in fig. 7, the method may include the steps of:
step S702, acquiring an image of the target object, wherein the image of the target object is acquired based on a shooting device associated with the image processing system.
Optionally, the image processing system is arranged on a robot, wherein the robot is used for voice interaction.
The robot may be, but is not limited to, a smart speaker. The smart speaker may be provided with a camera (i.e., the above-mentioned shooting device), which may be used to shoot an image of the product that the user wishes to purchase.
Step S704, the image of the target object is processed to obtain at least a target spatial feature of the image of the target object.
Step S706, matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
Step S708, displaying the matching result.
In the foregoing embodiment of the present application, processing the image of the target object to obtain at least the target spatial feature of the image of the target object includes: processing the image of the target object, and determining a target area where the target object is located; and processing the target area by using the feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises the target space feature.
In the foregoing embodiment of the present application, processing the target region by using the feature expression network to obtain the target feature of the image of the target object includes: inputting the target area into a feature expression network to obtain the feature of a preset dimension; dividing the characteristics of the preset dimensionality according to a preset dividing mode; performing pooling operation on the divided features to obtain block features; combining the block features to obtain a spatial feature with a preset dimension; and inputting the spatial features with preset dimensionality into the full connection layer to obtain target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
In the above embodiment of the present application, the method further includes: acquiring training data, wherein the training data comprises: a first sample and a second sample; processing the training data by using a feature expression network to obtain the features of a first sample and the features of a second sample, wherein the features of the first sample comprise the spatial features of the first sample, and the features of the second sample comprise the spatial features of the second sample; determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; and updating the network weight of the feature expression network based on the target loss value.
In the above embodiments of the present application, determining the target loss value of the feature expression network based on the features of the first sample and the features of the second sample includes: determining a classification result of the first sample based on the characteristics of the first sample, and determining a classification result of the second sample based on the characteristics of the second sample; determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; obtaining a metric distance between the features of the first sample and the features of the second sample, and determining a metric loss value; and acquiring the weighted sum of the classification loss value and the metric loss value to obtain a target loss value.
In the above embodiments of the present application, matching the image of the target object with the images of other objects based on at least the target spatial feature, and obtaining a matching result includes: processing images of other objects to obtain the characteristics of the images of the other objects, wherein the characteristics comprise the spatial characteristics of the images of the other objects; and matching the target characteristics with the characteristics of the images of other objects to obtain a matching result.
In the above embodiments of the present application, matching the target feature with the features of the images of other objects to obtain a matching result includes: obtaining the metric distance between the target feature and the features of the images of other objects to obtain the matching result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 5
According to an embodiment of the present application, there is also provided a data processing apparatus for implementing the data processing method, as shown in fig. 8, the apparatus 800 includes: an acquisition module 802, a processing module 804, and a matching module 806.
The obtaining module 802 is configured to obtain an image of a target object; the processing module 804 is configured to process the image of the target object to obtain at least a target spatial feature of the image of the target object; the matching module 806 is configured to match the image of the target object with the images of other objects based on at least the target spatial feature to obtain a matching result, where the matching result is used to represent whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
It should be noted here that the above-mentioned obtaining module 802, processing module 804 and matching module 806 correspond to steps S202 to S206 in embodiment 1, and the three modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the processing module includes: the device comprises a first determination unit and a first processing unit.
The first determining unit is used for processing the image of the target object and determining a target area where the target object is located; the first processing unit is used for processing the target area by using the feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises a target space feature.
In the above embodiments of the present application, the first processing unit includes: the device comprises a first input subunit, a dividing subunit, a first operation subunit, a second operation subunit and a second input subunit.
The first input subunit is used for inputting the target area into the feature expression network to obtain the feature of the preset dimension; the dividing subunit is used for dividing the characteristics of the preset dimensionality according to a preset dividing mode; the first operation subunit is used for performing pooling operation on the divided features to obtain block features; the second operation subunit is used for carrying out merging operation on the block features to obtain the spatial features with preset dimensions; the second input subunit is used for inputting the spatial features with the preset dimensionality into the full connection layer to obtain the target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
In the above embodiment of the present application, the apparatus further includes: the device comprises a determining module and an updating module.
Wherein, the acquisition module is also used for acquiring training data, wherein, training data includes: a first sample and a second sample; the first processing unit is further configured to process the training data by using a feature expression network to obtain features of the first sample and features of the second sample, where the features of the first sample include spatial features of the first sample, and the features of the second sample include spatial features of the second sample; the determining module is used for determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; the updating module is used for updating the network weight of the feature expression network based on the target loss value.
In the above embodiments of the present application, the determining module includes: the device comprises a second determining unit, a third determining unit, a first acquiring unit and a second acquiring unit.
The second determining unit is used for determining the classification result of the first sample based on the characteristics of the first sample and determining the classification result of the second sample based on the characteristics of the second sample; the third determining unit is used for determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; the first acquisition unit is used for acquiring the metric distance between the characteristic of the first sample and the characteristic of the second sample and determining a metric loss value; the second obtaining unit is used for obtaining the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
In the above embodiments of the present application, the matching module includes: a second processing unit and a matching unit.
The second processing unit is used for processing the images of other objects to obtain the characteristics of the images of other objects, wherein the characteristics comprise the spatial characteristics of the images of other objects; the matching unit is used for matching the target characteristics with the characteristics of the images of other objects to obtain matching results.
In the above embodiments of the present application, the matching unit is further configured to obtain a metric distance between the target feature and a feature of an image of another object, and obtain a matching result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 6
According to an embodiment of the present application, there is also provided a data processing apparatus for implementing the data processing method, as shown in fig. 9, the apparatus 900 includes: a receiving module 902, a processing module 904, a matching module 906, and a presentation module 908.
Wherein the receiving module 902 is configured to receive an image of a target object; the processing module 904 is configured to process the image of the target object to obtain at least a target spatial feature of the image of the target object; the matching module 906 is configured to match the image of the target object with images of other objects based on at least the target spatial feature to obtain a matching result, where the matching result is used to represent whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; the display module 908 is used for displaying the matching result.
It should be noted here that the receiving module 902, the processing module 904, the matching module 906 and the displaying module 908 correspond to steps S402 to S408 in embodiment 2, and the four modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 2. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiment of the present application, the apparatus further includes: and an acquisition module.
The receiving module is also used for receiving the selected target database; the acquisition module is used for acquiring images of other objects stored in the target database.
In the above embodiments of the present application, the processing module includes: the device comprises a first determination unit and a first processing unit.
The first determining unit is used for processing the image of the target object and determining a target area where the target object is located; the first processing unit is used for processing the target area by using the feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises a target space feature.
In the above embodiments of the present application, the first processing unit includes: the device comprises a first input subunit, a dividing subunit, a first operation subunit, a second operation subunit and a second input subunit.
The first input subunit is used for inputting the target area into the feature expression network to obtain the feature of the preset dimension; the dividing subunit is used for dividing the characteristics of the preset dimensionality according to a preset dividing mode; the first operation subunit is used for performing pooling operation on the divided features to obtain block features; the second operation subunit is used for carrying out merging operation on the block features to obtain the spatial features with preset dimensions; the second input subunit is used for inputting the spatial features with the preset dimensionality into the full connection layer to obtain the target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
In the above embodiment of the present application, the apparatus further includes: the device comprises a determining module and an updating module.
Wherein, the acquisition module is also used for acquiring training data, wherein, training data includes: a first sample and a second sample; the first processing unit is further configured to process the training data by using a feature expression network to obtain features of the first sample and features of the second sample, where the features of the first sample include spatial features of the first sample, and the features of the second sample include spatial features of the second sample; the determining module is used for determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; the updating module is used for updating the network weight of the feature expression network based on the target loss value.
In the above embodiments of the present application, the determining module includes: the device comprises a second determining unit, a third determining unit, a first acquiring unit and a second acquiring unit.
The second determining unit is used for determining the classification result of the first sample based on the characteristics of the first sample and determining the classification result of the second sample based on the characteristics of the second sample; the third determining unit is used for determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; the first acquisition unit is used for acquiring the metric distance between the characteristic of the first sample and the characteristic of the second sample and determining a metric loss value; the second obtaining unit is used for obtaining the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
In the above embodiments of the present application, the matching module includes: a second processing unit and a matching unit.
The second processing unit is used for processing the images of other objects to obtain the characteristics of the images of other objects, wherein the characteristics comprise the spatial characteristics of the images of other objects; the matching unit is used for matching the target characteristics with the characteristics of the images of other objects to obtain matching results.
In the above embodiments of the present application, the matching unit is further configured to obtain the metric distance between the target feature and the feature of the image of the other object, and obtain the matching result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 7
According to an embodiment of the present application, there is also provided a data processing apparatus for implementing the data processing method, as shown in fig. 10, the apparatus 1000 includes: an acquisition module 1002, a truncation module 1004, a processing module 1006, a matching module 1008, and a presentation module 1010.
The obtaining module 1002 is configured to obtain an uploaded short video with a first product displayed thereon and an image of a second product; the intercepting module 1004 is configured to intercept the short video to obtain a video frame including the first product; the processing module 1006 is configured to process the video frame to obtain at least a spatial feature of the video frame, and process the image of the second product to obtain a spatial feature of the image; the matching module 1008 is configured to match the video frame with the image based on at least the spatial feature of the video frame and the spatial feature of the image to obtain a matching result, where the matching result is used to represent whether the attribute parameters of the first product are the same as the attribute parameters of the second product; the display module 1010 is used for displaying the matching result.
It should be noted here that the obtaining module 1002, the intercepting module 1004, the processing module 1006, the matching module 1008, and the displaying module 1010 correspond to steps S602 to S610 in embodiment 3, and the five modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 3. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the processing module includes: the device comprises a first determination unit and a first processing unit.
The first determining unit is used for processing the video frames and the images and determining the area where the first product is located and the area where the second product is located; the first processing unit is used for processing the area where the first product is located and the area where the second product is located respectively by using the feature expression network to obtain the features of the video frame and the features of the image, wherein the features of the video frame comprise the spatial features of the video frame, and the features of the image comprise the spatial features of the image.
In the above embodiments of the present application, the first processing unit includes: the device comprises a first input subunit, a dividing subunit, a first operation subunit, a second operation subunit and a second input subunit.
The first input subunit is used for inputting the area where the first product is located and the area where the second product is located into the feature expression network to obtain the video frame feature with the preset dimension and the image feature with the preset dimension; the dividing subunit is used for dividing the video frame characteristics of the preset dimensionality and the image characteristics of the preset dimensionality according to a preset dividing mode; the first operation subunit is used for performing pooling operation on the divided video frame characteristics and image characteristics to obtain video frame block characteristics and image block characteristics; the second operation subunit is configured to perform merging operation on the video frame blocking features to obtain video frame spatial features of a preset dimension, and perform merging operation on the image blocking features to obtain image spatial features of the preset dimension; the second input subunit is used for inputting the video frame spatial features and the image spatial features to the full connection layer to obtain the features of the video frames and the features of the images, wherein the dimensionality of the features of the video frames and the dimensionality of the features of the images are smaller than the preset dimensionality.
In the above embodiment of the present application, the apparatus further includes: the device comprises a determining module and an updating module.
Wherein, the acquisition module is also used for acquiring training data, wherein, training data includes: a first sample and a second sample; the first processing unit is further configured to process the training data by using a feature expression network to obtain features of the first sample and features of the second sample, where the features of the first sample include spatial features of the first sample, and the features of the second sample include spatial features of the second sample; the determining module is used for determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; the updating module is used for updating the network weight of the feature expression network based on the target loss value.
In the above embodiments of the present application, the determining module includes: the device comprises a second determining unit, a third determining unit, a first acquiring unit and a second acquiring unit.
The second determining unit is used for determining the classification result of the first sample based on the characteristics of the first sample and determining the classification result of the second sample based on the characteristics of the second sample; the third determining unit is used for determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; the first acquisition unit is used for acquiring the metric distance between the characteristic of the first sample and the characteristic of the second sample and determining a metric loss value; the second obtaining unit is used for obtaining the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
In the above embodiment of the present application, the matching module is further configured to obtain a metric distance between a spatial feature of the video frame and a spatial feature of the image, so as to obtain a matching result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 8
According to an embodiment of the present application, there is also provided a data processing apparatus for implementing the data processing method, as shown in fig. 11, the apparatus 1100 includes: an acquisition module 1102, a processing module 1104, a matching module 1106, and a presentation module 1108.
The acquiring module 1102 is configured to acquire an image of a target object, where the image of the target object is acquired based on a shooting device associated with an image processing system; the processing module 1104 is configured to process the image of the target object to obtain at least a target spatial feature of the image of the target object; the matching module 1106 is configured to match the image of the target object with the images of other objects based on at least the target spatial feature to obtain a matching result, where the matching result is used to represent whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; the display module 1108 is used for displaying the matching result.
It should be noted here that the acquiring module 1102, the processing module 1104, the matching module 1106 and the displaying module 1108 correspond to steps S702 to S708 in embodiment 4, and the four modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 4. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the processing module includes: the device comprises a first determination unit and a first processing unit.
The first determining unit is used for processing the image of the target object and determining a target area where the target object is located; the first processing unit is used for processing the target area by using the feature expression network to obtain target features, wherein the target features comprise target space features.
In the above embodiments of the present application, the first processing unit includes: the device comprises a first input subunit, a dividing subunit, a first operation subunit, a second operation subunit and a second input subunit.
The first input subunit is used for inputting the target area into the feature expression network to obtain the feature of the preset dimension; the dividing subunit is used for dividing the characteristics of the preset dimensionality according to a preset dividing mode; the first operation subunit is used for performing pooling operation on the divided features to obtain block features; the second operation subunit is used for carrying out merging operation on the block features to obtain the spatial features with preset dimensions; the second input subunit is used for inputting the spatial features with the preset dimensionality into the full connection layer to obtain the target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
In the above embodiment of the present application, the apparatus further includes: the device comprises a determining module and an updating module.
Wherein, the acquisition module is also used for acquiring training data, wherein, training data includes: a first sample and a second sample; the first processing unit is further configured to process the training data by using a feature expression network to obtain features of the first sample and features of the second sample, where the features of the first sample include spatial features of the first sample, and the features of the second sample include spatial features of the second sample; the determining module is used for determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; the updating module is used for updating the network weight of the feature expression network based on the target loss value.
In the above embodiments of the present application, the determining module includes: the device comprises a second determining unit, a third determining unit, a first acquiring unit and a second acquiring unit.
The second determining unit is used for determining the classification result of the first sample based on the characteristics of the first sample and determining the classification result of the second sample based on the characteristics of the second sample; the third determining unit is used for determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; the first acquisition unit is used for acquiring the metric distance between the characteristic of the first sample and the characteristic of the second sample and determining a metric loss value; the second obtaining unit is used for obtaining the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
In the above embodiments of the present application, the matching module includes: a second processing unit and a matching unit.
The second processing unit is used for processing the images of other objects to obtain the characteristics of the images of other objects, wherein the characteristics comprise the spatial characteristics of the images of other objects; the matching unit is used for matching the target characteristics with the characteristics of the images of other objects to obtain matching results.
In the above embodiments of the present application, the matching unit is further configured to obtain the metric distance between the target feature and the feature of the image of the other object, and obtain the matching result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 9
According to the embodiment of the application, a data processing method is further provided.
Fig. 12 is a flowchart of a fifth data processing method according to an embodiment of the present application. As shown in fig. 12, the method may include the steps of:
in step S1202, an image of the target object is acquired.
Step S1204, processes the image of the target object to obtain at least a target spatial feature of the image of the target object.
Step S1206, intercepting the video data to obtain images of other objects included in the video data.
The video data in the above steps may be live data in a live platform, or may be short video data shot by a merchant, but is not limited thereto.
Step S1208, matching the image of the target object with the images of other objects at least based on the target spatial features to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
Step S1210, obtaining a video segment corresponding to the target object in the video data based on the matching result.
In the foregoing embodiment of the present application, processing the image of the target object, and obtaining at least a target spatial feature of the image of the target object includes: processing the image of the target object, and determining a target area where the target object is located; and processing the target area by using the feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises a target space feature.
In the foregoing embodiment of the present application, processing the target area by using the feature expression network to obtain the target feature includes: inputting the target area into a feature expression network to obtain the feature of a preset dimension; dividing the characteristics of the preset dimensionality according to a preset dividing mode; performing pooling operation on the divided features to obtain block features; combining the block features to obtain a spatial feature with a preset dimension; and inputting the spatial features with preset dimensionality into the full connection layer to obtain target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
In the above embodiment of the present application, the method further includes: acquiring training data, wherein the training data comprises: a first sample and a second sample; processing the training data by using a feature expression network to obtain the features of a first sample and the features of a second sample, wherein the features of the first sample comprise the spatial features of the first sample, and the features of the second sample comprise the spatial features of the second sample; determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; and updating the network weight of the feature expression network based on the target loss value.
In the above embodiments of the present application, determining the target loss value of the feature expression network based on the features of the first sample and the features of the second sample includes: determining a classification result of the first sample based on the characteristics of the first sample, and determining a classification result of the second sample based on the characteristics of the second sample; determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; obtaining a metric distance between the features of the first sample and the features of the second sample, and determining a metric loss value; and acquiring the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
In the above embodiments of the present application, matching the image of the target object with the images of other objects based on at least the target spatial feature, and obtaining a matching result includes: processing images of other objects to obtain the characteristics of the images of the other objects, wherein the characteristics comprise the spatial characteristics of the images of the other objects; and matching the target characteristics with the characteristics of the images of other objects to obtain a matching result.
In the above embodiments of the present application, matching the target feature with the features of the images of other objects to obtain a matching result includes: obtaining the metric distance between the target feature and the features of the images of other objects to obtain the matching result.
In the foregoing embodiment of the present application, obtaining, based on the matching result, a video segment corresponding to the target object in the video data includes: under the condition that the matching result is that the attribute parameters of other objects are the same as the attribute parameters of the target object, acquiring a plurality of video frames containing other objects in the video data; and obtaining a video clip corresponding to the target object based on the time information of the plurality of video frames.
The time information in the above step may be shooting time information of a plurality of video frames, and may be represented by way of a time stamp, but is not limited thereto. The temporal information may be determined based on the location of the video frame in the video data, as well as the sampling rate of the video frame.
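A small sketch of deriving the time information from the position of a frame in the video data and the sampling rate, and of bounding the resulting clip, follows; the data layout is an assumption made for illustration:

```python
def frame_timestamp(frame_position: int, sampling_rate: float) -> float:
    """Time information of a video frame derived from its position in the
    video data and the sampling rate (frames per second)."""
    return frame_position / sampling_rate

def clip_bounds(matched_positions, sampling_rate: float):
    """Start and end timestamps of the video clip covering all frames whose
    object matched the target object."""
    times = [frame_timestamp(p, sampling_rate) for p in matched_positions]
    return min(times), max(times)
```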
For example, still taking the retrieval of same-style commodities in live broadcasting as an example, the image of the target object may be an intercepted video frame, and the other objects may be the commodities sold in the live broadcast room. By matching the commodity in each video frame with the commodities on sale, the commodity semantics of each video frame can be determined, so that the real-time content is segmented into offline video segments with commodity semantics, which improves the effect of live broadcast playback on the distribution side and raises the click-through rate and purchase conversion rate of users.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 10
According to the embodiment of the application, a data processing method is further provided.
Fig. 13 is a flowchart of a sixth data processing method according to an embodiment of the present application. As shown in fig. 13, the method may include the steps of:
Step S1302, receiving an image search instruction in the process of a live video broadcast.
The image search instruction in the above step may be an instruction sent by the user, while watching the live broadcast, for searching images of same-style commodities in real time. The user may issue the instruction by voice or by operating a corresponding button in the client, but is not limited thereto.
In an alternative embodiment, when the user needs to search for the same-style product, the user may send a voice of "please search for the same-style product", and after receiving the voice, the client or the server may determine that the image search instruction is received by performing voice signal processing on the voice.
In another alternative embodiment, when the user needs to perform the same-style product searching, the user may click a button of "same-style product searching" in the client, so that the client or the server may receive the image searching instruction.
Step S1304, based on the image search instruction, obtains an image of a target object in the live video.
In an alternative embodiment, after receiving the image search instruction, the image search instruction may be analyzed to determine the commodity or the part of the commodity that the user needs to search, and then the live video is intercepted to obtain the image containing the relevant commodity or the part of the commodity.
In step S1306, the image of the target object is processed to obtain at least a target spatial feature of the image of the target object.
Step S1308, matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object;
step S1310, displaying the matching result.
In the above embodiment of the present application, based on the image search instruction, acquiring the image of the target object in the live video includes: acquiring a gesture image corresponding to the image searching instruction; recognizing the gesture image and determining gesture information in the gesture image; and determining a target object in the live video based on the gesture information.
The gesture image in the above step may be a finger image, a palm image, or the like of the user, but is not limited thereto. The gesture information may be, but is not limited to, a motion trajectory of the gesture, an orientation of the gesture, and the like.
In an alternative embodiment, after receiving the image search instruction, the client or the server may capture an image of the user's finger, determine the direction in which the finger points through image recognition, and take the product lying in that direction as the product that the user needs to search for.
In another alternative embodiment, after receiving the image search instruction, the client or the server may capture an image of the user's finger, determine the motion trajectory of the finger through image recognition, and take the product located on the motion trajectory, or enclosed by it, as the product that the user needs to search for.
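A minimal sketch of the selection step described in the two embodiments above is given below; the detection-box format and the way the pointing location or trajectory is obtained are assumptions, and the enclosed-by-trajectory case is reduced to counting trajectory points inside each box for brevity:

```python
def select_product_by_point(product_boxes, point):
    """product_boxes: (x1, y1, x2, y2) rectangles of the products detected in
    the frame; point: (x, y) location indicated by the recognized finger
    direction. Returns the first box containing the point, or None."""
    x, y = point
    for x1, y1, x2, y2 in product_boxes:
        if x1 <= x <= x2 and y1 <= y <= y2:
            return (x1, y1, x2, y2)
    return None

def select_product_by_trajectory(product_boxes, trajectory):
    """trajectory: list of (x, y) points of the finger's motion trajectory.
    Returns the box containing the largest number of trajectory points,
    or None if no point falls inside any box."""
    best_box, best_hits = None, 0
    for box in product_boxes:
        hits = sum(1 for px, py in trajectory
                   if box[0] <= px <= box[2] and box[1] <= py <= box[3])
        if hits > best_hits:
            best_box, best_hits = box, hits
    return best_box
```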
In the above embodiment of the present application, before displaying the matching result, the method further includes: receiving an area designation instruction; determining a target area in the image of the target object based on the area specifying instruction; determining a mapping area corresponding to the target area in the matching result; and in the process of displaying the matching result, displaying the mapping area according to a preset mode.
The area designation instruction in the above step may indicate a part of the product that the user wishes to focus on in the process of searching for the same-style product; the part may be one specified by the user or one emphasized by the anchor, but is not limited thereto.
The mapping area in the above step may be the part of the retrieved same-style product that corresponds to the part of the target product that the user wishes to focus on.
The preset mode in the above step may be any mode in which the mapping area is displayed differently from other areas, such as highlighting or a flashing frame, which is not specifically limited in this application.
In an optional embodiment, before searching for the same-style commodity, the user can designate a specific part of the commodity to be searched, so that after the corresponding same-style commodity is found, the corresponding part of that commodity can be highlighted or otherwise displayed differently, allowing the user to focus on it.
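One possible way to determine the mapping area, assuming that the designated target area is mapped onto the retrieved product's bounding box by preserving its relative coordinates (an illustrative assumption rather than a prescribed mapping), is:

```python
def map_region(target_box, target_region, matched_box):
    """target_box / matched_box: (x1, y1, x2, y2) of the product in the query
    image and in the matching result; target_region: the user-designated area
    inside target_box. Returns the corresponding mapping area inside
    matched_box, to be displayed in the preset mode (e.g. highlighted)."""
    tx1, ty1, tx2, ty2 = target_box
    rx1, ry1, rx2, ry2 = target_region
    mx1, my1, mx2, my2 = matched_box
    sx = (mx2 - mx1) / (tx2 - tx1)   # horizontal scale between the two boxes
    sy = (my2 - my1) / (ty2 - ty1)   # vertical scale between the two boxes
    return (mx1 + (rx1 - tx1) * sx, my1 + (ry1 - ty1) * sy,
            mx1 + (rx2 - tx1) * sx, my1 + (ry2 - ty1) * sy)
```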
In the foregoing embodiment of the present application, processing the image of the target object, and obtaining at least a target spatial feature of the image of the target object includes: processing the image of the target object, and determining a target area where the target object is located; and processing the target area by using the feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises a target space feature.
In the foregoing embodiment of the present application, processing the target area by using the feature expression network to obtain the target feature includes: inputting the target area into a feature expression network to obtain the feature of a preset dimension; dividing the characteristics of the preset dimensionality according to a preset dividing mode; performing pooling operation on the divided features to obtain block features; combining the block features to obtain a spatial feature with a preset dimension; and inputting the spatial features with preset dimensionality into the full connection layer to obtain target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
In the above embodiment of the present application, the method further includes: acquiring training data, wherein the training data comprises: a first sample and a second sample; processing the training data by using a feature expression network to obtain the features of a first sample and the features of a second sample, wherein the features of the first sample comprise the spatial features of the first sample, and the features of the second sample comprise the spatial features of the second sample; determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; and updating the network weight of the feature expression network based on the target loss value.
In the above embodiments of the present application, determining the target loss value of the feature expression network based on the features of the first sample and the features of the second sample includes: determining a classification result of the first sample based on the characteristics of the first sample, and determining a classification result of the second sample based on the characteristics of the second sample; determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; obtaining a metric distance between the features of the first sample and the features of the second sample, and determining a metric loss value; and acquiring the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
In the above embodiments of the present application, matching the image of the target object with the images of other objects based on at least the target spatial feature, and obtaining a matching result includes: processing images of other objects to obtain the characteristics of the images of the other objects, wherein the characteristics comprise the spatial characteristics of the images of the other objects; and matching the target characteristics with the characteristics of the images of other objects to obtain a matching result.
In the above embodiments of the present application, matching the target feature with the features of the images of other objects to obtain a matching result includes: obtaining the metric distance between the target feature and the features of the images of other objects to obtain the matching result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 11
According to an embodiment of the present application, there is also provided a data processing apparatus for implementing the data processing method, as shown in fig. 14, the apparatus 1400 includes: an acquisition module 1402, a first processing module 1404, a truncation module 1406, a matching module 1408, and a second processing module 1410.
The acquiring module 1402 is configured to acquire an image of a target object; the first processing module 1404 is configured to process the image of the target object to obtain at least a target spatial feature of the image of the target object; the intercepting module 1406 is configured to intercept the video data to obtain images of other objects included in the video data; the matching module 1408 is configured to match the image of the target object with images of other objects based on at least the target spatial feature to obtain a matching result, where the matching result is used to represent whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; the second processing module 1410 is configured to obtain a video segment corresponding to the target object in the video data based on the matching result.
It should be noted here that the above-mentioned obtaining module 1402, first processing module 1404, intercepting module 1406, matching module 1408 and second processing module 1410 correspond to steps S1202 to S1210 in embodiment 9, and the five modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 9. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the first processing module includes: the device comprises a first determination unit and a first processing unit.
The first determining unit is used for processing the image of the target object and determining a target area where the target object is located; the first processing unit is used for processing the target area by using the feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises a target space feature.
In the above embodiments of the present application, the first processing unit includes: the device comprises a first input subunit, a dividing subunit, a first operation subunit, a second operation subunit and a second input subunit.
The first input subunit is used for inputting the target area into the feature expression network to obtain the feature of the preset dimension; the dividing subunit is used for dividing the characteristics of the preset dimensionality according to a preset dividing mode; the first operation subunit is used for performing pooling operation on the divided features to obtain block features; the second operation subunit is used for carrying out merging operation on the block features to obtain the spatial features with preset dimensions; the second input subunit is used for inputting the spatial features with the preset dimensionality into the full connection layer to obtain the target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
In the above embodiment of the present application, the apparatus further includes: the device comprises a determining module and an updating module.
Wherein, the acquisition module is also used for acquiring training data, wherein, training data includes: a first sample and a second sample; the first processing unit is further configured to process the training data by using a feature expression network to obtain features of the first sample and features of the second sample, where the features of the first sample include spatial features of the first sample, and the features of the second sample include spatial features of the second sample; the determining module is used for determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; the updating module is used for updating the network weight of the feature expression network based on the target loss value.
In the above embodiments of the present application, the determining module includes: the device comprises a second determining unit, a third determining unit, a first acquiring unit and a second acquiring unit.
The second determining unit is used for determining the classification result of the first sample based on the characteristics of the first sample and determining the classification result of the second sample based on the characteristics of the second sample; the third determining unit is used for determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; the first acquisition unit is used for acquiring the metric distance between the characteristic of the first sample and the characteristic of the second sample and determining a metric loss value; the second obtaining unit is used for obtaining the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
In the above embodiments of the present application, the matching module includes: a second processing unit and a matching unit.
The second processing unit is used for processing the images of other objects to obtain the characteristics of the images of other objects, wherein the characteristics comprise the spatial characteristics of the images of other objects; the matching unit is used for matching the target characteristics with the characteristics of the images of other objects to obtain matching results.
In the above embodiments of the present application, the matching unit is further configured to obtain a metric distance between the target feature and a feature of an image of another object, and obtain a matching result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 12
According to an embodiment of the present application, there is also provided a data processing apparatus for implementing the data processing method, as shown in fig. 15, the apparatus 1500 includes: a receiving module 1502, an obtaining module 1504, a processing module 1506, a matching module 1508, and a presentation module 1510.
The receiving module 1502 is used for receiving an image search instruction in a video live broadcast process; the obtaining module 1504 is used for obtaining the image of the target object in the live video based on the image searching instruction; the processing module 1506 is configured to process the image of the target object to obtain at least a target spatial feature of the image of the target object; the matching module 1508 is configured to match the image of the target object with the images of other objects based on at least the target spatial feature to obtain a matching result, where the matching result is used to represent whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; the display module 1510 is used for displaying the matching result.
It should be noted here that the receiving module 1502, the obtaining module 1504, the processing module 1506, the matching module 1508 and the displaying module 1510 correspond to steps S1302 to S1310 in embodiment 10, and the five modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 10. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the obtaining module includes: the device comprises a first acquisition unit, a recognition unit and a first determination unit.
The first acquisition unit is used for acquiring a gesture image corresponding to the image search instruction; the recognition unit is used for recognizing the gesture image and determining gesture information in the gesture image; the first determination unit is used for determining a target object in the live video based on the gesture information.
In the above embodiment of the present application, the apparatus further includes: a first determination module and a second determination module.
The receiving module is further used for receiving an area designation instruction; the first determination module is used for determining a target area in the image of the target object based on the area designation instruction; the second determination module is used for determining a mapping area corresponding to the target area in the matching result; the display module is further used for displaying the mapping area according to a preset mode in the process of displaying the matching result.
In the above embodiments of the present application, the processing module includes: a second determining unit and a first processing unit.
The second determining unit is used for processing the image of the target object and determining a target area where the target object is located; the first processing unit is used for processing the target area by using the feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises a target space feature.
In the above embodiments of the present application, the first processing unit includes: a first input subunit, a dividing subunit, a first operation subunit, a second operation subunit, and a second input subunit.
The first input subunit is used for inputting the target area into the feature expression network to obtain the feature of the preset dimension; the dividing subunit is used for dividing the characteristics of the preset dimensionality according to a preset dividing mode; the first operation subunit is used for performing pooling operation on the divided features to obtain block features; the second operation subunit is used for carrying out merging operation on the block features to obtain the spatial features with preset dimensions; the second input subunit is used for inputting the spatial features with the preset dimensionality into the full connection layer to obtain the target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
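For illustration only, the following sketch shows one way the dividing, pooling, merging and fully-connected sub-steps described above could be realized in PyTorch; the backbone, the number of blocks, the channel count and the target dimension are assumed values for the sketch, not parameters disclosed in this application.

```python
import torch.nn as nn

class SpatialFeatureHead(nn.Module):
    """Illustrative sketch (assumed values throughout): divide a backbone
    feature map into horizontal blocks, pool each block, merge the block
    features into a spatial feature, and project it with a fully connected
    layer to a lower-dimensional target feature."""

    def __init__(self, backbone, channels=2048, num_blocks=4, target_dim=256):
        super().__init__()
        self.backbone = backbone                               # feature expression network (assumed)
        self.pool = nn.AdaptiveAvgPool2d((num_blocks, 1))      # one pooled vector per block
        self.fc = nn.Linear(channels * num_blocks, target_dim) # target_dim < preset dimension

    def forward(self, region):                 # region: (N, 3, H, W) crop of the target area
        fmap = self.backbone(region)           # (N, C, h, w) feature of the preset dimension
        blocks = self.pool(fmap)               # (N, C, num_blocks, 1) block features
        spatial = blocks.flatten(1)            # merged spatial feature, (N, C * num_blocks)
        return self.fc(spatial)                # target feature of the smaller dimension
```

Any convolutional trunk that ends in a feature map (for example a ResNet with its own pooling and classifier removed) could stand in for the feature expression network in this sketch.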
In the above embodiment of the present application, the apparatus further includes: a third determination module and an update module.
The obtaining module is further used for obtaining training data, wherein the training data includes: a first sample and a second sample; the first processing unit is further configured to process the training data by using a feature expression network to obtain features of the first sample and features of the second sample, where the features of the first sample include spatial features of the first sample, and the features of the second sample include spatial features of the second sample; the third determination module is used for determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; the updating module is used for updating the network weight of the feature expression network based on the target loss value.
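As a hedged illustration of how the obtaining, processing, loss-determination and updating modules could cooperate during training, the sketch below runs one optimization step; network, classifier, optimizer and loss_fn are hypothetical stand-ins rather than components named in this application, and loss_fn corresponds to the weighted target loss sketched further below.

```python
def train_step(network, classifier, optimizer,
               sample_a, sample_b, label_a, label_b, loss_fn):
    """Illustrative training step (all names are hypothetical stand-ins):
    extract features for both samples, classify them, compute the target
    loss value, and update the network weights."""
    feat_a, feat_b = network(sample_a), network(sample_b)        # features incl. spatial features
    logits_a, logits_b = classifier(feat_a), classifier(feat_b)  # classification results
    loss = loss_fn(feat_a, feat_b, logits_a, logits_b, label_a, label_b)
    optimizer.zero_grad()
    loss.backward()                                              # target loss drives the update
    optimizer.step()
    return loss.item()
```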
In the above embodiments of the present application, the third determining module includes: a third determining unit, a fourth determining unit, a second acquiring unit, and a third acquiring unit.
The third determining unit is used for determining the classification result of the first sample based on the characteristics of the first sample and determining the classification result of the second sample based on the characteristics of the second sample; the fourth determining unit is used for determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; the second acquisition unit is used for acquiring the metric distance between the characteristic of the first sample and the characteristic of the second sample and determining a metric loss value; the third obtaining unit is used for obtaining the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
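The text above does not pin the classification loss, the metric loss, or their weights to a specific formulation; the sketch below assumes a cross-entropy classification loss and a contrastive-style metric loss with an assumed margin and assumed weights.

```python
import torch.nn.functional as F

def target_loss(feat_a, feat_b, logits_a, logits_b, label_a, label_b,
                margin=0.3, cls_weight=1.0, metric_weight=1.0):
    """Illustrative weighted target loss; margin and weights are assumptions."""
    # classification loss from the classification results of both samples
    cls_loss = F.cross_entropy(logits_a, label_a) + F.cross_entropy(logits_b, label_b)
    # metric distance between the features of the two samples
    dist = F.pairwise_distance(feat_a, feat_b)
    same = (label_a == label_b).float()
    # contrastive-style metric loss: pull same-item pairs together,
    # push different pairs apart by at least `margin`
    metric_loss = (same * dist.pow(2)
                   + (1.0 - same) * F.relu(margin - dist).pow(2)).mean()
    # weighted sum gives the target loss value
    return cls_weight * cls_loss + metric_weight * metric_loss
```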
In the above embodiments of the present application, the matching module includes: a second processing unit and a matching unit.
The second processing unit is used for processing the images of other objects to obtain the characteristics of the images of other objects, wherein the characteristics comprise the spatial characteristics of the images of other objects; the matching unit is used for matching the target characteristics with the characteristics of the images of other objects to obtain matching results.
In the above embodiments of the present application, the matching unit is further configured to obtain a metric distance between the target feature and a feature of an image of another object, and obtain a matching result.
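A minimal sketch of metric-distance matching against a gallery of other-object features, assuming Euclidean distance and an assumed decision threshold; the distance measure and threshold actually used are not specified in the text above.

```python
import torch

def match_by_metric_distance(target_feature, gallery_features, threshold=0.5):
    """Illustrative matcher: rank gallery features by Euclidean distance to
    the target feature; indices under the assumed threshold are treated as
    the same item (same attribute parameters)."""
    dists = torch.cdist(target_feature.unsqueeze(0), gallery_features).squeeze(0)
    order = torch.argsort(dists)               # closest candidates first
    matches = order[dists[order] < threshold]  # indices judged to match
    return matches, dists
```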
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 13
According to an embodiment of the present application, there is also provided a data processing system including:
a processor; and
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; and matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 14
The embodiment of the application can provide a computer terminal, and the computer terminal can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program codes of the following steps in the data processing method: acquiring an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; and matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
Optionally, fig. 16 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 16, the computer terminal A may include: one or more processors 1602 (only one of which is shown), and a memory 1604.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the data processing method and apparatus in the embodiments of the present application, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the data processing method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, and these remote memories may be connected to the computer terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; and matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
Optionally, the processor may further execute the program code of the following steps: processing the image of the target object, and determining a target area where the target object is located; and processing the target area by using the feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises a target space feature.
Optionally, the processor may further execute the program code of the following steps: inputting the target area into a feature expression network to obtain the feature of a preset dimension; dividing the characteristics of the preset dimensionality according to a preset dividing mode; performing pooling operation on the divided features to obtain block features; combining the block features to obtain a spatial feature with a preset dimension; and inputting the spatial features with preset dimensionality into the full connection layer to obtain target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
Optionally, the processor may further execute the program code of the following steps: acquiring training data, wherein the training data comprises: a first sample and a second sample; processing the training data by using a feature expression network to obtain the features of a first sample and the features of a second sample, wherein the features of the first sample comprise the spatial features of the first sample, and the features of the second sample comprise the spatial features of the second sample; determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; and updating the network weight of the feature expression network based on the target loss value.
Optionally, the processor may further execute the program code of the following steps: determining a classification result of the first sample based on the characteristics of the first sample, and determining a classification result of the second sample based on the characteristics of the second sample; determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; obtaining a metric distance between the features of the first sample and the features of the second sample, and determining a metric loss value; and acquiring the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
Optionally, the processor may further execute the program code of the following steps: processing images of other objects to obtain the characteristics of the images of the other objects, wherein the characteristics comprise the spatial characteristics of the images of the other objects; and matching the target characteristics with the characteristics of the images of other objects to obtain a matching result.
Optionally, the processor may further execute the program code of the following steps: acquiring the metric distance between the target feature and the features of the images of other objects to obtain a matching result.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: receiving an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and displaying the image of the matched object.
Optionally, the processor may further execute the program code of the following steps: receiving a selected target database; and acquiring images of other objects stored in the target database.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring an uploaded short video in which a first product is displayed and an image of a second product; intercepting the short video to obtain a video frame containing the first product; processing the video frame to at least obtain the spatial features of the video frame, and processing the image of the second product to at least obtain the spatial features of the image; matching the video frame and the image at least based on the spatial features of the video frame and the spatial features of the image to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the first product are the same as the attribute parameters of the second product; and displaying the matching result.
Optionally, the processor may further execute the program code of the following steps: acquiring continuous video frames containing a first product to obtain a video clip corresponding to the first product; and displaying the video clip.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring an image of a target object, wherein the image of the target object is acquired based on a shooting device associated with an image processing system; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and displaying the matching result.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; intercepting the video data to obtain images of other objects contained in the video data; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and obtaining a video segment corresponding to the target object in the video data based on the matching result.
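As a non-authoritative sketch of the interception-and-matching flow just described, the code below samples frames from a video with OpenCV, matches each sampled frame against the target feature, and merges consecutive matching frames into video segments; extract_feature and is_match are hypothetical helpers (for example the feature expression network and a metric-distance threshold test), and the sampling step is an assumption.

```python
import cv2

def find_segments(video_path, target_feature, extract_feature, is_match,
                  step=10, fps_fallback=25.0):
    """Illustrative sketch: intercept the video, match sampled frames against
    the target feature, and merge consecutive matches into (start, end) segments."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or fps_fallback
    segments, start, index = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            matched = is_match(extract_feature(frame), target_feature)
            if matched and start is None:
                start = index / fps                    # segment begins
            elif not matched and start is not None:
                segments.append((start, index / fps))  # segment ends
                start = None
        index += 1
    if start is not None:
        segments.append((start, index / fps))
    cap.release()
    return segments
```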
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: receiving an image searching instruction in a video live broadcasting process; acquiring an image of a target object in a live video based on an image searching instruction; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and displaying the matching result.
Optionally, the processor may further execute the program code of the following steps: acquiring a gesture image corresponding to the image searching instruction; recognizing the gesture image and determining gesture information in the gesture image; and determining a target object in the live video based on the gesture information.
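The text above does not fix a particular gesture model; purely as an illustration, the sketch below assumes a hypothetical detect_pointing helper that returns a fingertip position and a pointing direction, and crops the region the gesture indicates as the candidate image of the target object.

```python
def locate_target_by_gesture(frame, detect_pointing, crop_size=(224, 224)):
    """Illustrative sketch: step from the detected fingertip along the
    pointing direction and crop the indicated region of the live frame.
    `detect_pointing` is a hypothetical helper, not part of this application."""
    fingertip, direction = detect_pointing(frame)   # (x, y) and unit vector (dx, dy)
    h, w = frame.shape[:2]
    cx = int(min(max(fingertip[0] + direction[0] * 0.2 * w, 0), w - 1))
    cy = int(min(max(fingertip[1] + direction[1] * 0.2 * h, 0), h - 1))
    half_w, half_h = crop_size[0] // 2, crop_size[1] // 2
    x0, y0 = max(cx - half_w, 0), max(cy - half_h, 0)
    x1, y1 = min(cx + half_w, w), min(cy + half_h, h)
    return frame[y0:y1, x0:x1]                      # candidate target object image
```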
Optionally, the processor may further execute the program code of the following steps: receiving an area designation instruction; determining a target area in the image of the target object based on the area specifying instruction; determining a mapping area corresponding to the target area in the matching result; and in the process of displaying the matching result, displaying the mapping area according to a preset mode.
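One possible "preset mode" of display is simply to highlight the mapping area on the matched image; the sketch below assumes the mapping area is an (x, y, w, h) box and draws an OpenCV rectangle, with colour and thickness chosen arbitrarily for illustration.

```python
import cv2

def highlight_mapping_area(matched_image, mapping_box, color=(0, 0, 255), thickness=2):
    """Illustrative sketch: render the mapping area on a copy of the matched
    image as a highlighted rectangle (the 'preset mode' is an assumption)."""
    x, y, w, h = mapping_box
    shown = matched_image.copy()
    cv2.rectangle(shown, (x, y), (x + w, y + h), color, thickness)
    return shown
```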
By adopting the embodiments of the present application, a scheme for commodity identification is provided. Because spatial features have a finer granularity and pay more attention to commodity details, performing commodity identification through the spatial features achieves the technical effects of improving commodity identification precision and improving the user's shopping experience, and solves the technical problem of low matching precision when matching target commodities in the related art.
It can be understood by those skilled in the art that the structure shown in fig. 16 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 16 does not limit the structure of the above electronic device. For example, the computer terminal A may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 16, or have a different configuration from that shown in fig. 16.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 15
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program codes executed by the data processing method provided in the foregoing embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; and matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
Optionally, the storage medium is further configured to store program codes for performing the following steps: processing the image of the target object, and determining a target area where the target object is located; and processing the target area by using the feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises a target space feature.
Optionally, the storage medium is further configured to store program codes for performing the following steps: inputting the target area into a feature expression network to obtain the feature of a preset dimension; dividing the characteristics of the preset dimensionality according to a preset dividing mode; performing pooling operation on the divided features to obtain block features; combining the block features to obtain a spatial feature with a preset dimension; and inputting the spatial features with preset dimensionality into the full connection layer to obtain target features, wherein the dimensionality of the target features is smaller than the preset dimensionality.
Optionally, the storage medium is further configured to store program codes for performing the following steps: acquiring training data, wherein the training data comprises: a first sample and a second sample; processing the training data by using a feature expression network to obtain the features of a first sample and the features of a second sample, wherein the features of the first sample comprise the spatial features of the first sample, and the features of the second sample comprise the spatial features of the second sample; determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample; and updating the network weight of the feature expression network based on the target loss value.
Optionally, the storage medium is further configured to store program codes for performing the following steps: determining a classification result of the first sample based on the characteristics of the first sample, and determining a classification result of the second sample based on the characteristics of the second sample; determining a classification loss value based on the classification result of the first sample and the classification result of the second sample; obtaining a metric distance between the features of the first sample and the features of the second sample, and determining a metric loss value; and acquiring the weighted sum of the classification loss value and the measurement loss value to obtain a target loss value.
Optionally, the storage medium is further configured to store program codes for performing the following steps: processing images of other objects to obtain the characteristics of the images of the other objects, wherein the characteristics comprise the spatial characteristics of the images of the other objects; and matching the target characteristics with the characteristics of the images of other objects to obtain a matching result.
Optionally, the storage medium is further configured to store program codes for performing the following steps: acquiring the metric distance between the target feature and the features of the images of other objects to obtain a matching result.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: receiving an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and displaying the image of the matched object.
Optionally, the storage medium is further configured to store program codes for performing the following steps: receiving a selected target database; and acquiring images of other objects stored in the target database.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring an uploaded short video in which a first product is displayed and an image of a second product; intercepting the short video to obtain a video frame containing the first product; processing the video frame to at least obtain the spatial features of the video frame, and processing the image of the second product to at least obtain the spatial features of the image; matching the video frame and the image at least based on the spatial features of the video frame and the spatial features of the image to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the first product are the same as the attribute parameters of the second product; and displaying the matching result.
Optionally, the storage medium is further configured to store program codes for performing the following steps: acquiring continuous video frames containing a first product to obtain a video clip corresponding to the first product; and displaying the video clip.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring an image of a target object, wherein the image of the target object is acquired based on a shooting device associated with an image processing system; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and displaying the matching result.
Optionally, in this embodiment, the storage medium is further configured to store program codes for performing the following steps: acquiring an image of a target object; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; intercepting the video data to obtain images of other objects contained in the video data; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and obtaining a video segment corresponding to the target object in the video data based on the matching result.
Optionally, in this embodiment, the storage medium is further configured to store program codes for performing the following steps: receiving an image searching instruction in a video live broadcasting process; acquiring an image of a target object in a live video based on the image searching instruction; processing the image of the target object to at least obtain the target space characteristics of the image of the target object; matching the image of the target object with the images of other objects at least based on the target space characteristics to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object; and displaying the matching result.
Optionally, the storage medium is further configured to store program codes for performing the following steps: acquiring a gesture image corresponding to the image searching instruction; recognizing the gesture image and determining gesture information in the gesture image; and determining a target object in the live video based on the gesture information.
Optionally, the storage medium is further configured to store program codes for performing the following steps: receiving an area designation instruction; determining a target area in the image of the target object based on the area specifying instruction; determining a mapping area corresponding to the target area in the matching result; and in the process of displaying the matching result, displaying the mapping area according to a preset mode.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the part of the technical solution of the present application that in essence contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (31)

1. A method of data processing, comprising:
acquiring an image of a target object;
processing the image of the target object to at least obtain a target space characteristic of the image of the target object;
and matching the image of the target object with the images of other objects at least based on the target spatial features to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
2. The method of claim 1, wherein processing the image of the target object to obtain at least a target spatial feature of the image of the target object comprises:
processing the image of the target object, and determining a target area where the target object is located;
and processing the target area by using a feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises the target space feature.
3. The method of claim 2, wherein processing the target region using a feature expression network to obtain a target feature of the image of the target object comprises:
inputting the target area into the feature expression network to obtain a feature with a preset dimension;
dividing the characteristics of the preset dimensionality according to a preset dividing mode;
performing pooling operation on the divided features to obtain block features;
combining the block features to obtain the spatial features of the preset dimensionality;
and inputting the spatial feature of the preset dimension into a full connection layer to obtain the target feature, wherein the dimension of the target feature is smaller than the preset dimension.
4. The method of claim 2, wherein the method further comprises:
obtaining training data, wherein the training data comprises: a first sample and a second sample;
processing the training data by using the feature expression network to obtain features of the first sample and features of the second sample, wherein the features of the first sample comprise spatial features of the first sample, and the features of the second sample comprise spatial features of the second sample;
determining a target loss value of the feature expression network based on the features of the first sample and the features of the second sample;
updating the network weights of the feature expression network based on the target loss value.
5. The method of claim 4, wherein determining a target loss value for the feature expression network based on the features of the first sample and the features of the second sample comprises:
determining a classification result of the first sample based on the features of the first sample, and determining a classification result of the second sample based on the features of the second sample;
determining a classification loss value based on the classification result of the first sample and the classification result of the second sample;
obtaining a metric distance between the features of the first sample and the features of the second sample, and determining a metric loss value;
and obtaining the weighted sum of the classification loss value and the measurement loss value to obtain the target loss value.
6. The method of claim 2, wherein matching the image of the target object with images of other objects based on at least the target spatial features, the obtaining of the matching result comprises:
processing the images of the other objects to obtain features of the images of the other objects, wherein the features of the images of the other objects comprise spatial features of the images of the other objects;
and matching the target characteristics with the characteristics of the images of the other objects to obtain the matching result.
7. The method of claim 6, wherein matching the target feature with features of the images of the other objects, resulting in the matching result comprises:
and acquiring the metric distance between the target feature and the features of the images of the other objects to obtain the matching result.
8. A method of data processing, comprising:
acquiring an image of a target object;
processing the image of the target object to at least obtain a target space characteristic of the image of the target object;
intercepting video data to obtain images of other objects contained in the video data;
matching the image of the target object with the images of the other objects at least based on the target spatial features to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object;
and obtaining a video segment corresponding to the target object in the video data based on the matching result.
9. The method of claim 8, wherein processing the image of the target object to obtain at least a target spatial feature of the image of the target object comprises:
processing the image of the target object, and determining a target area where the target object is located;
and processing the target area by using a feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises the target space feature.
10. The method of claim 9, wherein processing the target region using a feature expression network to obtain a target feature of the image of the target object comprises:
inputting the target area into the feature expression network to obtain a feature with a preset dimension;
dividing the characteristics of the preset dimensionality according to a preset dividing mode;
performing pooling operation on the divided features to obtain block features;
combining the block features to obtain the spatial features of the preset dimensionality;
and inputting the spatial feature of the preset dimension into a full connection layer to obtain the target feature, wherein the dimension of the target feature is smaller than the preset dimension.
11. The method of claim 8, wherein obtaining, based on the matching result, a video segment corresponding to the target object in the video data comprises:
under the condition that the matching result is that the attribute parameters of the other objects are the same as the attribute parameters of the target object, acquiring a plurality of video frames containing the other objects in the video data;
and obtaining a video clip corresponding to the target object based on the time information of the plurality of video frames.
12. A method of data processing, comprising:
receiving an image searching instruction in a video live broadcasting process;
acquiring an image of a target object in a live video based on the image searching instruction;
processing the image of the target object to at least obtain a target space characteristic of the image of the target object;
matching the image of the target object with images of other objects at least based on the target spatial features to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object;
and displaying the matching result.
13. The method of claim 12, wherein based on the image search instruction, acquiring an image of a target object corresponding to the image search instruction comprises:
acquiring a gesture image corresponding to the image searching instruction;
recognizing the gesture image and determining gesture information in the gesture image;
determining the target object in the live video based on the gesture information.
14. The method of claim 12, wherein prior to presenting the matching result, the method further comprises:
receiving an area designation instruction;
determining a target region in the image of the target object based on the region specifying instruction;
determining a mapping area corresponding to the target area in the matching result;
and in the process of displaying the matching result, displaying the mapping area according to a preset mode.
15. The method of claim 12, wherein processing the image of the target object to obtain at least a target spatial feature of the image of the target object comprises:
processing the image of the target object, and determining a target area where the target object is located;
and processing the target area by using a feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises the target space feature.
16. The method of claim 15, wherein processing the target region using a feature expression network to obtain a target feature of the image of the target object comprises:
inputting the target area into the feature expression network to obtain a feature with a preset dimension;
dividing the characteristics of the preset dimensionality according to a preset dividing mode;
performing pooling operation on the divided features to obtain block features;
combining the block features to obtain the spatial features of the preset dimensionality;
and inputting the spatial feature of the preset dimension into a full connection layer to obtain the target feature, wherein the dimension of the target feature is smaller than the preset dimension.
17. A method of data processing, comprising:
receiving an image of a target object;
processing the image of the target object to at least obtain a target space characteristic of the image of the target object;
matching the image of the target object with images of other objects at least based on the target spatial features to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object;
and displaying the matching result.
18. The method of claim 17, wherein the method further comprises:
receiving a selected target database;
acquiring images of the other objects stored in the target database.
19. The method of claim 17, wherein processing the image of the target object to obtain at least a target spatial feature of the image of the target object comprises:
processing the image of the target object, and determining a target area where the target object is located;
and processing the target area by using a feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises the target space feature.
20. The method of claim 19, wherein processing the target region using a feature expression network to obtain a target feature of the image of the target object comprises:
inputting the target area into the feature expression network to obtain a feature with a preset dimension;
dividing the characteristics of the preset dimensionality according to a preset dividing mode;
performing pooling operation on the divided features to obtain block features;
combining the block features to obtain the spatial features of the preset dimensionality;
and inputting the spatial feature of the preset dimension into a full connection layer to obtain the target feature, wherein the dimension of the target feature is smaller than the preset dimension.
21. A method of data processing, comprising:
acquiring an uploaded short video with a first product displayed and an image of a second product;
intercepting the short video to obtain a video frame containing the first product;
processing the video frame to at least obtain the spatial characteristics of the video frame, and processing the image of the second product to at least obtain the spatial characteristics of the image;
matching the video frame and the image at least based on the spatial features of the video frame and the spatial features of the image to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the first product are the same as the attribute parameters of the second product;
and displaying the matching result.
22. The method of claim 21, wherein in the case that the matching result is that the attribute parameters of the first product and the second product are the same, the method further comprises:
acquiring continuous video frames containing the first product to obtain a video clip corresponding to the first product;
and displaying the video clip.
23. The method of claim 21, wherein processing the video frame to obtain at least spatial features of the video frame and processing the image of the second product to obtain at least spatial features of the image comprises:
processing the video frame and the image, and determining the area where the first product is located and the area where the second product is located;
and respectively processing the area where the first product is located and the area where the second product is located by utilizing a feature expression network to obtain the features of the video frame and the features of the image, wherein the features of the video frame comprise the spatial features of the video frame, and the features of the image comprise the spatial features of the image.
24. The method of claim 23, wherein the processing the area where the first product is located and the area where the second product is located by using a feature expression network to obtain the features of the video frame and the features of the image comprises:
inputting the area where the first product is located and the area where the second product is located into the feature expression network to obtain a video frame feature with a preset dimension and an image feature with the preset dimension;
dividing the video frame characteristics of the preset dimensionality and the image characteristics of the preset dimensionality according to a preset dividing mode;
performing pooling operation on the divided video frame characteristics and image characteristics to obtain video frame block characteristics and image block characteristics;
merging the video frame blocking features to obtain the video frame spatial features of the preset dimensionality, and merging the image blocking features to obtain the image spatial features of the preset dimensionality;
inputting the video frame spatial features and the image spatial features to a full connection layer to obtain the features of the video frames and the features of the images, wherein the dimensions of the features of the video frames and the dimensions of the features of the images are both smaller than the preset dimensions.
25. A method of data processing, comprising:
acquiring an image of a target object, wherein the image of the target object is acquired based on a shooting device associated with an image processing system;
processing the image of the target object to at least obtain a target space characteristic of the image of the target object;
matching the image of the target object with images of other objects at least based on the target spatial features to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object;
and displaying the matching result.
26. The method of claim 25, wherein the image processing system is disposed on a robot, wherein the robot is configured to perform voice interaction.
27. The method of claim 25, wherein processing the image of the target object to obtain at least a target spatial feature of the image of the target object comprises:
processing the image of the target object, and determining a target area where the target object is located;
and processing the target area by using a feature expression network to obtain the target feature of the image of the target object, wherein the target feature comprises the target space feature.
28. The method of claim 27, wherein processing the target region using a feature expression network to obtain a target feature of the image of the target object comprises:
inputting the target area into the feature expression network to obtain a feature with a preset dimension;
dividing the characteristics of the preset dimensionality according to a preset dividing mode;
performing pooling operation on the divided features to obtain block features;
combining the block features to obtain the spatial features of the preset dimensionality;
and inputting the spatial feature of the preset dimension into a full connection layer to obtain the target feature, wherein the dimension of the target feature is smaller than the preset dimension.
29. A computer-readable storage medium comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the data processing method of any one of claims 1 to 28.
30. A processing device, comprising: a memory and a processor for executing a program stored in the memory, wherein the program when executed performs the data processing method of any one of claims 1 to 28.
31. A data processing system comprising:
a processor; and
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps: acquiring an image of a target object; processing the image of the target object to at least obtain a target space characteristic of the image of the target object; and matching the image of the target object with the images of other objects at least based on the target spatial features to obtain a matching result, wherein the matching result is used for representing whether the attribute parameters of the other objects are the same as the attribute parameters of the target object.
CN202010694588.0A 2020-07-17 2020-07-17 Data processing method and system, computer readable storage medium and processing device Pending CN113297405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010694588.0A CN113297405A (en) 2020-07-17 2020-07-17 Data processing method and system, computer readable storage medium and processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010694588.0A CN113297405A (en) 2020-07-17 2020-07-17 Data processing method and system, computer readable storage medium and processing device

Publications (1)

Publication Number Publication Date
CN113297405A true CN113297405A (en) 2021-08-24

Family

ID=77318215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010694588.0A Pending CN113297405A (en) 2020-07-17 2020-07-17 Data processing method and system, computer readable storage medium and processing device

Country Status (1)

Country Link
CN (1) CN113297405A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114377842A (en) * 2021-12-27 2022-04-22 江苏丰尚智能科技有限公司 Material fineness adjusting method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109284729B (en) Method, device and medium for acquiring face recognition model training data based on video
US10346684B2 (en) Visual search utilizing color descriptors
US9607010B1 (en) Techniques for shape-based search of content
CN106326391B (en) Multimedia resource recommendation method and device
US9538116B2 (en) Relational display of images
CN110188719B (en) Target tracking method and device
CN108124184A (en) A kind of method and device of living broadcast interactive
US11704357B2 (en) Shape-based graphics search
CN108228792B (en) Picture retrieval method, electronic device and storage medium
US9881084B1 (en) Image match based video search
WO2016004330A1 (en) Interactive content generation
US20210281744A1 (en) Action recognition method and device for target object, and electronic apparatus
CN111491187B (en) Video recommendation method, device, equipment and storage medium
US20150302587A1 (en) Image processing device, image processing method, program, and information recording medium
JP7105309B2 (en) Video preprocessing method, device and computer program
CN113469200A (en) Data processing method and system, storage medium and computing device
CN111199169A (en) Image processing method and device
CN110266926B (en) Image processing method, image processing device, mobile terminal and storage medium
US9672436B1 (en) Interfaces for item search
US10606884B1 (en) Techniques for generating representative images
CN113297405A (en) Data processing method and system, computer readable storage medium and processing device
CN110019907A (en) A kind of image search method and device
CN112766406A (en) Article image processing method and device, computer equipment and storage medium
CN111126457A (en) Information acquisition method and device, storage medium and electronic device
CN111258413A (en) Control method and device of virtual object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination