CN114898349A

CN114898349A - Target commodity identification method and device, equipment, medium and product thereof

Info

Publication number: CN114898349A
Application number: CN202210580467.2A
Authority: CN
Inventors: 李保俊
Original assignee: Guangzhou Huanju Shidai Information Technology Co Ltd
Current assignee: Guangzhou Huanju Shidai Information Technology Co Ltd
Priority date: 2022-05-25
Filing date: 2022-05-25
Publication date: 2022-08-12

Abstract

The application discloses a target commodity identification method and the corresponding device, equipment, medium and product. The method comprises the following steps: acquiring the commodity title and the commodity picture in the commodity information of a target commodity; extracting deep semantic information from the commodity picture and from the commodity title; fusing the deep semantic information of the commodity title into the deep semantic information of the commodity picture, so that, guided by the title, the image features of the target commodity are highlighted within the deep semantic information of the picture, thereby obtaining image-text fusion feature information; and inputting the image-text fusion feature information into a target detection model trained in advance to convergence to identify the target commodity. The method and the device can accurately identify the target commodity in the commodity picture.

Description

Target commodity identification method and its device, equipment, medium and product

Technical Field

The present application relates to the field of e-commerce information technology, and in particular to a target commodity identification method and the corresponding device, computer equipment, computer-readable storage medium and computer program product.

Background Art

To set off the commodities they list, merchant users on an e-commerce platform usually pair each listed commodity with other matching items, so the commodity pictures of a listed commodity contain not only the commodity itself but also those matching items. For example, if the listed commodity is a garment, the matching items may be trousers, hats, shoes and the like; if the listed commodity is a storage rack, the matching items may be household appliances, books, ornaments and the like. Consequently, the listed commodity cannot be determined from the commodity picture alone.

On an e-commerce platform it is often necessary to identify, in a commodity picture that contains a target commodity, the image corresponding to that target commodity, for use in downstream tasks such as similar-commodity matching and commodity image display. If the image of the target commodity cannot be obtained quickly from the commodity picture, the efficiency of these downstream tasks suffers, some related functions may become impossible to implement, and the user experience degrades.

Traditional solutions usually adopt a multi-target recognition method: every item in the commodity picture, including the listed commodity and its matching items, is first detected, and each detected item is then classified to determine which of the item images belongs to the commodity. This approach requires two stages of processing, each implemented with a different model, and each model performs its own image preprocessing and similar operations on the picture, so the process is cumbersome and relatively inefficient. Worse, the models of the two stages generally have to be trained on separate data sets, which makes training costly.

In view of this, the applicant has explored other ways of quickly identifying a target commodity from a commodity picture that contains it.

Summary of the Invention

The primary purpose of the present application is to solve at least one of the above problems by providing a target commodity identification method and the corresponding device, computer equipment, computer-readable storage medium and computer program product.

To meet the various purposes of the present application, the following technical solutions are adopted:

A target commodity identification method provided for one of the purposes of the present application comprises the following steps:

acquiring the commodity title and the commodity picture in the commodity information of a target commodity;

extracting deep semantic information of the commodity picture and of the commodity title;

fusing the deep semantic information of the commodity title into the deep semantic information of the commodity picture, so that, according to the deep semantic information of the commodity title, the image features of the target commodity are highlighted in the deep semantic information of the commodity picture, thereby obtaining image-text fusion feature information;

inputting the image-text fusion feature information into a target detection model trained in advance to convergence, and identifying the target commodity.
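The four steps above can be sketched as a single function whose models are passed in as callables. This is a minimal illustration, not the applicant's implementation: the function name, the representation of features as lists of per-location vectors, and the box format are all assumptions. Step one (acquiring the title and picture) is left to the caller.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Box = Tuple[int, int, int, int]  # assumed format: (x_min, y_min, x_max, y_max), inclusive


def identify_target_commodity(
    title: str,
    picture: object,
    extract_image_features: Callable[[object], List[Vector]],
    extract_text_features: Callable[[str], List[Vector]],
    fuse: Callable[[List[Vector], List[Vector]], List[Vector]],
    detect: Callable[[List[Vector]], Box],
) -> Box:
    """Run the claimed steps 2 to 4; every model is an injected (hypothetical) callable."""
    # Step 2: deep semantic information of the picture and of the title
    image_features = extract_image_features(picture)
    text_features = extract_text_features(title)
    # Step 3: title-guided fusion into image-text fusion feature information
    fused = fuse(image_features, text_features)
    # Step 4: one detector pass over the fused features identifies the target commodity
    return detect(fused)
```

Passing the models in as callables mirrors the single-stage design the application describes: there is no separate "detect everything, then classify" stage, only one detector that consumes features already biased toward the titled commodity.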

In a further embodiment, the step of extracting the deep semantic information of the commodity picture and the commodity title comprises the following steps:

preprocessing the commodity picture and inputting the preprocessed picture into an image feature extraction model trained in advance to convergence, to obtain the corresponding deep semantic information, which characterizes the image features of the commodity picture;

preprocessing the commodity title and inputting the preprocessed title into a text feature extraction model trained in advance to convergence, to obtain the corresponding deep semantic information, which characterizes the text features of the commodity title.
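The application does not name its extraction models. Assuming, for illustration only, that the image model is a convolutional backbone producing a feature map of shape (C, H, W) and the text model produces one vector per title token, the picture features can be flattened into H·W location vectors so that both modalities become sequences of vectors. The helper below is a hypothetical sketch of that flattening step.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # assumed layout: [channel][row][column]


def flatten_feature_map(feature_map: FeatureMap) -> List[List[float]]:
    """Turn a (C, H, W) feature map into H*W vectors of length C, in row-major order."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][y][x] for c in range(channels)]  # all channels at one location
        for y in range(height)
        for x in range(width)
    ]
```

Each output vector describes one spatial location of the picture, which is what lets a later attention step ask, per location, how relevant the title is there.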

In a deepened embodiment, the step of preprocessing the commodity title comprises the following steps:

filtering invalid characters out of the commodity title;

segmenting the filtered commodity title into words to obtain its keywords, the keywords including product words and/or brand words of the target commodity, which completes the preprocessing of the commodity title.
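A minimal sketch of the two preprocessing steps, under stated assumptions: "invalid characters" is taken to mean anything that is neither a word character nor whitespace (punctuation, emoji, decorative symbols), segmentation is plain whitespace splitting (adequate only for space-delimited titles; a Chinese title would need a real segmenter such as jieba), and the product and brand lexicons are hypothetical. This sketch keeps only lexicon hits, although the application says the keywords merely include product and/or brand words.

```python
import re
from typing import Iterable, List

# Hypothetical lexicons for illustration only.
PRODUCT_WORDS = {"dress", "skirt", "shelf", "sneakers", "coat"}
BRAND_WORDS = {"acme", "zenith"}

# Anything that is neither a word character nor whitespace is treated as invalid.
# Python's \w is Unicode-aware for str patterns, so CJK characters are kept.
INVALID_CHARS = re.compile(r"[^\w\s]")


def filter_invalid_chars(title: str) -> str:
    """Replace invalid characters with spaces so adjacent words stay separated."""
    return INVALID_CHARS.sub(" ", title)


def segment(text: str) -> List[str]:
    """Whitespace segmentation (an assumption; see the note above)."""
    return text.lower().split()


def preprocess_title(
    title: str,
    product_words: Iterable[str] = PRODUCT_WORDS,
    brand_words: Iterable[str] = BRAND_WORDS,
) -> List[str]:
    """Filter invalid characters, segment, and keep product/brand keywords."""
    keep = set(product_words) | set(brand_words)
    return [token for token in segment(filter_invalid_chars(title)) if token in keep]
```

For example, a title such as "Acme Summer Dress!!! 🔥 free shipping" reduces to the brand word and the product word, dropping marketing filler that would otherwise dilute the title's semantics.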

In a further embodiment, the step of fusing the deep semantic information of the commodity title into the deep semantic information of the commodity picture, so as to highlight the image features of the target commodity in the picture's deep semantic information according to the title's deep semantic information and obtain image-text fusion feature information, comprises the following steps:

fusing the deep semantic information of the commodity title and the deep semantic information of the commodity picture by means of a multimodal feature interaction fusion module to obtain preliminary fusion feature information, in which the features of the image of the target commodity are prominently represented;

combining the preliminary fusion feature information with the deep semantic information of the commodity picture to obtain the image-text fusion feature information.
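The application does not say how the preliminary fusion feature information is "combined" with the picture's deep semantic information. One common choice, assumed here, is a residual element-wise addition, as used in Transformer blocks; concatenation along the feature dimension would be another. The hypothetical helper below sketches the residual variant.

```python
from typing import List

Vector = List[float]


def combine_features(preliminary: List[Vector], image_features: List[Vector]) -> List[Vector]:
    """Residual combination (assumed): add title-guided features onto the image features.

    Both inputs are sequences of per-location vectors with matching shapes.
    """
    if len(preliminary) != len(image_features):
        raise ValueError("feature sequences must have the same number of locations")
    combined = []
    for p_vec, i_vec in zip(preliminary, image_features):
        if len(p_vec) != len(i_vec):
            raise ValueError("feature vectors must have the same dimension")
        combined.append([p + i for p, i in zip(p_vec, i_vec)])
    return combined
```

The residual form keeps the original picture features intact, so the detector still sees the whole picture while the added term emphasizes the locations that match the title.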

In a preferred embodiment, the step of fusing the deep semantic information of the commodity title and the deep semantic information of the commodity picture by means of the multimodal feature interaction fusion module to obtain preliminary fusion feature information comprises the following steps:

constructing a query vector from the deep semantic information of the commodity picture, constructing a key vector and a value vector from the deep semantic information of the commodity title, and inputting them into an attention layer;

interacting, by the attention layer, the query vector with the key vector and normalizing the result to obtain a weight matrix;

applying, by the attention layer, the weight matrix to the value vector to obtain the preliminary fusion feature information.
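The steps above describe cross-attention with queries from the picture and keys and values from the title. The sketch below assumes standard scaled dot-product attention: the query-key "interaction" is a dot product, the normalization is a row-wise softmax after scaling by the square root of the key dimension, and the projection matrices `w_q`, `w_k`, `w_v` stand in for learned weights. It uses plain Python lists so it has no dependencies; it is an illustration, not the applicant's module.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(
    image_tokens: Matrix,  # (N_img, d_img): one row per picture location
    text_tokens: Matrix,   # (N_txt, d_txt): one row per title token
    w_q: Matrix,           # (d_img, d_k), hypothetical learned projection
    w_k: Matrix,           # (d_txt, d_k)
    w_v: Matrix,           # (d_txt, d_v)
) -> Tuple[Matrix, Matrix]:
    q = matmul(image_tokens, w_q)          # queries come from the picture
    k = matmul(text_tokens, w_k)           # keys come from the title
    v = matmul(text_tokens, w_v)           # values come from the title
    scale = math.sqrt(len(w_k[0]))
    scores = matmul(q, transpose(k))       # (N_img, N_txt) query-key interaction
    weights = softmax_rows([[s / scale for s in row] for row in scores])  # weight matrix
    return matmul(weights, v), weights     # (N_img, d_v) preliminary fusion features
```

Each row of the weight matrix sums to one and says how strongly one picture location attends to each title token; in a framework such as PyTorch the same computation is usually done with a built-in multi-head attention layer.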

In a further embodiment, the step of inputting the image-text fusion feature information into the target detection model trained in advance to convergence and identifying the target commodity comprises the following steps:

detecting the target commodity in the commodity picture according to the image-text fusion feature information by means of the target detection model trained in advance to convergence, to obtain the corresponding detection region;

computing the minimum-area rectangular box enclosing the detection region, and taking the target commodity framed by that box as the identification result.
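A minimal sketch of the box step, assuming the detection region is delivered as a binary mask over the picture (an assumption; a detector that already outputs boxes makes this step trivial). It computes the smallest axis-aligned rectangle; if "minimum-area rectangle" is meant to allow rotation, a routine such as OpenCV's `cv2.minAreaRect` would be the alternative.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel coordinates


def min_enclosing_box(mask: List[List[int]]) -> Optional[Box]:
    """Smallest axis-aligned rectangle containing every non-zero mask pixel.

    Returns None when the detection region is empty.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    xs = [x for row in mask for x, value in enumerate(row) if value]
    return (min(xs), min(ys), max(xs), max(ys))
```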

In an extended embodiment, after the step of inputting the image-text fusion feature information into the target detection model trained in advance to convergence and identifying the target commodity, the method further comprises the following steps:

cropping the image of the target commodity out of the commodity picture according to the rectangular box that frames the target commodity, and storing it in the commodity database in association with the unique identification code of the target commodity;

in response to a commodity recommendation request, retrieving the image of the target commodity from the commodity database by its unique identification code, and matching recommended commodities similar to it;

answering the commodity recommendation request by pushing the recommended commodities.
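The cropping, storage and recommendation steps can be sketched with an in-memory store. Everything here is hypothetical: the dictionary stands in for the commodity database, `embed` stands in for an unspecified image-embedding model, and similarity is assumed to be cosine similarity between embeddings.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive
Image = List[List[float]]        # toy single-channel picture, one list per row


def crop(picture: Image, box: Box) -> Image:
    """Cut the framed region out of the picture."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min : x_max + 1] for row in picture[y_min : y_max + 1]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


class CommodityImageStore:
    """In-memory stand-in for the commodity database, keyed by unique ID."""

    def __init__(self, embed: Callable[[Image], List[float]]):
        self._embed = embed  # hypothetical image-embedding model
        self._images: Dict[str, Image] = {}

    def save(self, commodity_id: str, picture: Image, box: Box) -> None:
        """Crop the target commodity and store it under its unique ID."""
        self._images[commodity_id] = crop(picture, box)

    def recommend(self, commodity_id: str, top_k: int = 3) -> List[str]:
        """Fetch the target's crop by ID and rank the other commodities by similarity."""
        query = self._embed(self._images[commodity_id])
        scored = [
            (cosine(query, self._embed(img)), other_id)
            for other_id, img in self._images.items()
            if other_id != commodity_id
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [other_id for _, other_id in scored[:top_k]]
```

This sketch re-embeds every stored crop on each request for clarity; a real service would more likely cache the embeddings or keep them in a vector index.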

A target commodity identification device provided for one of the purposes of the present application comprises an image-text acquisition module, a semantic extraction module, a feature fusion module and a target identification module. The image-text acquisition module is used for acquiring the commodity title and the commodity picture in the commodity information of a target commodity; the semantic extraction module is used for extracting deep semantic information of the commodity picture and of the commodity title; the feature fusion module is used for fusing the deep semantic information of the commodity title into the deep semantic information of the commodity picture, so as to highlight the image features of the target commodity in the picture's deep semantic information according to the title's deep semantic information and obtain image-text fusion feature information; and the target identification module is used for inputting the image-text fusion feature information into a target detection model trained in advance to convergence and identifying the target commodity.

In a further embodiment, the semantic extraction module comprises: an image feature extraction sub-module, used for preprocessing the commodity picture and inputting the preprocessed picture into an image feature extraction model trained in advance to convergence, to obtain the corresponding deep semantic information characterizing the image features of the commodity picture; and a text feature extraction sub-module, used for preprocessing the commodity title and inputting the preprocessed title into a text feature extraction model trained in advance to convergence, to obtain the corresponding deep semantic information characterizing the text features of the commodity title.

In a deepened embodiment, the text feature extraction sub-module comprises: a character filtering unit, used for filtering invalid characters out of the commodity title; and a text segmentation unit, used for segmenting the filtered commodity title into words to obtain its keywords, the keywords including product words and/or brand words of the target commodity, which completes the preprocessing of the commodity title.

In a further embodiment, the feature fusion module comprises: a semantic fusion sub-module, used for fusing the deep semantic information of the commodity title and that of the commodity picture by means of a multimodal feature interaction fusion module to obtain preliminary fusion feature information, in which the features of the image of the target commodity are prominently represented; and an information combination sub-module, used for combining the preliminary fusion feature information with the deep semantic information of the commodity picture to obtain the image-text fusion feature information.

In a preferred embodiment, the semantic fusion sub-module comprises: a vector input unit, used for constructing a query vector from the deep semantic information of the commodity picture, constructing a key vector and a value vector from the deep semantic information of the commodity title, and inputting them into an attention layer; a weight extraction unit, used for interacting, by the attention layer, the query vector with the key vector and normalizing the result to obtain a weight matrix; and a feature generation unit, used for applying, by the attention layer, the weight matrix to the value vector to obtain the preliminary fusion feature information.

In a further embodiment, the target identification module comprises: a target detection unit, used for detecting the target commodity in the commodity picture according to the image-text fusion feature information by means of the target detection model trained in advance to convergence, to obtain the corresponding detection region; and a box selection identification unit, used for computing the minimum-area rectangular box enclosing the detection region and taking the target commodity framed by that box as the identification result.

In an extended embodiment, the device further comprises, downstream of the target identification module: a cropping and storage module, used for cropping the image of the target commodity out of the commodity picture according to the rectangular box framing it, and storing the image in the commodity database in association with the unique identification code of the target commodity; a request response module, used for responding to a commodity recommendation request by retrieving the image of the target commodity from the commodity database by its unique identification code and matching recommended commodities similar to it; and a request answering module, used for answering the commodity recommendation request by pushing the recommended commodities.

The technical solution of the present application has many advantages, including but not limited to the following:

First, the present application uses the deep semantic information of the commodity title to highlight the image features of the target commodity within the deep semantic information of the commodity picture. This supplies key information that points to the target commodity in the picture, so that identification can be performed for a single object on the basis of that information, and the target commodity can be identified from the picture quickly and accurately.

Second, multimodal feature fusion merges the deep semantic information of the commodity title with that of the commodity picture, and the target commodity is identified directly from the fused features. This simplifies the model architecture and training procedure used for identification: the related models can be trained jointly on a single data set, so training is cheaper and relatively simple. Moreover, identification is completed in a single stage, so execution efficiency is relatively high.

In addition, the target commodity identification function of the present application can serve a variety of downstream tasks on an e-commerce platform, such as similar-commodity matching, commodity classification and commodity tagging. Because the image of the target commodity can be extracted in a targeted manner, the images supplied to these downstream tasks are more accurate.

Description of Drawings

The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic flowchart of a typical embodiment of the target commodity identification method of the present application;

FIG. 2 is a schematic diagram of the multimodal feature interaction module in an embodiment of the present application;

FIG. 3 is a schematic diagram of the implementation architecture of the target commodity identification model adopted in an embodiment of the present application;

FIG. 4 is a schematic flowchart of extracting deep semantic information in an embodiment of the present application;

FIG. 5 is a schematic flowchart of a preprocessing process for the commodity title in an embodiment of the present application;

FIG. 6 is a schematic flowchart of obtaining image-text fusion feature information in an embodiment of the present application;

FIG. 7 is a schematic flowchart of obtaining preliminary image-text fusion feature information in an embodiment of the present application;

FIG. 8 is a schematic flowchart of identifying the target commodity from the commodity picture in an embodiment of the present application;

FIG. 9 is a schematic flowchart of commodity recommendation in an embodiment of the present application;

FIG. 10 is a schematic block diagram of the target commodity identification device of the present application;

FIG. 11 is a schematic structural diagram of a computer device used in the present application.

Detailed Description of the Embodiments

Embodiments of the present application are described in detail below, and examples of them are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present application and are not to be construed as limiting it.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should further be understood that the word "comprising" used in the specification of the present application indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. Furthermore, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. As used herein, the term "and/or" includes all or any one of, and all combinations of, one or more of the associated listed items.

Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which the present application belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having meanings consistent with their meanings in the context of the prior art and, unless specifically defined as herein, should not be interpreted in an idealized or overly formal sense.

Those skilled in the art will understand that the terms "client", "terminal" and "terminal device" used herein cover both devices that have only a wireless signal receiver without transmitting capability and devices with receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such devices may include: cellular or other communication devices, such as personal computers and tablet computers, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver.
As used herein, "client", "terminal" and "terminal device" may be portable, transportable, mounted in a vehicle (air, sea and/or land), or suitable for and/or configured to operate locally and/or in distributed form at any other location on Earth and/or in space. They may also be a communication terminal, an Internet access terminal or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with music/video playback capability, or a smart TV, a set-top box or a similar device.

The hardware referred to in the present application by names such as "server", "client" and "service node" is essentially an electronic device with capabilities equivalent to a personal computer: a hardware apparatus with the necessary components described by the von Neumann architecture, namely a central processing unit (including an arithmetic unit and a controller), memory, input devices and output devices. A computer program is stored in its memory; the central processing unit loads the program from external storage into main memory, executes its instructions, and interacts with the input and output devices to carry out specific functions.

It should be noted that the concept of "server" in the present application extends equally to server clusters. According to network deployment principles understood by those skilled in the art, the servers are a logical division: physically, they may be independent of one another but callable through interfaces, or integrated into a single physical computer or a single computer cluster. Those skilled in the art should understand this flexibility, which should not be taken to restrict the network deployment of the present application.

Unless explicitly specified, one or more technical features of the present application may either be deployed on a server, with the client accessing them by remotely calling the online service interface the server provides, or be deployed and run directly on the client.

Unless explicitly specified, the neural network models cited or possibly cited in the present application may either be deployed on a remote server and called remotely from the client, or be deployed on a client with sufficient device capability and called directly. In some embodiments, when a model runs on the client, its intelligence may be obtained through transfer learning, in order to reduce the demands on the client's hardware resources and avoid occupying them excessively.

Unless explicitly specified, the various data involved in the present application may be stored either remotely on a server or on a local terminal device, as long as they are suitable for being called by the technical solution of the present application.

Those skilled in the art should be aware that, although the various methods of the present application are described on the basis of the same concept and therefore share common features, these methods can be executed independently unless otherwise specified. Likewise, all embodiments disclosed in the present application are proposed on the basis of the same inventive concept; therefore, concepts expressed identically, and concepts whose expressions differ but have merely been adapted for convenience, should be understood as equivalent.

Unless a mutually exclusive relationship between them is explicitly indicated, the related technical features of the embodiments about to be disclosed in the present application may be cross-combined to flexibly construct new embodiments, as long as such combinations do not depart from the inventive spirit of the present application and can meet a need in the prior art or remedy some deficiency in the prior art. Those skilled in the art should be aware of such variations.

The target commodity identification method of the present application can be programmed as a computer program product and deployed to run on a server. In the e-commerce platform scenario of the present application, for example, it is generally deployed and implemented on a server, so that the method can be executed by accessing the interface opened once the computer program product is running and interacting with the process of the computer program product through a graphical user interface.

The present application uses the target identification model implemented in this application to realize target commodity identification. The target identification model is an integrated model comprising a neural network model for extracting the deep semantic information of the commodity picture, a neural network model for extracting the deep semantic information of the commodity title, and a neural network model for identifying the target commodity in the commodity picture.

Referring to FIG. 1, in a typical embodiment, the target commodity identification method of the present application comprises the following steps:

Step S1100: acquiring the commodity title and the commodity picture in the commodity information of the target commodity;

On an e-commerce platform, each commodity can be handled as a relatively independent unit of information. The merchant user of an online store publishes, maintains and updates it, and consumer users can browse and order it. The online store may be an independent site that maintains its own commodity database. By installing the computer program product implemented by this application, such a store can identify the target commodity in its commodity pictures. Each commodity has commodity information describing it, which usually includes a commodity title and commodity pictures.

The target commodity is a listed commodity: the merchant user of an online store on the e-commerce platform lists it for sale and is responsible for publishing and selling it.

The commodity picture is normally used to display the target commodity. Besides the target commodity, it may contain other items styled with it to show it to best effect.

- When the target commodity is a skirt, one of its pictures may show a model wearing the skirt with matching shoes, tops, jewelry and a coat.
- When the target commodity is a shelf, one of its pictures may show the shelf holding books, household appliances and ornaments.

In other words, a commodity picture may contain content other than the target commodity, and that content may include commodities other than the current target commodity.

The commodity title is commodity description information provided in text form and stored in association with the target commodity. It typically describes the target commodity concisely and accurately, for example its name, brand, material, function, use and selling points.

In one embodiment, the commodity information is retrieved from the commodity database of the online store using the unique identifier of the target commodity. Software engineers assign this identifier to distinguish each target commodity on the e-commerce platform, so that commodity information can be stored and retrieved conveniently.

When a merchant user of an online store wants to list a target commodity, the user enters its commodity information on the platform's commodity publishing page and submits it to the back-end server. The server stores the information in the commodity database in association with the target commodity's unique identifier.

Step S1200: extract the deep semantic information of the commodity picture and of the commodity title;

The commodity picture can be processed by any of various image feature extraction models pre-trained to convergence. The model extracts deep semantic information corresponding to the visual features of the target commodity and of the accompanying items in the picture. Such a model is generally a CNN-based neural network suited to extracting deep semantic features from images, such as ResNet or EfficientNet, and those skilled in the art can choose one as appropriate.

Likewise, the commodity title can be processed by any of various text feature extraction models pre-trained to convergence. The model extracts deep semantic information corresponding to the text features in the title that characterize the target commodity. Such a model is a neural network suited to extracting deep semantic features from text, for example the RNN-based LSTM or the Transformer-based BERT and ELECTRA, and those skilled in the art can choose one as appropriate.

Step S1300: fuse the deep semantic information of the commodity title into that of the commodity picture to obtain image-text fusion feature information. Guided by the title, this fusion makes the image features of the target commodity stand out in the picture's deep semantic information;

In one embodiment, the deep semantic information of the commodity title and that of the commodity picture interact through the attention mechanism in the attention layer 200 shown in FIG. 2. This interaction at the feature level fuses picture and title at the level of deep semantics, and the attention layer outputs preliminary fusion feature information.

The commodity title semantically indicates which object is the target commodity. Fusing the title's deep semantic information into the picture's therefore means the preliminary fusion feature information already emphasizes the image features of the target commodity, with the title's text semantics as reference. Later embodiments describe this attention-based interaction in detail.

Further, referring to FIG. 3, the preliminary fusion feature information is combined with the deep semantic information 300 of the commodity picture, for example by matrix addition, to obtain the image-text fusion feature information. Because the preliminary fusion feature information prominently represents the image features of the target commodity, the image-text fusion features can be used to identify the target commodity.

Step S1400: input the image-text fusion feature information into a target detection model pre-trained to convergence, and identify the target commodity.

The target commodity in the commodity picture can be detected and identified with a target detection model pre-trained to convergence. Such models are generally deep-learning based, for example the R-CNN family, the YOLO family and the SSD (Single Shot MultiBox Detector) family.

- The R-CNN family is representative of two-stage detectors, which first propose candidate regions and then classify them.
- YOLO is representative of single-stage detectors, which regress boxes and classes directly from the image.
- SSD is a single-stage detector that builds on ideas from both.

The R-CNN family includes models such as R-CNN, SPPNet, Fast R-CNN and Faster R-CNN, and the YOLO family also has several usable versions. All of these detection models can locate target image regions in a given picture, from which the corresponding target images can be obtained.

In one embodiment, YOLOv5 is used as the target detection model with a classifier attached, and is fine-tuned on a sufficient number of training samples. The training samples are commodity pictures containing target commodities together with the items styled with them. Each target commodity in a sample carries a training label that supervises training, so the model learns to locate the region of the target commodity in a given commodity picture accurately.

Accordingly, the image-text fusion feature information is input into the target detection model pre-trained to convergence. This feature information already incorporates the semantic information of the commodity title. Guided by that semantic information, the model detects the image region of the target commodity in the commodity picture and outputs the target commodity's coordinates in the picture. The image of the target commodity is then cropped from the commodity picture according to these coordinates, thereby identifying the target commodity in the picture.

The typical embodiments disclosed in this application show that it has several advantages, including at least:

Referring to FIG. 4, in a further embodiment, step S1200 of extracting the deep semantic information of the commodity picture and the commodity title comprises the following steps:

Step S1210: preprocess the commodity picture and input the preprocessed picture into an image feature extraction model pre-trained to convergence, obtaining deep semantic information that represents the image features of the commodity picture;

To make image feature extraction easier for the subsequent model, the commodity picture is preprocessed by enlarging its width and height by the same ratio. Those skilled in the art can set the ratio based on prior knowledge or experimental data.
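The step only requires that width and height be scaled by one common ratio. The sketch below assumes, which the source does not specify, that the ratio is chosen so the longer side reaches a hypothetical target length `target_long_side`. Because both sides use the same factor, the aspect ratio is preserved. A target smaller than the image would shrink it instead of enlarging it.

```python
def scale_same_ratio(width: int, height: int, target_long_side: int = 1024) -> tuple[int, int, float]:
    """Return (new_width, new_height, ratio), scaling both sides by one ratio.

    target_long_side is a hypothetical parameter: the source leaves the ratio
    to the practitioner, and here it is derived from a desired longer side.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    ratio = target_long_side / max(width, height)  # one factor for both sides
    return round(width * ratio), round(height * ratio), ratio
```

For example, `scale_same_ratio(400, 300, 800)` gives `(800, 600, 2.0)`, which keeps the 4:3 shape.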

In one embodiment, the image feature extraction model is ResNet-50. The preprocessed commodity picture is input into a ResNet-50 pre-trained to convergence, whose stem block and four stages of residual bottleneck blocks progressively extract image features.

- The shallow stages extract basic features, such as details and edges, of the target commodity and the accompanying items.
- The deeper stages extract deep semantic features and high-level logical features.

The deep semantic information output by the last stage, the res5 stage, is taken as the result, shown as 300 in FIG. 3.
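The helper below makes the shape of the res5 output concrete by tracing the spatial size through a standard ResNet-50. It assumes the common torchvision-style layout: a 7×7 stride-2 stem convolution, a 3×3 stride-2 max pool, then a stride-2 downsampling at the start of stages res3 to res5. The source fixes neither the input resolution nor these layer details. The H×W grid of the res5 map later supplies the HW query positions of the attention layer.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_res5_shape(height: int, width: int) -> tuple[int, int, int]:
    """Trace (H, W) through a standard ResNet-50 and return the res5 (H, W, C)."""
    h, w = height, width
    # Stem: 7x7 conv (stride 2, padding 3), then 3x3 max pool (stride 2, padding 1).
    h, w = conv_out(h, 7, 2, 3), conv_out(w, 7, 2, 3)
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    # res2 keeps the resolution; res3, res4 and res5 each halve it in their
    # first bottleneck block (modelled here as a 3x3 conv, stride 2, padding 1).
    for _ in range(3):
        h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    return h, w, 2048  # res5 of ResNet-50 has 2048 output channels
```

For a 224×224 input this gives a 7×7×2048 map, that is, 49 positions. For an 800×600 picture it gives 25×19×2048, that is, 475 positions.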

Step S1220: preprocess the commodity title and input the preprocessed title into a text feature extraction model pre-trained to convergence, obtaining deep semantic information that represents the text features of the commodity title.

Commodity titles are often messily formatted and may contain line breaks, redundant punctuation, redundant whitespace and the like. These characters contribute little to the meaning of the title but reduce the accuracy of subsequent semantic extraction. To make the later deep semantic information more accurate, the title can be format-preprocessed, for example by:

- replacing line breaks with spaces;
- reducing runs of two or more whitespace characters to a single one;
- reducing runs of two or more punctuation marks to a single one.

These operations are applied as needed, and those skilled in the art can adapt them to the actual business situation.
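The three example rules can be sketched with Python's `re` module. The punctuation set below is an illustrative assumption covering common ASCII and full-width Chinese marks. A run of mixed punctuation collapses to its first mark. A single tab or full-width space is also normalised to an ordinary space, which slightly extends the "two or more" rule.

```python
import re

# Illustrative (assumed) set of ASCII and full-width Chinese punctuation marks.
PUNCT = "!,.;:?！，。；：？、"


def normalize_title(title: str) -> str:
    # 1. Replace line breaks with spaces.
    text = re.sub(r"[\r\n]+", " ", title)
    # 2. Collapse a run of two or more punctuation marks into its first mark.
    text = re.sub(rf"([{PUNCT}])[{PUNCT}]+", r"\1", text)
    # 3. Collapse any whitespace run into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

For example, `"Elegant  skirt\nNew!!!  Arrival,,, 2022"` becomes `"Elegant skirt New! Arrival, 2022"`.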

In one embodiment, the text feature extraction model is BERT. The preprocessed commodity title is input into a BERT model pre-trained to convergence. The model extracts text features for the words in the title that denote the commodity category of the target commodity, yielding the corresponding deep semantic information. For example, in the title "Handmade Beaded Corsage Set Elegant Women's Skirt" (手工钉珠胸花套装优雅女士裙), the text denoting the commodity category is "Women's Skirt" (女士裙).

In this embodiment, the image and text feature extraction models pre-trained to convergence extract the deep semantic information of the commodity picture and the commodity title automatically, quickly and accurately.

Referring to FIG. 5, in a further embodiment, step S1220 of preprocessing the commodity title comprises the following steps:

Step S1221: filter out invalid characters in the commodity title;

A commodity title usually contains text naming the commodity category of the target commodity, plus modifier text describing its effect, function, texture, material and so on. For identifying the target commodity as in this application, the modifier text counts as invalid characters. It can be filtered out so that the subsequent model extracts the relevant text features more easily.

In one embodiment, modifier texts are collected in advance, manually or with AI assistance, into an invalid-character dictionary. The title text is then matched against the dictionary entries by exact and/or fuzzy matching, and the invalid characters identified by the matches are deleted from the title.
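The sketch below shows dictionary filtering with exact matching plus fuzzy matching. Several choices are assumptions rather than details from the source:

- The title is whitespace-separated, as in the English example. A Chinese title would first need word segmentation, or substring matching against the dictionary.
- Python's `difflib.get_close_matches` stands in for the unspecified fuzzy matcher.
- The dictionary contents and the 0.85 cutoff are illustrative.

```python
import difflib


def filter_invalid(title: str, invalid_words: list[str], fuzzy_cutoff: float = 0.85) -> str:
    """Drop title tokens that match the invalid-character dictionary."""
    vocab = {w.lower() for w in invalid_words}
    kept = []
    for token in title.split():
        t = token.lower()
        if t in vocab:  # exact match against the dictionary
            continue
        # Fuzzy match: drop tokens whose similarity ratio to an entry reaches the cutoff.
        if difflib.get_close_matches(t, vocab, n=1, cutoff=fuzzy_cutoff):
            continue
        kept.append(token)
    return " ".join(kept)
```

With the hypothetical dictionary `["handmade", "beaded", "elegant"]`, the title "Handmade Beaded Corsage Set Elegant Women's Skirt" reduces to "Corsage Set Women's Skirt". The misspelling "Handmde" is also removed by the fuzzy branch.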

Step S1222: segment the filtered commodity title into words to obtain its keywords, which include product words and/or brand words of the target commodity, thereby completing the preprocessing of the title.

In one embodiment, the filtered title is segmented with two tokenizers. BasicTokenizer first produces a coarse token list, and WordpieceTokenizer is then applied to each token. This yields the keywords and completes the preprocessing of the title.
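The two-stage scheme can be illustrated with a minimal re-implementation: a simplified basic tokenizer followed by greedy longest-match-first WordPiece.

- The basic step lowercases, splits on whitespace and punctuation, and emits each CJK ideograph (only the U+4E00 to U+9FFF block here) as its own token.
- The real BERT BasicTokenizer also strips accents, cleans control characters and covers more CJK ranges.
- The vocabulary is a hypothetical toy.

In practice, Hugging Face `transformers` provides a `BertTokenizer` that chains the two stages with a real vocabulary.

```python
import unicodedata


def basic_tokenize(text: str) -> list[str]:
    """Simplified basic tokenizer: lowercase, split on whitespace and
    punctuation, and emit each CJK ideograph as a separate token."""
    tokens: list[str] = []
    buf: list[str] = []

    def flush() -> None:
        if buf:
            tokens.append("".join(buf))
            buf.clear()

    for ch in text.lower():
        if ch.isspace():
            flush()
        elif 0x4E00 <= ord(ch) <= 0x9FFF or unicodedata.category(ch).startswith("P"):
            flush()
            tokens.append(ch)
        else:
            buf.append(ch)
    flush()
    return tokens


def wordpiece(token: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece; continuation pieces carry '##'."""
    pieces: list[str] = []
    start = 0
    while start < len(token):
        end = len(token)
        match = None
        while start < end:
            piece = token[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


def tokenize(title: str, vocab: set[str]) -> list[str]:
    return [p for tok in basic_tokenize(title) for p in wordpiece(tok, vocab)]
```

With the toy vocabulary `{"hand", "##made", "corsage", "女", "士", "裙"}`, the title "Handmade Corsage 女士裙" tokenizes to `["hand", "##made", "corsage", "女", "士", "裙"]`.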

In this embodiment, filtering invalid characters out of the title leaves the subsequent model less text to process and less noise, which improves both its efficiency and its accuracy.

Referring to FIG. 6, in a further embodiment, step S1300 comprises the following steps. Step S1300 fuses the deep semantic information of the commodity title into that of the commodity picture, so that the image features of the target commodity stand out, and thereby obtains image-text fusion feature information.

Step S1310: fuse the deep semantic information of the commodity title and of the commodity picture with a multi-modal feature interaction fusion module to obtain preliminary fusion feature information, in which the image features of the target commodity are prominently represented;

The multi-modal feature interaction fusion module is shown in FIG. 2. In the attention layer 200:

- two convolution layers of identical structure extract two corresponding pieces of feature information from the deep semantic information of the commodity title;
- another two convolution layers of identical structure extract two pieces of feature information from the deep semantic information of the commodity picture.

One of the two picture-side pieces then interacts, through the attention mechanism, with the two title-side pieces. This interaction at the feature level fuses picture and title at the level of deep semantics, and the attention layer outputs the preliminary fusion feature information.

Because the title semantically indicates which object is the target commodity, fusing its deep semantic information into the picture's means the preliminary fusion feature information already emphasizes the image features of the target commodity, with the title's text semantics as reference. Later embodiments describe this attention-based interaction in detail.

In an optional embodiment, referring to FIG. 2, a further convolution layer extracts feature information 201 from the preliminary fusion feature information. Feature information 201 is then multiplied element-wise (a Hadamard product, which this document calls matrix dot multiplication) with the other picture-side feature information 202. After this product, the features representing the image of the target commodity are further emphasized, giving preliminary fusion feature information in which these features are explicit.

The element-wise product takes two matrices of the same dimensions and multiplies their entries position by position:

- the entry in row 1, column 1 of the matrix for feature information 201 is multiplied by the entry in row 1, column 1 of the matrix for feature information 202;
- the entry in row 1, column 2 is multiplied by the entry in row 1, column 2;
- the entry in row 2, column 1 is multiplied by the entry in row 2, column 1;
- and so on.
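The position-by-position rule above is NumPy's `*` operator on equal-shaped arrays. It differs from the ordinary matrix product `@`, as the toy numbers below show. The arrays are illustrative stand-ins for feature information 201 and 202, not values from the source.

```python
import numpy as np


def hadamard(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Element-wise (Hadamard) product of two equally shaped matrices."""
    if a.shape != b.shape:
        raise ValueError(f"shapes differ: {a.shape} vs {b.shape}")
    return a * b  # entry (i, j) of the result is a[i, j] * b[i, j]


f201 = np.array([[1.0, 2.0], [3.0, 4.0]])  # toy stand-in for feature information 201
f202 = np.array([[0.5, 0.0], [2.0, 1.0]])  # toy stand-in for feature information 202

gated = hadamard(f201, f202)  # [[0.5, 0.0], [6.0, 4.0]]
product = f201 @ f202         # ordinary matrix product, for contrast: [[4.5, 2.0], [9.5, 4.0]]
```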

Step S1320: combine the preliminary fusion feature information with the deep semantic information of the commodity picture to obtain the image-text fusion feature information;

The preliminary fusion feature information is combined with the deep semantic information of the commodity picture (300 in FIG. 3), for example by matrix addition, to obtain the image-text fusion feature information. Because the preliminary fusion feature information prominently represents the image features of the target commodity, the image-text fusion features can be used to identify the image of the target commodity.
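Matrix addition needs both operands to have the same shape. The sketch therefore assumes two things the source leaves implicit:

- the preliminary fusion features hold one row per image position (HW rows);
- their channel count equals that of the picture features (C_i = C).

The rows are reshaped back onto the H×W grid and added to the res5 features.

```python
import numpy as np


def combine(image_feat: np.ndarray, prelim: np.ndarray) -> np.ndarray:
    """Add preliminary fusion features back onto the picture features.

    image_feat: (H, W, C) deep features of the commodity picture (300 in FIG. 3).
    prelim:     (H*W, C) preliminary fusion features, one row per position
                (assumes the value dimension C_i equals C).
    """
    h, w, c = image_feat.shape
    if prelim.shape != (h * w, c):
        raise ValueError(f"expected prelim of shape {(h * w, c)}, got {prelim.shape}")
    # Row-major reshape restores the (H, W) grid the rows were flattened from.
    return image_feat + prelim.reshape(h, w, c)
```

Because the addition is residual, the original picture features are preserved and the title-guided emphasis is layered on top of them.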

In this embodiment, the attention-based feature interaction makes the features of the target commodity explicit in the deep semantic information of the commodity picture. In the image-text fusion feature information they therefore stand out as salient features, which helps the subsequent model identify the target commodity in the commodity picture more accurately.

Referring to FIG. 7, in a preferred embodiment, step S1310 of fusing the deep semantic information of the commodity title and the commodity picture with the multi-modal feature interaction fusion module to obtain preliminary fusion feature information comprises the following steps:

Step S1311: construct the query vectors from the deep semantic information of the commodity picture and the key and value vectors from the deep semantic information of the commodity title, and input them to the attention layer;

Referring to FIG. 2, the attention layer (Attention) 200 takes the deep semantic information of the commodity picture and of the commodity title as input.

- One convolution layer, acting as weight matrix W^Q, processes the picture's deep semantic information to produce the query vectors (Query).
- Two further convolution layers, acting as weight matrices W^K and W^V, process the title's deep semantic information to produce the key vectors (Key) and the value vectors (Value).

W^Q, W^K and W^V are all learnable weights. Because the queries come from the picture while the keys and values come from the title, this is a cross-modal form of attention.

Step S1312: in the attention layer, let the query vectors interact with the key vectors and normalize the result to obtain a weight matrix;

Continuing with FIG. 2, the attention layer 200 multiplies the query matrix by the transpose of the key matrix. The resulting product matrix has size HW×T, where HW is the number of spatial positions in the picture's feature map and T is the number of title tokens. It captures the feature interaction between the deep semantic information of the commodity picture and that of the commodity title.

The product matrix is then activated with the Softmax function, giving a weight matrix that represents the semantic information produced by this deep interaction. In essence, this weighting uses the title's deep semantic information to emphasize the salient features in the picture's deep semantic information, namely the features of the target commodity's image.
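In matrix form, the computation of steps S1311 to S1313 can be written as follows. The symbols are introduced here for illustration and are not the source's notation:

- X_I is the picture feature flattened to HW rows of C channels;
- X_T is the title feature with T rows of C_t channels;
- d is the shared query/key width.

Each convolution W^Q, W^K, W^V is assumed to be a 1×1 convolution, which is equivalent to multiplying every position's feature vector by the same matrix. Softmax is taken row-wise, over the T title positions. The source mentions no scaling factor, so none is shown; standard Transformer attention additionally divides QK^T by √d.

```latex
\begin{aligned}
Q &= X_I W^Q \in \mathbb{R}^{HW \times d}, &
K &= X_T W^K \in \mathbb{R}^{T \times d}, &
V &= X_T W^V \in \mathbb{R}^{T \times C_i},\\
A &= \operatorname{softmax}\left(Q K^{\top}\right) \in \mathbb{R}^{HW \times T}, &
F &= A V \in \mathbb{R}^{HW \times C_i}.
\end{aligned}
```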

Step S1313: in the attention layer, apply the weight matrix to the value vectors to obtain the initial features;

Continuing with FIG. 2, the attention layer 200 multiplies the HW×T weight matrix output by the Softmax activation by the value matrix, that is, the text features arranged as a T×C_i matrix. The result is a product matrix of size HW×C_i. This matrix is the preliminary fusion feature information obtained through attention-based feature interaction between the deep semantic information of the commodity picture and that of the commodity title.
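Steps S1311 to S1313 can be sketched in NumPy, under these assumptions:

- the projection matrices `w_q`, `w_k` and `w_v` are hypothetical stand-ins for the learned W^Q, W^K and W^V (1×1 convolutions written as matrix products);
- no √d scaling is applied, following the text above.

The function returns both the HW×C_i preliminary fusion features and the HW×T weight matrix, so the shapes can be checked.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def title_guided_attention(img_feat, title_feat, w_q, w_k, w_v):
    """img_feat: (H, W, C) picture features; title_feat: (T, C_t) title features.
    w_q: (C, d), w_k: (C_t, d), w_v: (C_t, C_i) stand in for W^Q, W^K, W^V.
    Returns the (H*W, C_i) preliminary fusion features and the (H*W, T) weights."""
    h, w, c = img_feat.shape
    x_img = img_feat.reshape(h * w, c)   # flatten the grid: one row per position
    q = x_img @ w_q                      # (HW, d)   queries from the picture
    k = title_feat @ w_k                 # (T, d)    keys from the title
    v = title_feat @ w_v                 # (T, C_i)  values from the title
    weights = softmax(q @ k.T, axis=-1)  # (HW, T); each row sums to 1 over title tokens
    return weights @ v, weights          # (HW, C_i), (HW, T)
```

Choosing C_i equal to the picture's channel count C keeps the output addable to the picture features in step S1320. The source does not fix this choice.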

In this embodiment, the preliminary fusion feature information is obtained by multiplying the weight matrix by the value vectors, which are derived from the title with the weight W^V. The weight matrix itself comes from the interaction between the picture's and the title's deep semantic information. The title's deep semantic information is thus fused into the picture's a second time, so that the preliminary fusion feature information prominently represents the image features of the target commodity.

Referring to FIG. 8, in a further embodiment, step S1400 of inputting the image-text fusion feature information into a target detection model pre-trained to convergence and identifying the target commodity comprises the following steps:

Step S1410: use the target detection model pre-trained to convergence to detect the target commodity in the commodity picture according to the image-text fusion feature information, obtaining the corresponding detection region;

In one embodiment, the target detection model is Mask R-CNN. The image-text fusion feature information is input into a Mask R-CNN pre-trained to convergence, which detects the region of the commodity picture corresponding to the image of the target commodity.

Step S1420: compute the minimum-area rectangle enclosing the detection region and use it to frame the target commodity as the identification result.

The minimum-area enclosing rectangle completely contains the image of the target commodity in the detection region while including as little area outside the target commodity as possible. The method obtains this rectangle and its position in the commodity picture, usually given as the coordinates of its four vertices. The image of the target commodity framed by the rectangle is then selected from the commodity picture as the identification result.
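If the rectangle is taken to be axis-aligned, which is an assumption since the source does not say, the minimum-area enclosing rectangle of a detection mask is its tight bounding box. Its four vertices follow from the extreme foreground coordinates. If rotated rectangles were intended instead, OpenCV's `cv2.minAreaRect` computes the rotated minimum-area rectangle of a point set.

```python
import numpy as np


def mask_to_box(mask: np.ndarray) -> tuple[int, int, int, int]:
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max), inclusive,
    enclosing all nonzero pixels of an (H, W) detection mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask has no foreground pixels")
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


def box_vertices(box: tuple[int, int, int, int]) -> list[tuple[int, int]]:
    """Four vertex coordinates, clockwise on screen starting at the top-left."""
    x0, y0, x1, y1 = box
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
```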

In this embodiment, framing the target commodity with the minimum-area rectangle enclosing the detection region improves the accuracy of the identification result.

Referring to FIG. 9, in an extended embodiment, the method further comprises the following steps after step S1400 (inputting the image-text fusion feature information into the target detection model pre-trained to convergence and identifying the target commodity):

Step S1500: Crop the image of the target product out of the product picture according to the rectangular box that frames the target product. Store the cropped image in the product database in association with the target product's unique identifier;

The rectangular box that frames the target product has a position in the product picture. Using that position, the image of the target product is cropped out of the product picture. The crop is then stored in the product database in association with the target product's unique identifier, for later retrieval.
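Step S1500 can be sketched as follows. The sketch assumes that images are row-major pixel grids and that the box is given as inclusive `(x0, y0, x1, y1)` pixel bounds. `crop` and `ProductImageStore` are hypothetical names, and the in-memory dictionary merely stands in for the product database.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]          # row-major grayscale pixels (simplification)
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel bounds


def crop(image: Image, box: Box) -> Image:
    """Cut out the sub-image inside an inclusive (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]


class ProductImageStore:
    """Hypothetical stand-in for the product database: target-product crops
    keyed by each product's unique identifier."""

    def __init__(self) -> None:
        self._crops: Dict[str, Image] = {}

    def save(self, product_id: str, image: Image, box: Box) -> None:
        # Store only the framed target product, not the whole product picture.
        self._crops[product_id] = crop(image, box)

    def load(self, product_id: str) -> Image:
        # Retrieval by unique identifier, as used later in step S1600.
        return self._crops[product_id]
```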

Step S1600: In response to a product recommendation request, retrieve the image of the target product from the product database by its unique identifier, and match recommended products similar to it;

Some pages of an e-commerce platform need to load recommended products. Loading such a page generates a product recommendation request, which is sent to the platform's server. The server receives the request and responds to it in three steps:

1. It retrieves the image of the target product from the product database using the target product's unique identifier.
2. It performs image similarity matching between that image and the target-product images stored for other products in the same product category.
3. It takes the products whose similarity exceeds a threshold as recommended products.

Those skilled in the art can set the threshold according to business needs. Product categories are generally defined by the e-commerce platform. Merchants running online stores usually have to select a category when they publish a product, so every product on the platform has a corresponding category.
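The matching in step S1600 could look like the following sketch. The application does not specify the similarity measure, so the sketch makes several assumptions:

- Each cropped target-product image has already been encoded into a feature vector.
- Similarity is measured with cosine similarity.
- Candidates are restricted to the target's product category and filtered by the business-defined threshold.

`CatalogItem` and `recommend` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CatalogItem:
    product_id: str
    category: str
    embedding: Sequence[float]  # assumed feature vector of the cropped target-product image


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(target: CatalogItem, catalog: List[CatalogItem], threshold: float) -> List[str]:
    """Return IDs of same-category products whose image similarity to the
    target exceeds the threshold, most similar first."""
    scored = []
    for item in catalog:
        if item.product_id == target.product_id or item.category != target.category:
            continue  # compare only within the target's product category
        score = cosine_similarity(target.embedding, item.embedding)
        if score > threshold:
            scored.append((score, item.product_id))
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored]
```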

Step S1700: Reply to the product recommendation request by pushing the recommended products.

In reply to the product recommendation request, the recommended products are pushed to the corresponding e-commerce page. The page receives the recommended products, then loads and displays them.

In this embodiment, the image of the corresponding target product has been cropped out of every product picture on the e-commerce platform. Recommendations based on these cropped images are therefore more accurate.

Referring to FIG. 10, one of the purposes of the present application is served by a target commodity identification device, which is a functional embodiment of the target commodity identification method of the present application. The device includes the following modules:

- Image-text acquisition module 1100: configured to acquire the product title and product picture from the product information of the target product.
- Semantic extraction module 1200: configured to extract the deep semantic information of the product picture and the product title.
- Feature fusion module 1300: configured to fuse the deep semantic information of the product title into the deep semantic information of the product picture. Guided by the title's deep semantic information, this highlights the image features of the target product within the picture's deep semantic information and yields image-text fusion feature information.
- Target recognition module 1400: configured to input the image-text fusion feature information into the target detection model pre-trained to convergence to identify the target product.
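The sketch below illustrates how the four modules chain together by wiring them as injectable callables. The class and field names are hypothetical. In practice, each callable would wrap the corresponding model or preprocessing logic.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class TargetProductRecognizer:
    """Hypothetical wiring of the device's four modules as plain callables."""
    acquire: Callable[[Any], Tuple[str, Any]]        # image-text acquisition module 1100
    extract: Callable[[str, Any], Tuple[Any, Any]]   # semantic extraction module 1200
    fuse: Callable[[Any, Any], Any]                  # feature fusion module 1300
    detect: Callable[[Any], Any]                     # target recognition module 1400

    def recognize(self, product_info: Any) -> Any:
        title, picture = self.acquire(product_info)
        title_feat, picture_feat = self.extract(title, picture)
        fused = self.fuse(title_feat, picture_feat)  # title-guided image features
        return self.detect(fused)
```

Keeping the modules as injected functions mirrors the functional split of the device. It also lets each module be tested or replaced on its own.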

In a further embodiment, the semantic extraction module 1200 includes two submodules:

- An image feature extraction submodule, configured to preprocess the product picture and input the preprocessed picture into an image feature extraction model pre-trained to convergence. The output is the corresponding deep semantic information, which characterizes the image features of the product picture.
- A text feature extraction submodule, configured to preprocess the product title and input the preprocessed title into a text feature extraction model pre-trained to convergence. The output is the corresponding deep semantic information, which characterizes the text features of the product title.
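Claim 3 of this application specifies the title preprocessing in two stages: filter out invalid characters, then segment the title to keep product and/or brand keywords. The following is a minimal sketch under stated assumptions:

- "Invalid" means anything other than letters, digits, CJK ideographs and whitespace.
- Whitespace splitting stands in for a real word segmenter; Chinese titles would need a dedicated tokenizer.
- The caller supplies the product and brand lexicons.

`preprocess_title` is a hypothetical name.

```python
import re
from typing import List, Set

# Assumption: "invalid" = anything except ASCII letters/digits, CJK ideographs and whitespace.
_INVALID = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff\s]+")


def preprocess_title(title: str, product_words: Set[str], brand_words: Set[str]) -> List[str]:
    """Filter invalid characters, segment the title, and keep only product and
    brand keywords (lowercase lexicons assumed), in order, without duplicates."""
    cleaned = _INVALID.sub(" ", title).lower()
    tokens = cleaned.split()  # stand-in for a real word segmenter
    keywords: List[str] = []
    for token in tokens:
        if (token in product_words or token in brand_words) and token not in keywords:
            keywords.append(token)
    return keywords
```

For example, `preprocess_title("ACME Wooden Shelf!!! 3-Tier ★ Hot Sale", {"shelf"}, {"acme"})` returns `["acme", "shelf"]`.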

In a further embodiment, the feature fusion module 1300 includes two submodules:

- A semantic fusion submodule, configured to use a multimodal feature interaction fusion module to fuse the deep semantic information of the product title with that of the product picture. The result is preliminary fusion feature information, in which the features of the target product's image are prominently represented.
- An information combination submodule, configured to combine the preliminary fusion feature information with the deep semantic information of the product picture to obtain the image-text fusion feature information.
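Claim 5 of this application describes the multimodal interaction fusion as an attention layer. Its query comes from the product picture's deep semantic information, and its key and value come from the title's. The pure-Python sketch below shows that cross-attention pattern, followed by a combination step. The application specifies none of the following, so each is an assumption:

- The learned projection matrices are replaced by identity maps.
- The query-key scores are scaled by the square root of the feature dimension before softmax normalization, as in standard scaled dot-product attention.
- The combination with the picture features is a simple element-wise residual addition.

All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(image_feats: Matrix, title_feats: Matrix) -> Matrix:
    """Query from image features (one row per image region); key and value from
    title features (one row per title token). Returns preliminary fusion
    features with one row per image region."""
    d = len(image_feats[0])
    q, k, v = image_feats, title_feats, title_feats  # identity projections (assumption)
    scores = matmul(q, transpose(k))  # query-key interaction
    weights = softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])  # weight matrix
    return matmul(weights, v)  # apply the weights to the value vectors


def combine(preliminary: Matrix, image_feats: Matrix) -> Matrix:
    """Assumed combination step: element-wise residual addition."""
    return [[p + x for p, x in zip(pr, xr)] for pr, xr in zip(preliminary, image_feats)]
```

Each output row is a softmax-weighted mix of title-token features, so image regions that align with the title receive larger contributions. The residual addition keeps the original picture features available to the detector.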

In a further embodiment, the target recognition module 1400 includes two units:

- A target detection unit, configured to use the target detection model pre-trained to convergence to detect the target product in the product picture according to the image-text fusion feature information, obtaining the corresponding detection area.
- A frame selection recognition unit, configured to compute the minimum-area rectangular box enclosing the detection area and use it to frame the target product as the recognition result.

In an extended embodiment, the device further includes the following modules downstream of the target recognition module 1400:

- A cropping and storage module, configured to crop the image of the target product out of the product picture according to the rectangular box that frames the target product, and to store the crop in the product database in association with the target product's unique identifier.
- A request response module, configured to respond to a product recommendation request by retrieving the image of the target product from the product database by its unique identifier and matching recommended products similar to it.
- A request reply module, configured to reply to the product recommendation request by pushing the recommended products.

To solve the above technical problems, an embodiment of the present application further provides a computer device, whose internal structure is shown schematically in FIG. 11. The computer device includes a processor, a computer-readable storage medium, a memory and a network interface, all connected by a system bus.

- The computer-readable storage medium stores an operating system, a database and computer-readable instructions. The database may store a sequence of control information. When the processor executes the computer-readable instructions, they cause the processor to implement a target commodity identification method.
- The processor provides computing and control capabilities and supports the operation of the entire computer device.
- The memory may store computer-readable instructions that, when executed by the processor, cause the processor to perform the target commodity identification method of the present application.
- The network interface is used to connect to and communicate with terminals.

Those skilled in the art will understand that FIG. 11 is only a block diagram of the part of the structure relevant to the present solution. It does not limit the computer devices to which the solution can be applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.

In this embodiment, the processor executes the specific functions of each module and submodule in FIG. 10. The memory stores the program code and data required to execute all modules and submodules of the target commodity identification device of the present application, and the server can invoke this program code and data to perform the functions of all submodules. The network interface handles data transmission to and from user terminals or servers.

The present application further provides a storage medium storing computer-readable instructions. When executed by one or more processors, these instructions cause the one or more processors to perform the steps of the target commodity identification method of any embodiment of the present application.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disc or a read-only memory (ROM), or it may be a random access memory (RAM).

Those skilled in the art will understand that the operations, methods, and the steps, measures and solutions in the processes discussed in this application may be alternated, modified, combined or deleted. Other steps, measures and solutions in those operations, methods and processes may likewise be alternated, modified, rearranged, decomposed, combined or deleted. The same applies to steps, measures and solutions in the prior art that correspond to the operations, methods and processes disclosed in this application.

The above are only some embodiments of the present application. Those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements shall also fall within the protection scope of the present application.

Claims

1. A target commodity identification method, characterized in that it comprises the following steps:

Obtain the product title and product image in the product information of the target product;

Extract the deep semantic information of the product image and product title;

Fusing the deep semantic information of the product title into the deep semantic information of the product picture, so as to highlight, according to the deep semantic information of the product title, the image features of the target product in the deep semantic information of the product picture, and obtaining image-text fusion feature information;

The image-text fusion feature information is input into the target detection model pre-trained to convergence, and the target product is identified.

2. The target commodity identification method according to claim 1, wherein the step of extracting the deep semantic information of the commodity picture and the commodity title comprises the following steps:

Preprocessing the product picture, and inputting the preprocessed product picture into an image feature extraction model pre-trained to convergence to obtain the corresponding deep semantic information, which characterizes the image features of the product picture;

The product title is preprocessed, and the preprocessed product title is input into a text feature extraction model that has been pre-trained to convergence to obtain corresponding deep semantic information, which is used to characterize the text feature of the product title.

3. The target commodity identification method according to claim 2, wherein the step of preprocessing the product title comprises the following steps:

Filtering invalid characters out of the product title;

Performing word segmentation on the filtered product title to obtain its keywords, the keywords including product words and/or brand words of the target product, thereby completing the preprocessing of the product title.

4. The target commodity identification method according to claim 1, wherein the step of fusing the deep semantic information of the product title into the deep semantic information of the product picture, so as to highlight, according to the deep semantic information of the product title, the image features of the target product in the deep semantic information of the product picture and obtain image-text fusion feature information, comprises the following steps:

Using a multimodal feature interaction fusion module to fuse the deep semantic information of the product title with the deep semantic information of the product picture to obtain preliminary fusion feature information, in which the features of the image of the target product are prominently represented;

Combining the preliminary fusion feature information with the deep semantic information of the product picture to obtain the image-text fusion feature information.

5. The target commodity identification method according to claim 3, wherein the step of using a multimodal feature interaction fusion module to fuse the deep semantic information of the product title and the deep semantic information of the product picture to obtain preliminary fusion feature information comprises the following steps:

Constructing a query vector from the deep semantic information of the product picture, constructing a key vector and a value vector from the deep semantic information of the product title, and inputting them into an attention layer;

Interacting and normalizing the query vector and the key vector by the attention layer to obtain a weight matrix;

Preliminary fusion feature information is obtained by matching the value vector with the weight matrix by the attention layer.

6. The target commodity identification method according to claim 1, wherein the step of inputting the image-text fusion feature information into the target detection model pre-trained to convergence and identifying the target product comprises the following steps:

Using the target detection model pre-trained to convergence to detect the target product in the product picture according to the image-text fusion feature information, and obtaining the corresponding detection area;

Obtaining the minimum-area rectangular box enclosing the detection area, and framing the target product with this box as the recognition result.

7. The target commodity identification method according to claim 1, wherein, after the step of inputting the image-text fusion feature information into the target detection model pre-trained to convergence and identifying the target product, the method further comprises the following steps:

Cropping the image of the target product out of the product picture according to the rectangular box that frames the target product, and storing it in the product database in association with the unique identifier of the target product;

In response to the product recommendation request, search the product database according to the unique identification code of the target product to obtain the image of the target product, and match the recommended products similar to it;

Respond to the product recommendation request, and push the recommended product.

8. A target commodity identification device, characterized by comprising:

The graphic and text acquisition module is used to acquire the product title and product picture in the product information of the target product;

A semantic extraction module, used for extracting the deep semantic information of the commodity picture and commodity title;

A feature fusion module, used to fuse the deep semantic information of the product title into the deep semantic information of the product picture, so as to highlight, according to the deep semantic information of the product title, the image features of the target product in the deep semantic information of the product picture, and obtain image-text fusion feature information;

The target recognition module is used for inputting the image-text fusion feature information into the target detection model pre-trained to convergence, and identifying the target product.

9. A computer device comprising a central processing unit and a memory, wherein the central processing unit is used to call and run a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implementing the method according to any one of claims 1 to 7; when the computer program is invoked and run by a computer, it performs the steps included in the corresponding method.