CN112149653B

CN112149653B - Information processing method, information processing device, electronic equipment and storage medium

Info

Publication number: CN112149653B
Application number: CN202010974510.4A
Authority: CN
Inventors: 孙瑞娜; 康永杰; 杨森; 尚航; 薛云鹤; 黄仁杰
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-09-16
Filing date: 2020-09-16
Publication date: 2024-03-29
Anticipated expiration: 2040-09-16
Also published as: CN112149653A

Abstract

The present disclosure relates to an information processing method, the method including: acquiring image content characteristics and text content characteristics in object description contents; determining target image content characteristics in the image content characteristics; the association degree between the object information characterized by the target image content characteristics and the object information characterized by the text content characteristics meets a preset condition; and determining a target classification result of the object description content according to the target image content characteristics and the character content characteristics. The method can accurately classify the object description content.

Description

Information processing method, information processing device, electronic equipment and storage medium

Technical Field

The disclosure relates to the field of computer technology, and in particular, to an information processing method, an information processing device, electronic equipment and a storage medium.

Background

In order to provide powerful support for commodity retrieval service, commodity placement strategy making service, commodity intelligent recommendation service and the like, accurate classification of commodities is often required.

In the conventional technology, in the process of classifying the commodities, the multi-mode information such as image information, text information and the like of the commodities in the description information is often simply fused to realize the classification of the commodities.

However, the conventional technology often does not consider that there is often much useless redundant information in the descriptive information of the target commodity, for example, there is too much useless content in the descriptive picture of the target commodity. This makes it impossible for the conventional technology to accurately classify object descriptions such as commodity descriptions.

Disclosure of Invention

The present disclosure provides an information processing method, apparatus, electronic device, and storage medium to at least solve the problem that the object description content cannot be accurately classified in the related art. The technical scheme of the present disclosure is as follows:

according to a first aspect of an embodiment of the present disclosure, there is provided an information processing method including:

acquiring image content characteristics and text content characteristics in object description contents;

determining target image content characteristics in the image content characteristics; the association degree between the object information characterized by the target image content characteristics and the object information characterized by the text content characteristics meets a preset condition;

and determining a target classification result of the object description content according to the target image content characteristics and the character content characteristics.

In one possible implementation manner, the determining the target image content feature in the image content features includes: determining the relevance of the image content according to the image content characteristics and the text content characteristics; the image content association degree is used for representing the association degree between the image description content and the text description content in the object description content; weighting the image content features based on the image content relevance; and taking the weighted image content characteristics as the target image content characteristics.

In one possible implementation manner, the weighting the image content features based on the relevance of the image content includes: determining a weight matrix corresponding to the relevance of the image-text content; multiplying the weight matrix with a matrix corresponding to the image content characteristics to obtain a multiplied matrix; and taking the multiplied matrix as the characteristic of the weighted image content.

In one possible implementation manner, the determining the target classification result of the object description content according to the target image content feature and the text content feature includes: fusing the target image content characteristics and the text content characteristics to obtain description content fusion characteristics; and determining a target classification result of the object descriptive content according to the descriptive content fusion characteristics.

In one possible implementation manner, the fusing the target image content feature and the text content feature to obtain a descriptive content fusion feature includes: performing dimension-increasing processing on the target image content characteristics to obtain processed image content characteristics; wherein the processed image content features are up-dimensional representations of the target image content features; performing dimension-increasing processing on the character content features to obtain processed character content features; wherein the processed text content features are up-dimensional representations of the text content features; and fusing the processed image content characteristics and the processed text content characteristics to obtain the description content fusion characteristics.

In one possible implementation manner, the fusing the processed image content feature and the processed text content feature to obtain the descriptive content fusion feature includes: multiplying the processed image content characteristics and the processed text content characteristics to obtain multiplied characteristics; performing dimension lifting processing on the multiplied features to obtain the description fusion features; the descriptive content fusion feature is an up-dimensional expression of the multiplied feature.

In one possible implementation manner, the determining, according to the description content fusion feature, a target classification result of the object description content includes: classifying the description content fusion characteristics to generate a plurality of candidate classification results of the object description content; each candidate classification result has a corresponding probability value; and taking the candidate classification result with the highest probability value as a target classification result of the object description content among the plurality of candidate classification results.

In one possible implementation manner, the acquiring the image content feature and the text content feature in the object description content includes: determining image description contents in the object description contents; performing feature extraction on the image description content through a trained image feature extraction model to obtain the image content features; and determining text description content in the object description content; and extracting the characteristics of the text description content through a trained text characteristic extraction model to obtain the characteristics of the text content.

According to a second aspect of the embodiments of the present disclosure, there is provided an information processing apparatus including:

an acquisition unit configured to perform acquisition of image content features and text content features in the object description content;

a determining unit configured to determine a target image content feature among the image content features; the association degree between the object information characterized by the target image content characteristics and the object information characterized by the text content characteristics meets a preset condition;

and the classification unit is configured to determine a target classification result of the object description content according to the target image content characteristics and the text content characteristics.

In a possible implementation manner, the determining unit is specifically configured to perform determining a degree of relevance of the content of the text according to the image content feature and the text content feature; the image content association degree is used for representing the association degree between the image description content and the text description content in the object description content; weighting the image content features based on the image content relevance; and taking the weighted image content characteristics as the target image content characteristics.

In one possible implementation manner, the determining unit is specifically configured to perform determining a weight matrix corresponding to the relevancy of the teletext content; multiplying the weight matrix with a matrix corresponding to the image content characteristics to obtain a multiplied matrix; and taking the multiplied matrix as the characteristic of the weighted image content.

In one possible implementation manner, the classifying unit is specifically configured to perform fusion of the target image content feature and the text content feature to obtain a descriptive content fusion feature; and determining a target classification result of the object descriptive content according to the descriptive content fusion characteristics.

In one possible implementation manner, the classifying unit is specifically configured to perform dimension-up processing on the target image content feature to obtain a processed image content feature; wherein the processed image content features are up-dimensional representations of the target image content features; performing dimension-increasing processing on the character content features to obtain processed character content features; wherein the processed text content features are up-dimensional representations of the text content features; and fusing the processed image content characteristics and the processed text content characteristics to obtain the description content fusion characteristics.

In one possible implementation manner, the classifying unit is specifically configured to perform multiplication operation on the processed image content features and the processed text content features to obtain multiplied features; performing dimension lifting processing on the multiplied features to obtain the description fusion features; the descriptive content fusion feature is an up-dimensional expression of the multiplied feature.

In one possible implementation manner, the classifying unit is specifically configured to perform classification processing on the description content fusion feature, and generate a plurality of candidate classification results of the object description content; each candidate classification result has a corresponding probability value; and taking the candidate classification result with the highest probability value as a target classification result of the object description content among the plurality of candidate classification results.

In a possible implementation manner, the acquiring unit is specifically configured to perform determining image description content in the object description content; performing feature extraction on the image description content through a trained image feature extraction model to obtain the image content features; and determining text description content in the object description content; and extracting the characteristics of the text description content through a trained text characteristic extraction model to obtain the characteristics of the text content.

According to a third aspect of embodiments of the present disclosure, there is provided an electronic device comprising a memory storing a computer program and a processor implementing the information processing method according to the first aspect or any one of the possible implementations of the first aspect when the processor executes the computer program.

According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the information processing method according to the first aspect or any one of the possible implementations of the first aspect.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of a device reads and executes the computer program, such that the device performs the information processing method of the first aspect or any one of the possible implementations of the first aspect.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects: determining target image content characteristics by acquiring image content characteristics and text content characteristics in object description contents and determining the target image content characteristics in the image content characteristics; the association degree between the object information represented by the object image content features and the object information represented by the text content features meets the preset condition, and then the object classification result of the object description content is determined according to the object image content features and the text content features, so that the characteristic with high matching degree of the represented object information and the object information represented by the text content features in the image content features can be endowed with higher weight, useless redundant features in the image content features are filtered out, the data processing amount in the process of classifying the object description content is reduced, and the object classification result of the object description content is accurately determined.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.

Fig. 1 is an application environment diagram illustrating an information processing method according to an exemplary embodiment.

Fig. 2 is a flowchart illustrating a method of information processing according to an exemplary embodiment.

Fig. 3 is a flowchart illustrating another information processing method according to an exemplary embodiment.

Fig. 4 is a block diagram of an information processing apparatus according to an exemplary embodiment.

Fig. 5 is a block diagram of an information processing model, according to an exemplary embodiment.

Fig. 6 is an internal structural diagram of an electronic device, which is shown according to an exemplary embodiment.

Fig. 7 is an internal structural diagram of an electronic device shown according to another exemplary embodiment.

Detailed Description

In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.

The information processing method provided by the disclosure can be applied to an application environment as shown in fig. 1. Wherein the computer device 110 obtains image content features and text content features in the object description; then, the computer device 110 determines a target image content feature among the image content features; the association degree between the object information represented by the target image content features and the object information represented by the text content features meets preset conditions; finally, the computer device 110 determines a target classification result of the object description content based on the target image content features and the text content features. In practical applications, the computer device 110 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, and of course, the computer device 110 may be implemented by a stand-alone server or a server cluster composed of a plurality of servers.

Fig. 2 is a flowchart illustrating an information processing method according to an exemplary embodiment, which is used in the computer device 110 of fig. 1, as shown in fig. 2, including the following steps.

In step S210, image content features and text content features in the object description content are acquired.

Wherein an object may refer to an object to be classified. For example, commodities to be classified, entities to be classified in the field of knowledge maps, and the like.

Wherein, the object description contents may refer to contents for describing detailed information of the object. For example, when the object to be classified is a commodity to be classified, the object description content may be commodity information uploaded by the user; the commodity information contains commodity pictures, commodity names, commodity description texts and other contents.

In the specific implementation, when the object to be classified is a commodity to be classified, the computer equipment acquires commodity information which is object description content and contains commodity pictures, commodity titles, commodity description texts and other contents and is uploaded by a user; the computer device may then perform data preprocessing on the object description to obtain preprocessed object description. And then, the computer equipment adopts a trained feature extraction model to extract image content features corresponding to the image description contents in the object description contents and extract text content features corresponding to the text description contents in the object description contents respectively.

In step S220, among the image content features, a target image content feature is determined.

The association degree between the object information characterized by the target image content features and the object information characterized by the text content features meets the preset condition.

In a specific implementation, after the computer device obtains the image content feature and the text content feature in the object description content, the computer device may determine, based on an attention (attention) mechanism, a target image content feature in the image content feature, so that a degree of association between object information represented by the target image content feature and object information represented by the text content feature satisfies a preset condition.

Specifically, the computer device may perform the attention weight calculation on the image content feature and the text content feature based on the attention mechanism, to obtain a weight matrix for characterizing the association relationship between the graphics and text. And then, the computer equipment adopts the weight matrix to weight the image content characteristics, namely, in the image content characteristics, the characteristic with high correlation degree of the represented object information and the object information represented by the character content characteristics is endowed with higher importance, the characteristic with low correlation degree of the represented object information and the object information represented by the character content characteristics is endowed with lower importance, and further, the weighted image content characteristics, namely, the target image content characteristics are obtained.

In step S230, a target classification result of the object description content is determined according to the target image content feature and the text content feature.

In a specific implementation, after the computer equipment determines the target image content characteristics in the image content characteristics, the computer equipment can perform classification processing by utilizing the target image content characteristics and the character content characteristics, so as to determine a target classification result of the object description content, namely, determine a classification result corresponding to the commodity to be classified.

Specifically, when the computer device performs classification processing by using the target image content features and the text content features, the computer device may perform a series of feature processing such as up-scaling expression and feature fusion on the target image content features and the text content features, to obtain feature processing results. Finally, the computer equipment carries out classification processing on the characteristic processing results, so as to determine the target classification result of the object description content.

In the information processing method, the image content characteristics and the text content characteristics in the object description content are acquired, and the target image content characteristics are determined in the image content characteristics; the association degree between the object information represented by the object image content features and the object information represented by the text content features meets the preset condition, and then the object classification result of the object description content is determined according to the object image content features and the text content features, so that the characteristic with high matching degree of the represented object information and the object information represented by the text content features in the image content features can be endowed with higher weight, useless redundant features in the image content features are filtered out, the data processing amount in the process of classifying the object description content is reduced, and the object classification result of the object description content is accurately determined.

In an exemplary embodiment, in the image content features, determining the target image content features includes: determining the relevance of the image content according to the image content characteristics and the text content characteristics; weighting image content characteristics based on the relevance of the image content; and taking the weighted image content characteristics as target image content characteristics.

The process of weighting the image content features based on the relevance of the image content comprises the following steps: determining a weight matrix corresponding to the relevance of the image-text content; multiplying the weight matrix with a matrix corresponding to the image content characteristics to obtain a multiplied matrix; the multiplied matrix is used as the characteristic of the weighted image content.

The image description content and the text description content are used for representing the association degree between the image description content and the text description content in the object description content. Specifically, the content association degree of the graph is used for representing the association degree of the object information described by each area of the image description content in the object description content and the object information described by the text description content.

In specific implementation, in the process of determining the target image content characteristics in the image content characteristics, the computer equipment specifically comprises the following steps: the computer equipment determines the relevance of the image content according to the image content characteristics and the text content characteristics; weighting image content characteristics based on the relevance of the image content; and taking the weighted image content characteristics as target image content characteristics. Specifically, the computer device may perform the attention mechanism model to perform the attention weight calculation on the image content feature and the text content feature, to obtain a weight matrix for characterizing the association relationship between the graphics and text, that is, the graphics and text content association degree. Then, the computer equipment adopts the image content relevance degree to weight the image content characteristics, and further obtains the weighted image content characteristics, namely the target image content characteristics.

Specifically, in the process of weighting the image content characteristics based on the relevance of the image content, the computer equipment determines a weight matrix corresponding to the relevance of the image content; multiplying the weight matrix with a matrix corresponding to the image content characteristics to obtain a multiplied matrix; the multiplied matrix is used as the characteristic of the weighted image content.

According to the technical scheme of the embodiment, the relevance of the image content is determined according to the image content characteristics and the text content characteristics; weighting image content characteristics based on the relevance of the image content; the method and the device are realized in the image content characteristics, and the target image content characteristics are accurately determined.

In an exemplary embodiment, determining a target classification result of the object description content according to the target image content feature and the text content feature comprises: fusing the target image content characteristics and the text content characteristics to obtain description content fusion characteristics; and determining a target classification result of the object description content according to the description content fusion characteristics.

In a specific implementation, in a process of determining a target classification result of object description content according to a target image content feature and a text content feature, the method specifically includes: the computer equipment can perform feature fusion on the content features of the target image and the character content features to obtain description content fusion features; and determining a target classification result of the object description content according to the description content fusion characteristics. In particular, the computer device may input the descriptive fusion feature to a trained classification model and determine a target classification result for the object descriptive based on an output result of the classification model.

The process of fusing the target image content characteristics and the text content characteristics to obtain the description content fusion characteristics comprises the following steps: performing dimension-increasing processing on the target image content characteristics to obtain processed image content characteristics; performing dimension-increasing processing on the character content characteristics to obtain processed character content characteristics; and fusing the processed image content characteristics and the processed text content characteristics to obtain descriptive content fusion characteristics.

The processed image content features are up-dimensional expressions of the target image content features.

The character content characteristics after processing are the up-dimensional expression of the character content characteristics.

In the specific implementation, in the process of fusing the target image content characteristics and the text content characteristics to obtain the description content fusion characteristics, the computer equipment respectively enters a full-connection layer (Fully Connection Layer) for the target image content characteristics and the text content characteristics, performs dimension-increasing processing on the target image content characteristics through the feature processing of the full-connection layer to obtain dimension-increasing expression of the target image content characteristics, namely, processed image content characteristics, and performs dimension-increasing processing on the text content characteristics to obtain dimension-increasing expression of the text content characteristics, namely, processed text content characteristics. It should be noted that, the up-dimension processing process for the target image content feature and the text content feature may also name a single feature independent up-dimension expression.

And finally, the computer equipment fuses the processed image content characteristics and the processed text content characteristics to obtain descriptive content fusion characteristics.

According to the technical scheme, the description content fusion characteristic is obtained by fusing the target image content characteristic and the text content characteristic; and determining a target classification result of the object description content according to the description content fusion characteristics, so as to accurately determine the target classification result of the object description content according to the target image content characteristics and the text content characteristics.

In an exemplary embodiment, fusing the processed image content features and the processed text content features to obtain descriptive content fusion features includes: performing an attribute mechanism multiplication operation on the processed image content characteristics and the processed text content characteristics to obtain multiplied characteristics; carrying out dimension lifting processing on the multiplied features to obtain descriptive content fusion features; wherein the descriptive content fusion feature is an up-dimensional representation of the multiplied feature.

Wherein the descriptive content fusion feature is an up-dimensional representation of the multiplied feature.

In a specific implementation, in a process of fusing the processed image content characteristics and the processed text content characteristics by the computer equipment to obtain the description content fusion characteristics, the method specifically comprises the following steps: performing an attribute mechanism multiplication operation on the processed image content features and the processed text content features to obtain multiplied features, wherein the multiplied features can be named as interaction features; and carrying out dimension-lifting processing on the multiplied features to obtain dimension-lifting expression of the multiplied features, namely describing the content fusion features. Specifically, the computer device may access the multiplied features into a full connection layer and a hidden layer, and finally obtain the description fusion feature. Of course. Descriptive content fusion features may also be named fusion information updimension tokens.

According to the technical scheme, multiplication operation of an attribute mechanism is carried out on the processed image content characteristics and the processed text content characteristics to obtain multiplied characteristics; carrying out dimension lifting processing on the multiplied features to obtain descriptive content fusion features; the descriptive content fusion feature is the ascending-dimension expression of the multiplied feature, so that the processed image content feature and the processed text content feature can be accurately fused to obtain the descriptive content fusion feature.

In an exemplary embodiment, determining a target classification result of the object description according to the description fusion feature includes: classifying the fusion characteristics of the descriptive contents to generate a plurality of candidate classification results of the descriptive contents of the object; each candidate classification result has a corresponding probability value; among the plurality of candidate classification results, the candidate classification result having the highest probability value is taken as the target classification result of the object description content.

Wherein each candidate classification result has a corresponding probability value.

In a specific implementation, the computer device determines a target classification result of the object description content according to the fusion characteristic of the description content, and specifically includes: the computer equipment performs classification processing on the fusion characteristics of the descriptive contents to generate a plurality of candidate classification results of the object descriptive contents; among the plurality of candidate classification results, the candidate classification result having the highest probability value is taken as the target classification result of the object description content.

Specifically, the computer device may input the processed description fusion feature to the full connection layer for classification processing, so as to obtain a full connection layer classification result. Then, converting the full-connection layer classification result into a plurality of candidate classification results through an activation function, such as a Softmax function; each candidate classification result has a corresponding probability value. Meanwhile, the sum of probability values of the respective candidate classification results is equal to 1. Finally, the computer device takes the candidate classification result with the highest probability value as a target classification result of the object description content, namely the final category of the object to be classified, from the plurality of candidate classification results.

According to the technical scheme of the embodiment, in the process of determining the target classification result of the object description content according to the description content fusion characteristics, a plurality of candidate classification results of the object description content are accurately generated by classifying the description content fusion characteristics; each candidate classification result has a corresponding probability value; and taking the candidate classification result with the highest probability value as a target classification result of the object description content among the candidate classification results.

In an exemplary embodiment, acquiring image content features and text content features in object descriptions includes: determining image description contents in the object description contents; performing feature extraction on the image description content through a trained image feature extraction model to obtain image content features; determining text description contents in the object description contents; and extracting the characteristics of the text description contents through a trained text characteristic extraction model to obtain the characteristics of the text contents.

In the specific implementation, in the process of acquiring the image content characteristics and the text content characteristics in the object description content, the computer equipment determines the image description content in the object description content; and then, carrying out feature extraction on the image description content through a trained image feature extraction model to obtain image content features. In particular, the computer device may also perform data preprocessing on the image description content. For example, the computer device may clip the description image of the object to be classified to a uniform format size, adjust the image size of the description image of the object to be classified to 224px x 224px, and set the number of channels to 3. Then, the computer device inputs the processed image data into a trained Resnet50 network model (an image feature extraction model) in the CV (computer vision) field, and feature extraction is performed on the image description content through the Resnet50 network model, so that image content features are obtained.

Specifically, the computer device determines a textual description content in the object description content; and then, extracting the characteristics of the text description content through a trained text characteristic extraction model to obtain the characteristics of the text content. Specifically, the computer device may also perform data preprocessing on the text description; for example, the computer device may further combine (concat) the commodity title information (title) and the commodity description information (description) together to obtain text description contents, perform word segmentation processing on the text description contents, and construct a dictionary to obtain processed text data. Specifically, the computer device pads each sample to a uniform size of 64 word lengths; the 128-dimensional emmbedding word vector is randomly generated for each word, and the subsequent penalty optimization process also updates the emmbedding here. Then, the computer device inputs the processed text data into a trained TextCNN network model (a text feature extraction model) in the field of NLP (natural language processing), and performs feature extraction on text description content through the TextCNN network model, thereby obtaining text content features.

In the technical scheme of the embodiment, in the process of acquiring the image content characteristics and the text content characteristics in the object description content, the image description content in the object description content is determined; the feature extraction is accurately carried out on the image description content through a trained image feature extraction model, so that image content features are obtained; meanwhile, determining text description contents in the object description contents; and accurately extracting the characteristics of the text description contents through the trained text characteristic extraction model to obtain the characteristics of the text contents.

Fig. 3 is a flowchart illustrating another information processing method according to an exemplary embodiment, which is used in the computer device 110 of fig. 1, as shown in fig. 3, including the following steps. In step S302, image content features and text content features in the object description content are acquired. In step S304, determining a degree of association of the image content according to the image content features and the text content features; the image content association degree is used for representing the association degree between the image description content and the text description content in the object description content. In step S306, a weight matrix corresponding to the degree of association of the teletext content is determined. In step S308, multiplying the weight matrix by a matrix corresponding to the image content feature to obtain the target image content feature; the association degree between the object information characterized by the target image content features and the object information characterized by the text content features meets the preset condition. In step S310, performing dimension-up processing on the target image content features to obtain processed image content features; the processed image content features are up-dimensional expressions of the target image content features. In step S312, performing dimension-up processing on the character content features to obtain processed character content features; wherein the processed literal content feature is an up-dimensional representation of the literal content feature. In step S314, a multiplication operation is performed on the processed image content feature and the processed text content feature, so as to obtain a multiplied feature. In step S316, the post-multiplication feature is subjected to dimension-lifting processing, so as to obtain the descriptive content fusion feature; the descriptive content fusion feature is an up-dimensional expression of the multiplied feature. In step S318, the description fusion feature is classified, so as to generate a plurality of candidate classification results of the object description; each candidate classification result has a corresponding probability value. In step S320, among the plurality of candidate classification results, a candidate classification result having the highest probability value is taken as a target classification result of the object description content. It should be noted that, the specific limitation of the above steps may be referred to the specific limitation of an information processing method, which is not described herein.

It should be understood that, although the steps in the flowcharts of fig. 2 and 3 are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least a portion of the steps in fig. 2 and 3 may include a plurality of steps or stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the execution of the steps or stages is not necessarily sequential, but may be performed in rotation or alternatively with at least a portion of the steps or stages in other steps or steps.

Fig. 4 is a block diagram of an information processing apparatus according to an exemplary embodiment. Referring to fig. 4, the apparatus includes:

the acquiring unit 410 is configured to perform acquisition of image content features and text content features in the object description content;

the determining unit 420 is configured to determine a target image content feature among the image content features; the association degree between the object information characterized by the target image content characteristics and the object information characterized by the text content characteristics meets a preset condition;

The classification unit 430 is configured to perform a determination of a target classification result of the object description content based on the target image content feature and the text content feature.

In an exemplary embodiment, the determining unit 420 is specifically configured to perform determining a degree of relevance of the content of the text according to the image content feature and the text content feature; the image content association degree is used for representing the association degree between the image description content and the text description content in the object description content; weighting the image content features based on the image content relevance; and taking the weighted image content characteristics as the target image content characteristics.

In an exemplary embodiment, the determining unit 420 is specifically configured to perform determining a weight matrix corresponding to the relevance of the content; multiplying the weight matrix with a matrix corresponding to the image content characteristics to obtain a multiplied matrix; and taking the multiplied matrix as the characteristic of the weighted image content.

In an exemplary embodiment, the classifying unit 430 is specifically configured to perform fusion of the target image content feature and the text content feature to obtain a descriptive content fusion feature; and determining a target classification result of the object descriptive content according to the descriptive content fusion characteristics.

In an exemplary embodiment, the classifying unit 430 is specifically configured to perform an up-dimension processing on the target image content feature, to obtain a processed image content feature; wherein the processed image content features are up-dimensional representations of the target image content features; performing dimension-increasing processing on the character content features to obtain processed character content features; wherein the processed text content features are up-dimensional representations of the text content features; and fusing the processed image content characteristics and the processed text content characteristics to obtain the description content fusion characteristics.

In an exemplary embodiment, the classifying unit 430 is specifically configured to perform a multiplication operation on the processed image content feature and the processed text content feature, so as to obtain a multiplied feature; performing dimension lifting processing on the multiplied features to obtain the description fusion features; the descriptive content fusion feature is an up-dimensional expression of the multiplied feature.

In an exemplary embodiment, the classifying unit 430 is specifically configured to perform a classification process on the description content fusion feature, and generate a plurality of candidate classification results of the object description content; each candidate classification result has a corresponding probability value; and taking the candidate classification result with the highest probability value as a target classification result of the object description content among the plurality of candidate classification results.

In an exemplary embodiment, the obtaining unit 410 is specifically configured to perform determining an image description content in the object description content; performing feature extraction on the image description content through a trained image feature extraction model to obtain the image content features; and determining text description content in the object description content; and extracting the characteristics of the text description content through a trained text characteristic extraction model to obtain the characteristics of the text content.

FIG. 5 provides a block diagram of an information processing model, as shown in FIG. 5: in the specific implementation, in the process of acquiring the image content characteristics and the text content characteristics in the object description content, the computer equipment determines the image description content in the object description content; and then, carrying out feature extraction on the image description content through a trained image feature extraction model to obtain image content features. In particular, the computer device may also perform data preprocessing on the image description content. For example, the computer device may clip the description image of the object to be classified to a uniform format size, adjust the image size of the description image of the object to be classified to 224px x 224px, and set the number of channels to 3. Then, the computer device inputs the processed image data into a trained Resnet50 network model (an image feature extraction model) in the CV (computer vision) field, and feature extraction is performed on the image description content through the Resnet50 network model, so that image content features are obtained.

Then, determining target image content characteristics in the image content characteristics; the association degree between the object information characterized by the target image content characteristics and the object information characterized by the text content characteristics meets a preset condition; and determining a target classification result of the object description content according to the target image content characteristics and the character content characteristics. It should be noted that, the specific limitation of the above steps may be referred to the specific limitation of an information processing method, which is not described herein.

The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

Fig. 6 is a block diagram illustrating an apparatus 600 for performing an information processing method according to an exemplary embodiment. For example, device 600 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.

Referring to fig. 6, device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.

The processing component 602 generally controls overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.

The memory 604 is configured to store various types of data to support operations at the device 600. Examples of such data include instructions for any application or method operating on device 600, contact data, phonebook data, messages, pictures, video, and the like. The memory 604 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as static random access memory (7 RAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

The power supply component 606 provides power to the various components of the device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 600.

The multimedia component 608 includes a screen between the device 600 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 600 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a Microphone (MIC) configured to receive external audio signals when the device 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.

The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.

The sensor assembly 614 includes one or more sensors for providing status assessment of various aspects of the device 600. For example, the sensor assembly 614 may detect the on/off state of the device 600, the relative positioning of the components, such as the display and keypad of the device 600, the sensor assembly 614 may also detect a change in position of the device 600 or a component of the device 600, the presence or absence of user contact with the device 600, the orientation or acceleration/deceleration of the device 600, and a change in temperature of the device 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMO7 or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 616 is configured to facilitate communication between the device 600 and other devices, either wired or wireless. The device 600 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 600 may be implemented by one or more application specific integrated circuits (A7 ICs), digital signal processors (D7P), digital signal processing devices (D7 PD), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components for performing the above method.

In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 604, including instructions executable by processor 620 of device 600 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

Fig. 7 is a block diagram illustrating an apparatus 700 for performing an information processing method according to another exemplary embodiment. For example, device 700 may be a server. Referring to fig. 7, the device 700 includes a processing component 720 that further includes one or more processors, and memory resources represented by a memory 722 for storing instructions, such as applications, executable by the processing component 720. The application program stored in memory 722 may include one or more modules that each correspond to a set of instructions. Further, the processing component 720 is configured to execute instructions to perform the information processing methods described above.

The device 700 may also include a power component 724 configured to perform power management of the device 700, a wired or wireless network interface 726 configured to connect the device 700 to a network, and an input output (I/O) interface 728. The device 700 may operate based on an operating system stored in memory 722, such as Window7 7erverTM,Mac O7 XTM,UnixTM,LinuxTM,FreeB7DTM or the like.

In an exemplary embodiment, a storage medium is also provided, such as a memory 722, including instructions executable by a processor of the device 700 to perform the above-described method. The storage medium may be a non-transitory computer readable storage medium, which may be, for example, ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An information processing method, characterized in that the method comprises:

determining target image content characteristics in the image content characteristics; the target image content features are multiplied matrixes obtained by multiplying matrixes corresponding to the image content features by adopting weight matrixes corresponding to the image content relevance; the image-text content association degree is determined according to the image content characteristics and the text content characteristics and is used for representing the association degree between the image description content and the text description content in the object description content;

2. The information processing method according to claim 1, wherein the determining a target image content feature among the image content features includes:

determining the relevance of the image content according to the image content characteristics and the text content characteristics; the image content association degree is used for representing the association degree between the image description content and the text description content in the object description content;

weighting the image content features based on the image content relevance;

and taking the weighted image content characteristics as the target image content characteristics.

3. The information processing method according to claim 2, wherein the weighting the image content features based on the degree of association of the image content includes:

determining a weight matrix corresponding to the relevance of the image-text content;

multiplying the weight matrix with a matrix corresponding to the image content characteristics to obtain a multiplied matrix;

and taking the multiplied matrix as the characteristic of the weighted image content.

4. An information processing method according to claim 1 or 3, wherein said determining a target classification result of the object description content based on the target image content feature and the text content feature comprises:

fusing the target image content characteristics and the text content characteristics to obtain description content fusion characteristics;

and determining a target classification result of the object descriptive content according to the descriptive content fusion characteristics.

5. The information processing method according to claim 4, wherein the fusing the target image content feature and the text content feature to obtain a descriptive content fusion feature includes:

performing dimension-increasing processing on the target image content characteristics to obtain processed image content characteristics; wherein the processed image content features are up-dimensional representations of the target image content features;

performing dimension-increasing processing on the character content features to obtain processed character content features; wherein the processed text content features are up-dimensional representations of the text content features;

and fusing the processed image content characteristics and the processed text content characteristics to obtain the description content fusion characteristics.

6. The information processing method according to claim 5, wherein the fusing the processed image content feature and the processed text content feature to obtain the descriptive content fusion feature includes:

multiplying the processed image content characteristics and the processed text content characteristics to obtain multiplied characteristics;

performing dimension lifting processing on the multiplied features to obtain the description fusion features; the descriptive content fusion feature is an up-dimensional expression of the multiplied feature.

7. The information processing method according to claim 4, wherein the determining the target classification result of the object description content based on the description content fusion feature includes:

classifying the description content fusion characteristics to generate a plurality of candidate classification results of the object description content; each candidate classification result has a corresponding probability value;

and taking the candidate classification result with the highest probability value as a target classification result of the object description content among the plurality of candidate classification results.

8. The information processing method according to claim 1, wherein the acquiring the image content feature and the text content feature in the object description content includes:

Determining image description contents in the object description contents;

performing feature extraction on the image description content through a trained image feature extraction model to obtain the image content features;

and, a step of, in the first embodiment,

determining text description contents in the object description contents;

and extracting the characteristics of the text description content through a trained text characteristic extraction model to obtain the characteristics of the text content.

9. An information processing apparatus, characterized by comprising:

a determining unit configured to determine a target image content feature among the image content features; the target image content features are multiplied matrixes obtained by multiplying matrixes corresponding to the image content features by adopting weight matrixes corresponding to the image content relevance; the image-text content association degree is determined according to the image content characteristics and the text content characteristics and is used for representing the association degree between the image description content and the text description content in the object description content;

10. The information processing apparatus according to claim 9, wherein the determination unit is specifically configured to perform determination of a degree of relevance of a content of a text based on the image content feature and the text content feature; the image content association degree is used for representing the association degree between the image description content and the text description content in the object description content; weighting the image content features based on the image content relevance; and taking the weighted image content characteristics as the target image content characteristics.

11. The information processing apparatus according to claim 10, wherein the determination unit is specifically configured to perform determination of a weight matrix corresponding to the degree of association of the teletext content; multiplying the weight matrix with a matrix corresponding to the image content characteristics to obtain a multiplied matrix; and taking the multiplied matrix as the characteristic of the weighted image content.

12. The information processing apparatus according to claim 9 or 11, wherein the classification unit is specifically configured to perform fusion of the target image content feature and the text content feature to obtain a descriptive content fusion feature; and determining a target classification result of the object descriptive content according to the descriptive content fusion characteristics.

13. The information processing apparatus according to claim 12, wherein the classification unit is specifically configured to perform an up-scaling process on the target image content feature to obtain a processed image content feature; wherein the processed image content features are up-dimensional representations of the target image content features; performing dimension-increasing processing on the character content features to obtain processed character content features; wherein the processed text content features are up-dimensional representations of the text content features; and fusing the processed image content characteristics and the processed text content characteristics to obtain the description content fusion characteristics.

14. The information processing apparatus according to claim 13, wherein the classification unit is specifically configured to perform a multiplication operation on the processed image content feature and the processed text content feature to obtain a multiplied feature; performing dimension lifting processing on the multiplied features to obtain the description fusion features; the descriptive content fusion feature is an up-dimensional expression of the multiplied feature.

15. The information processing apparatus according to claim 12, wherein the classification unit is specifically configured to perform classification processing on the description content fusion feature, generating a plurality of candidate classification results of the object description content; each candidate classification result has a corresponding probability value; and taking the candidate classification result with the highest probability value as a target classification result of the object description content among the plurality of candidate classification results.

16. The information processing apparatus according to claim 9, wherein the acquisition unit is specifically configured to perform determination of image descriptive contents among the object descriptive contents; performing feature extraction on the image description content through a trained image feature extraction model to obtain the image content features; and determining text description content in the object description content; and extracting the characteristics of the text description content through a trained text characteristic extraction model to obtain the characteristics of the text content.

17. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the information processing method of any one of claims 1 to 8.

18. A storage medium, which when executed by a processor of an electronic device, causes the electronic device to perform the information processing method of any one of claims 1 to 8.