CN117009508A - Training method, device, equipment and storage medium for category identification model - Google Patents


Info

Publication number
CN117009508A
CN117009508A (application CN202211364149.9A)
Authority
CN
China
Prior art keywords
sample
model
category information
commodity
samples
Prior art date
Legal status
Pending
Application number
CN202211364149.9A
Other languages
Chinese (zh)
Inventor
康战辉
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202211364149.9A
Publication of CN117009508A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/338 Presentation of query results
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a training method, apparatus, device, and storage medium for a category identification model, relating to the technical field of artificial intelligence. The method comprises the following steps: obtaining a first sample set, the first sample set comprising at least one first sample, each first sample comprising a text pair of a first commodity name and first category information; training a first model with the first sample set to obtain a trained first model, where the first model is used to determine the correlation between commodity names and category information; screening a second sample set with the trained first model to determine at least one available sample, where the second sample set comprises at least one second sample, each second sample comprising a text pair of a second commodity name and second category information; and training a second model based on the available samples to obtain a trained second model, where the second model is used to determine the category information corresponding to a commodity name to be identified. The method helps improve the accuracy of category information.

Description

Training method, device, equipment and storage medium for category identification model
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a training method, device and equipment of a category identification model and a storage medium.
Background
To facilitate commodity management, a commodity name and category information are usually set for each commodity; products of similar types can then be quickly queried using the category information.
In the related art, the commodity name and category information of a commodity are provided by the manufacturer that lists the commodity, and the category information is judged and determined manually: a worker assigns category information to the commodity according to his or her subjective understanding of commodity categories.
However, relying on manually determined category information results in less accurate commodity category information.
Disclosure of Invention
The embodiment of the application provides a training method, device, equipment and storage medium of a category identification model. The technical scheme is as follows:
according to an aspect of an embodiment of the present application, there is provided a training method of a category identification model, the method including:
obtaining a first sample set, the first sample set comprising at least one first sample, the first sample comprising a text pair of a first commodity name and first category information;
training the first model using the first sample set to obtain a trained first model; wherein the first model is used to determine the correlation between commodity names and category information;
screening the second sample set using the trained first model to determine at least one available sample; wherein the second sample set includes at least one second sample, the second sample including a text pair of a second commodity name and second category information;
training the second model based on the available samples to obtain a trained second model; wherein the second model is used to determine the category information corresponding to the commodity name to be identified.
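The four steps above can be sketched end to end. The sketch below is illustrative only: `toy_relevance_model` and the 0.5 threshold are invented stand-ins for the trained first model, not the patent's actual implementation.

```python
# Hypothetical sketch of the two-stage pipeline: a first (screening) model
# filters noisy (name, category) pairs, and only the surviving pairs are
# used to train the second (category identification) model.

def screen_samples(relevance_model, second_sample_set, threshold=0.5):
    """Keep only the text pairs the first model deems relevant."""
    return [s for s in second_sample_set if relevance_model(s) >= threshold]

def toy_relevance_model(sample):
    """Invented stand-in: relevant if the category string appears in the name."""
    name, category = sample
    return 1.0 if category in name else 0.0

second_set = [
    ("longjing green tea 250g", "green tea"),
    ("wireless mouse 2.4GHz", "green tea"),  # mislabeled pair, should be dropped
]
available = screen_samples(toy_relevance_model, second_set)
# only the correctly labeled pair survives; it would then train the second model
```

The screening model stands in for step 420's trained first model; a real implementation would output a learned relevance score rather than a substring test.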
According to an aspect of an embodiment of the present application, there is provided a training apparatus for a category identification model, the apparatus including:
a sample acquisition module, configured to acquire a first sample set, the first sample set including at least one first sample, the first sample including a text pair of a first commodity name and first category information;
a first training module, configured to train the first model using the first sample set to obtain a trained first model; wherein the first model is used to determine the correlation between commodity names and category information;
a sample screening module, configured to screen the second sample set using the trained first model and determine at least one available sample; wherein the second sample set includes at least one second sample, the second sample including a text pair of a second commodity name and second category information;
a second training module, configured to train the second model based on the available samples to obtain a trained second model; wherein the second model is used to determine the category information corresponding to the commodity name to be identified.
According to an aspect of an embodiment of the present application, there is provided a computer device including a processor and a memory, the memory having stored therein a computer program that is loaded and executed by the processor to implement the above-described method.
According to an aspect of an embodiment of the present application, there is provided a computer-readable storage medium having stored therein a computer program loaded and executed by a processor to implement the above-described method.
According to an aspect of an embodiment of the present application, there is provided a computer program product comprising a computer program stored in a computer readable storage medium. The processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program so that the computer device performs the above-described method.
The technical scheme provided by the embodiments of the application can have the following beneficial effects. A sample screening model is trained first, candidate samples are then screened by the sample screening model to obtain at least one available sample, and the category identification model is trained on the available samples. On the one hand, this reduces the difficulty of collecting training samples for the category identification model; on the other hand, the trained category identification model can determine the category information corresponding to a commodity name to be identified, which reduces the labor cost of determining commodity category information and improves the accuracy of the category information corresponding to the commodity name.
Drawings
FIG. 1 is a schematic illustration of an implementation environment for an embodiment of the present application;
FIG. 2 is a schematic diagram of a merchandise search process provided by an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of a search page for merchandise provided by an exemplary embodiment of the present application;
FIG. 4 is a flowchart of a training method for a category identification model provided by an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a correlation prediction result determination process provided by an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a second model operation method provided by an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of a twin network architecture provided by an exemplary embodiment of the present application;
FIG. 8 is a schematic diagram of a loss function provided by an exemplary embodiment of the present application;
FIG. 9 is a block diagram of a training apparatus for a category identification model provided by one embodiment of the present application;
FIG. 10 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Before describing embodiments of the present application, in order to facilitate understanding of the present solution, terms appearing in the present solution are explained below.
1. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
2. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
3. Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods for effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use daily, so it is closely related to the study of linguistics. NLP techniques typically include text processing, semantic understanding, machine translation, question answering, and knowledge graph techniques.
4. Machine learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and more. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of AI. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
5. Few-shot learning (FSL) is a training approach for machine learning models in which a model is trained to convergence using only a small number of labeled samples (a support set). FSL is expected to remove the dependence on large-scale labeled samples, thereby avoiding the high cost of sample labeling.
6. A twin network (Siamese Network, SN) is a coupled framework built from two artificial neural networks. The twin network takes two inputs (Input1 and Input2), which are fed into two artificial neural networks (Network1 and Network2, typically with shared weights), producing two outputs mapped into a new space. A loss is computed from the two outputs to determine the similarity between the two inputs.
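As a minimal illustration of the twin-network coupling just described, the pure-Python sketch below encodes two inputs with the same shared weights and computes a loss from the two outputs. The tanh encoder, the Euclidean distance, and the contrastive loss with a margin are common choices assumed here for illustration, not taken from the patent.

```python
import math
import random

random.seed(0)
# Shared encoder weights: both "twin" branches use the SAME weight matrix,
# which is the coupling between Network1 and Network2.
W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]

def encode(x):
    """Map an 8-dim input to a 4-dim embedding using the shared weights."""
    return [math.tanh(sum(xi * wij for xi, wij in zip(x, col)))
            for col in zip(*W)]

def distance(u, v):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(x1, x2, label, margin=1.0):
    """label=1: similar pair (pull embeddings together);
    label=0: dissimilar pair (push apart until the margin)."""
    d = distance(encode(x1), encode(x2))
    return label * d ** 2 + (1 - label) * max(margin - d, 0.0) ** 2

x = [random.gauss(0, 1) for _ in range(8)]
loss_same = contrastive_loss(x, x, label=1)        # identical inputs: distance 0, loss 0
loss_diff_label = contrastive_loss(x, x, label=0)  # same inputs labeled dissimilar: margin penalty
```

A trained twin network would learn `W` by gradient descent on this loss over many labeled pairs; here the weights are random only to keep the sketch self-contained.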
With the research and progress of artificial intelligence technology, AI has been researched and applied in many fields, such as virtual assistants, smart speakers, intelligent marketing, and intelligent customer service. As technology develops, artificial intelligence will be applied in more fields and play an increasingly important role.
The scheme provided by the embodiments of the application relates to intelligent category determination: the category information corresponding to a commodity name can be determined through a trained model, improving the accuracy of the determined category information.
Referring to fig. 1, a schematic diagram of an implementation environment of an embodiment of the present application is shown. The implementation environment of the scheme can comprise: a terminal device 10 and a server 20.
The terminal device 10 includes, but is not limited to, a mobile phone, a tablet computer, an intelligent voice interaction device, a game console, a wearable device, a multimedia playing device, a personal computer (Personal Computer, PC), a vehicle-mounted terminal, an intelligent home appliance, and the like. A client of a target application can be installed in the terminal device 10.
In the embodiment of the present application, the target application may be any application that provides commodity listing or sales functions. Typically, the application is a shopping application, providing commodity listing, commodity retrieval, commodity purchase, and the like. Of course, other types of applications may also provide commodity-related services, such as music applications, social applications, interactive entertainment applications, browser applications, content sharing applications, virtual reality (VR) applications, and augmented reality (AR) applications, which are not limited in the embodiments of the present application. In addition, different applications may handle different commodity names and provide different corresponding functions, which may be configured in advance according to actual requirements; this is also not limited by the embodiments of the present application. The target application may be a standalone application, an extension that implements the category identification function within another application, or a web page in a browser.
Optionally, a client of the above application runs in the terminal device 10.
The server 20 is used to provide background services for clients of target applications in the terminal device 10. For example, the server 20 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms, but is not limited thereto.
The server 20 has at least data receiving and processing capabilities such that the terminal device 10 and the server 20 can communicate with each other via a network. The network may be a wired network or a wireless network. The server 20 receives the commodity name to be identified sent by the terminal device 10, and processes the commodity name to obtain category information corresponding to the commodity name.
In one example, the training process of the category identification model is performed on a computer device; that is, the execution subject of each step of the method provided by the present application may be a computer device, which may be any electronic device with data storage and processing capabilities. For example, the computer device may be the server 20 in fig. 1, the terminal device 10 in fig. 1, or another device other than the terminal device 10 and the server 20.
In one example, the terminal device 10 acquires the commodity name to be identified, and transmits the commodity name to be identified to the server 20, and the server 20 determines category information corresponding to the commodity name to be identified through the trained category identification model.
In one example, during the training of the category identification model, the data uploaded to the server 20 by the terminal device 10 is used as training samples, so that the trained category identification model can better process the commodity name to be identified. Optionally, the trained category identification model can determine the similarity between the commodity name to be identified and at least one comparable commodity name, and the computer device takes the known category information corresponding to the comparable commodity name with the highest similarity as the category information corresponding to the commodity name to be identified.
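The highest-similarity lookup just described can be sketched as follows. The token-overlap (Jaccard) similarity is a deliberately simple stand-in for the trained category identification model, and all names and categories are invented for illustration.

```python
# Sketch: assign the known category of the most similar comparable name.
# A real system would use the trained model's learned similarity instead
# of this token-overlap stand-in.

def token_overlap(a, b):
    """Jaccard similarity over whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def assign_category(name, comparables):
    """comparables: list of (known_name, known_category) pairs."""
    best_name, best_category = max(
        comparables, key=lambda c: token_overlap(name, c[0]))
    return best_category

known = [
    ("longjing green tea 250g", "green tea"),
    ("wireless optical mouse", "computer accessory"),
]
category = assign_category("organic longjing tea 500g", known)
# shares tokens with the tea entry, so its category is adopted
```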
Fig. 2 is a schematic diagram of a commodity searching process provided by an exemplary embodiment of the present application.
Category information corresponding to commodity names can be used in commodity recall scenarios. The trained category identification model can determine, from a commodity name to be identified, the corresponding category information, which characterizes the type of the newly listed commodity.
In the commodity search process, as shown in fig. 2, a user inputs a search term, and by analyzing the search term, at least one piece of category information corresponding to it is determined, together with the probability of each piece. The probability of a piece of category information characterizes the likelihood that the commodity corresponding to the search term belongs to that type.
Commodity indexing is then performed using the at least one piece of category information corresponding to the search term, obtaining at least one recalled commodity. In the indexing process, the at least one recalled commodity is determined by comparing the similarity between the category information corresponding to the search term and the category information corresponding to at least one candidate commodity. Optionally, recalled commodities may be ordered from high to low according to the probability of the search term's category information.
To improve accuracy, the recalled commodities can be ranked, and those with high correlation to the search term are screened out and displayed to the user.
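The recall flow just described can be sketched roughly as below: category probabilities derived from a search term drive candidate selection, and results are ordered from high to low probability. The probabilities and catalog entries are invented for illustration and are not data from the patent.

```python
# Sketch of category-driven recall: keep catalog items whose category was
# predicted for the search term, ordered by that category's probability.

def recall(category_probs, catalog, top_k=3):
    """category_probs: {category: probability}; catalog: [(name, category)]."""
    ranked = []
    for name, cat in catalog:
        if cat in category_probs:  # candidate matches a predicted category
            ranked.append((name, category_probs[cat]))
    ranked.sort(key=lambda r: r[1], reverse=True)  # high to low probability
    return [name for name, _ in ranked[:top_k]]

probs = {"green tea": 0.7, "teaware": 0.2}  # invented search-term predictions
catalog = [
    ("longjing green tea", "green tea"),
    ("glass teapot", "teaware"),
    ("wireless mouse", "computer accessory"),  # no predicted category: excluded
]
results = recall(probs, catalog)
```

A production system would follow this with the relevance-based re-ranking step the text mentions before displaying results.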
FIG. 3 is a schematic diagram of a search page for merchandise provided in an exemplary embodiment of the present application.
The user inputs a search term in the search bar; the server searches according to the search term, determines at least one recalled commodity, and sends the related information of the recalled commodities to the terminal device, which displays the queried recalled commodities in the recalled-commodity display area according to that information.
With the continuous development of smart retail, fine-grained classification of online retail commodities is increasingly needed, such as setting accurate category labels for commodities. At present, obtaining the category information of commodities mainly relies on the following approaches.
First, relying on the category information submitted manually when the merchant or brand party uploads the commodity.
Manually labeling the category information corresponding to commodity names has two disadvantages. On the one hand, the quality of manually uploaded category information is uneven, and it is easy for the category information corresponding to a commodity name to be inaccurate or even wrong.
On the other hand, as the number of commodities accessed by the target application continues to grow, the labor cost of manually entering category information keeps increasing. Meanwhile, the defects caused by low category-information accuracy become increasingly apparent. Table 1 shows common manual labeling errors.
TABLE 1
Second, using a search term category prediction model to predict, offline, the category information corresponding to commodity names.
When category information corresponding to commodity names is labeled offline based on an existing search term category prediction model, the mismatch between search terms and commodity names (search terms are relatively simple, while commodity names carry richer information) leads to problems such as low accuracy of the predicted category information and limited applicable scenarios. Table 2 shows the difference between the category information corresponding to a search term and that corresponding to a commodity name.
TABLE 2
In addition, if a model similar to the search term category prediction model is trained to predict category information corresponding to commodity names, the deep category hierarchy means that a large number of labeled samples is required, with high labeling difficulty and cost. To solve these problems, the application provides a training method for a category identification model, which improves the accuracy of the category information determined for commodity names.
Referring to FIG. 4, a flowchart of a training method for a category identification model according to one embodiment of the present application is shown. The main execution body of each step of the method may be the terminal device 10 in the implementation environment of the scheme shown in fig. 1, or may be the server 20 in the implementation environment of the scheme shown in fig. 1. In the following method embodiments, for convenience of description, only the execution subject of each step is described as "computer device". The method may comprise at least one of the following steps (410-440):
step 410, a first set of samples is obtained, the first set of samples comprising at least one first sample, the first sample comprising a first commodity name and a text pair of first category information.
The first commodity refers to any one commodity. For example, in the case where the target application belongs to a shopping class application, the first commodity may be any one of the items to be sold. For another example, the target application belongs to a music-based application, and the first product may be a piece of music. The type of the first commodity is set according to actual needs, and the present application is not limited thereto.
The commodity name is used to characterize attribute information of the commodity. Optionally, the attribute information of the commodity includes, but is not limited to, the composition, use, source, specification, and method of use of the first commodity. In some embodiments, the commodity name includes at least one keyword characterizing the commodity. For example, the first commodity name may be: xx (usage keyword) yy (specification keyword) zz (brand keyword) skin care product.
In some embodiments, the commodity name is provided by the merchant or brand party of the commodity. The specific information carried in the commodity name depends on the actual condition of the commodity and is not limited here.
The category information is used for representing the category to which the commodity belongs. In some embodiments, the commodities are classified and managed through category information, so that a commodity distribution condition of a certain type can be conveniently searched or counted.
In some embodiments, the category information includes a plurality of sub-category information, and a hierarchical relationship exists between different sub-category information, that is, sub-category information at an upper layer includes sub-category information at a lower layer. For example, the "food and beverage" sub-category information includes: "food 1" subcategory information, "food 2" subcategory information, "beverage 1" subcategory information, and "beverage 2" subcategory information, and the like.
For example, a certain tea corresponds to four pieces of sub-category information: the first-level sub-category "food and beverage", the second-level sub-category "tea", the third-level sub-category "green tea", and the fourth-level sub-category "Longjing". The scope of the first-level sub-category is larger than that of the second, the second larger than the third, and the third larger than the fourth.
In some embodiments, the sub-category information in different levels is preset, and the sub-category information corresponding to the commodity in a certain category level is selected from a plurality of preset sub-category information manually. Optionally, the category information corresponding to the commodity name is provided by the brand side of the commodity, or provided through other websites.
The hierarchical relationship in the category information and the naming method of the category information are set according to actual needs, and the present application is not limited thereto.
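One possible way to represent the four-level hierarchy from the tea example above is a list of levels ordered from broadest to narrowest; this encoding is an assumption for illustration, not the patent's data structure.

```python
# Broadest level first, narrowest last, matching the tea example in the text.
category_path = ["food and beverage", "tea", "green tea", "longjing"]

def is_ancestor(broad, narrow, path=category_path):
    """True if `broad` sits at a higher (wider-scope) level than `narrow`;
    an upper-layer sub-category contains the lower-layer ones beneath it."""
    return path.index(broad) < path.index(narrow)

# "food and beverage" (level 1) has a wider scope than "longjing" (level 4)
```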
The first set of samples is used to train a first model, the first set of samples including at least one first sample. Optionally, the first set of samples comprises a plurality of first samples.
For any one of the first samples in the first sample set, the first sample includes a first commodity name and first category information. Alternatively, the first sample may be expressed as (first commodity name, first category information). After the computer device obtains the first sample, the first commodity name and the first category information can be read from the first sample.
In some embodiments, the first samples include the following two classes: first positive samples and first negative samples; a first positive sample includes a first commodity name and first category information corresponding to the same commodity, while a first negative sample includes a first commodity name and first category information corresponding to different commodities.
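As a hypothetical illustration of the two classes of first samples, each sample can be written as a (first commodity name, first category information, label) record, where the label marks whether the pair comes from the same commodity. The names, categories, and label encoding below are assumptions for demonstration only.

```python
# Hypothetical sketch: label 1 marks a first positive sample (name and
# category belong to the same commodity), label 0 a first negative sample
# (name and category taken from different commodities).
first_positive = ("xx mobile phone", "electronics/mobile terminal", 1)
first_negative = ("xx skin cleansing liquid", "medical health/medicinal material", 0)

first_sample_set = [first_positive, first_negative]
assert all(label in (0, 1) for _, _, label in first_sample_set)
```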
Step 420, training the first model by using the first sample set to obtain a trained first model; wherein the first model is used for determining the correlation between commodity names and category information. In some embodiments, the first model is referred to as a training sample screening model, and the training samples used to train the second model are screened out by the trained first model.
The first model is used to determine whether there is a correlation between the commodity name and the category information. The correlation between the commodity name and the category information means that the category information can correctly represent the type of the commodity corresponding to the commodity name.
For example, if commodity name A is the name of commodity A and category information A can characterize the type of commodity A, then commodity name A and category information A are related. For another example, if commodity name B is the name of commodity B but category information B characterizes the type of commodity C (commodity C and commodity B are different commodities belonging to different categories), then commodity name B and category information B are not related.
If the commodity corresponding to the commodity name belongs to the category indicated by the category information, the commodity name is related to the category information, and if the commodity corresponding to the commodity name does not belong to the category indicated by the category information, the commodity name is not related to the category information.
For example, a text pair is: (xx Mobile phone, electronic product/Mobile terminal) since "electronic product/Mobile terminal" can characterize the merchandise category to which the Mobile phone belongs, the merchandise name and category information included in the text pair are described as being related.
For another example, a text pair is: (xx skin cleansing liquid, medical health/medicinal material name) since "medical health/medicinal material name" cannot characterize the type of skin care product to which the skin cleansing liquid belongs, it is stated that the trade name and category information included in this text pair are irrelevant.
In some embodiments, the first model refers to a machine learning model. Optionally, an Encoder (Encoder) and a Classifier (Classifier) are included in the first model; the encoder is used for extracting characteristic information in the text, and the classifier is used for determining correlation between commodity names and classification information based on the characteristic information.
In some embodiments, the encoder is constructed based on a network structure such as a recurrent neural network (Recurrent Neural Network, RNN), a convolutional neural network (Convolutional Neural Network, CNN), a recursive neural network (Recursive Neural Network), or the like. Optionally, an attention mechanism is introduced in the encoder to enhance the encoder's ability to understand semantic information.
The classifier performs binary classification on the feature information of the text pair, and outputs the model prediction result of the first model.
If the classifier judges that the commodity name and the category information are related, the model prediction result is used for representing that the commodity name and the category information are related, and if the classifier judges that the commodity name and the category information are not related, the model prediction result is used for representing that the commodity name and the category information are not related.
In some embodiments, the computer device trains the first model using at least one first sample of the first set of samples, resulting in a trained first model. Optionally, the training process of the first model belongs to supervised training. For details of this process, please refer to the following examples.
The first sample set includes the first samples; optionally, the first commodity name and the first category information included in a first sample are obtained by the computer device from other application programs. Alternatively, the other application programs refer to application programs having functions similar to the target application program.
For example, the computer device captures, through a crawler, a plurality of commodity names and the category information corresponding to the commodity names from other application programs, and constructs the first sample set according to the plurality of captured commodity names and their corresponding category information. Optionally, a captured commodity name and its corresponding category information are used as a reference sample pair. For details of this process, please refer to the following examples.
By the method, a large amount of commodity names and category information corresponding to the commodity names can be obtained, and the construction difficulty of the first sample set can be reduced.
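One possible sketch of this construction step, under the assumption that crawled data arrives as a name-to-category mapping: matched pairs serve as first positive samples and deliberately mismatched pairs as first negative samples. The data, the mismatching rule, and the `build_first_sample_set` helper are all illustrative, not taken from the actual system.

```python
# Illustrative crawled data: commodity name -> category information.
crawled = {
    "xx mobile phone": "electronics/mobile terminal",
    "yy green tea": "food and beverage/tea/green tea",
    "zz cleanser": "beauty/skin care",
}

def build_first_sample_set(name_to_category):
    names = list(name_to_category)
    # Matched pairs become first positive samples (label 1).
    positives = [(n, name_to_category[n], 1) for n in names]
    # A simple mismatching rule for first negative samples (label 0):
    # pair each name with the category of the next commodity in the list.
    negatives = [(n, name_to_category[names[(i + 1) % len(names)]], 0)
                 for i, n in enumerate(names)]
    return positives + negatives

samples = build_first_sample_set(crawled)
assert len(samples) == 6
assert ("xx mobile phone", "electronics/mobile terminal", 1) in samples
```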
Step 430, screening the second sample set by using the trained first model to determine at least one available sample; wherein the second sample set includes at least one second sample, and the second sample includes a text pair of a second commodity name and second category information.
The available samples refer to the second samples selected from the second sample set by the trained first model. Optionally, the correlation between the commodity name and the category information included in an available sample is higher than the correlation between the second commodity name and the second category information included in the unselected second samples.
In some embodiments, the second samples in the second sample set and the first samples in the first sample set come from different data sources. Optionally, the second samples in the second sample set are commodity names and category information collected by the target application program and uploaded by merchants or brand parties, while the first samples in the first sample set are constructed based on the commodity names and category information acquired from other application programs, as described above. For details of this process, please refer to the following examples.
In some embodiments, the computer device screens at least one available sample from the second samples in the second sample set through the trained first model; the computer device then generates training samples for training the second model based on the available samples.
By screening the candidate samples through the trained first model to obtain the available samples, the correlation between the commodity name and the category information input by a merchant or brand party is judged automatically, without manual screening; this reduces the labor cost of constructing training samples for the second model and improves the efficiency of screening available samples.
Step 440, training the second model based on the available samples to obtain a trained second model; the second model is used for determining category information corresponding to the commodity name to be identified.
In some embodiments, the second model refers to a machine learning model. The second model is used to determine the similarity between two commodity names. Alternatively, the second model is referred to as a category identification model.
Optionally, the second model includes a first network and a second network, and at least one identical parameter exists between the first network and the second network. For example, the second model belongs to a twin network, the first network and the second network are identical in structure, and the inputs of the first network and the second network are different. For details of this process, reference is made to the following examples.
In some embodiments, the computer device trains the second model using a small sample training method. By the training method, the training speed of the second model can be increased, and the number of training samples required in the training process of the second model is reduced.
In some embodiments, the trained second model compares the commodity name to be identified with at least one comparable commodity name, determines a comparable commodity name with highest similarity to the commodity name to be identified, and determines category information corresponding to the comparable commodity name as category information corresponding to the commodity name to be identified, wherein the comparable commodity name and the category information corresponding to the comparable commodity name are related.
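The comparison step just described can be sketched as follows. Note that the `similarity` function here is only a character-overlap stand-in for the trained second model, and the comparable names and their category information are made up for demonstration.

```python
def similarity(a, b):
    """Stand-in similarity: character-overlap (Jaccard) ratio, not the trained model."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def identify_category(name, comparables):
    """Pick the comparable commodity name most similar to `name` and
    return the category information corresponding to that comparable name.
    comparables: list of (comparable commodity name, category information)."""
    _, best_cat = max(comparables, key=lambda nc: similarity(name, nc[0]))
    return best_cat

comparables = [("xx green tea", "food/tea/green tea"),
               ("xx mobile phone", "electronics/mobile terminal")]
assert identify_category("yy green tea", comparables) == "food/tea/green tea"
```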
In summary, through the method, the sample screening model is trained first, then the sample screening model screens the candidate samples to obtain at least one available sample, and the category identification model is trained according to the available sample, so that on one hand, the difficulty of collecting training samples for training the category identification model is reduced, on the other hand, the trained category identification model can determine category information corresponding to the commodity name to be identified, the labor cost consumed in the process of determining the category information of the commodity is reduced, and the accuracy in the process of determining the category information is improved.
The method of screening for available samples is described in several examples.
In some embodiments, the computer device screens the second sample set using the trained first model to determine at least one available sample, comprising: for any of the second samples, the computer device determines a correlation prediction result for the second sample using the trained first model; wherein the correlation prediction result is used for representing the correlation between the second commodity name and the second category information contained in the second sample; the computer device determines at least one second sample for which the correlation prediction result satisfies the screening condition as at least one available sample.
In some embodiments, the computer device inputs the second sample into the trained first model, obtaining a correlation prediction result for the second sample.
In some embodiments, the computer device determines a correlation prediction result for the second sample using the trained first model as follows: the computer device segments the second commodity name and the second category information using a word segmentation device, and forms the segmented second commodity name and the segmented second category information into a tag list; according to the tag list, input features corresponding to the second sample are determined through an embedding layer of the trained first model; the input features are processed through an encoder in the trained first model to obtain fusion coding features; and the fusion coding features are processed through a classifier in the trained first model to obtain the correlation prediction result for the second sample.
The word segmenter is used to divide text into separate words. The independent word refers to the smallest unit of text that cannot be further divided. The computer equipment divides the second commodity name and the second category information by using a word segmentation device to obtain the segmented second commodity name and the segmented second category information.
In some embodiments, the segmented second commodity name is arranged before the segmented second category information in the tag list. In other embodiments, the segmented second commodity name is arranged after the segmented second category information in the tag list. The arrangement order of the second commodity name and the second category information in the tag list is not limited by the present application.
In some embodiments, the embedding layer of the first model is used to convert the input text pair into vector form. Optionally, the embedding layer of the first model participates in the training process of the first model, that is, the embedding layer changes during the training of the first model; or the embedding layer of the first model does not participate in the training process of the first model, which is not limited by the present application.
Taking the first model built based on the BERT (Bidirectional Encoder Representations from Transformers) model as an example, the embedding layer of the first model includes: a mark embedding layer, a segment embedding layer, and a position embedding layer. The mark embedding layer is used to determine the text features corresponding to the tag list, where the text features include a text feature vector corresponding to each independent word in the second sample.
Optionally, the tag list includes a spacer "[SEP]" and a semantic embedding "[CLS]". The computer device sets the spacer [SEP] at the sentence end of the segmented second commodity name, and sets the semantic embedding [CLS] at the head of the tag list to characterize the semantic information of the second sample, thereby obtaining the tag list.
The segment embedding layer is used to generate a corresponding segment feature for each independent word in the tag list; the segment feature is used to characterize whether the independent word belongs to the second commodity name or to the second category information.
The position embedding layer is used to determine the position feature corresponding to each independent word in the tag list; the position feature is used to characterize the position information of the independent word within the tag list.
Input feature = text feature + segment feature + position feature.
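A minimal sketch of this formula, using toy integer vectors in place of the learned embeddings (the vector values and the 4-dimensional size are assumptions for demonstration):

```python
def input_feature(text_vec, segment_vec, position_vec):
    """Element-wise sum of the three embedding-layer outputs for one token."""
    return [t + s + p for t, s, p in zip(text_vec, segment_vec, position_vec)]

text_vec     = [1, 2, 3, 4]      # from the mark embedding layer
segment_vec  = [10, 10, 10, 10]  # marks "belongs to the second commodity name"
position_vec = [0, 1, 2, 3]      # from the position embedding layer

assert input_feature(text_vec, segment_vec, position_vec) == [11, 13, 15, 17]
```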
The computer equipment transmits the input characteristics into the trained encoder of the first model, and the fusion coding characteristics corresponding to the second sample are determined through the encoder.
Alternatively, the process of determining the fusion coding features of the text pair in the second sample may be referred to as single-tower interaction. The fusion coding feature is a multidimensional classification vector used to characterize the similarity between the second commodity name and the second category information. For example, the fusion coding feature is a 256-dimensional classification vector, i.e., [CLS]: T1, T2, T3, …, Tn, n=256.
Optionally, the encoder includes at least one coding layer therein. For example, the encoder includes 12 encoding layers.
Each coding layer includes a multi-head attention (Multi-head Attention) layer and a feed-forward (Feed Forward) layer. Optionally, the encoder used here belongs to the BERT model, which may be referred to as a standard (base) BERT model. The standard BERT model includes 12 coding layers, 12 attention heads, and a hidden dimension of 768.
It should be noted that the encoder may be constructed based on any model for feature encoding, and the present application is not limited to the type of encoder.
The fusion coding features serve as the input of the classifier in the trained first model to obtain the correlation prediction information output by the classifier. Optionally, the classifier is a Softmax-based classifier. For details of the correlation prediction information, please refer to the above description, which is not repeated here.
The computer device screens the second sample set for available samples based on the screening conditions and the correlation prediction results. The screening conditions are used to characterize conditions for screening the available samples from the second sample set. Optionally, the screening condition is related to a correlation between the second merchandise name and the second category information.
For example, the screening condition is used to indicate that a second sample with a correlation prediction result greater than or equal to the correlation threshold is selected as the available sample. Wherein the correlation threshold is preset, e.g. the correlation threshold is equal to 0.7. The correlation threshold is set according to actual needs, and the present application is not limited thereto.
For another example, the screening condition is used to indicate that the first n second samples with highest correlation are selected from the second sample set as available samples, and n is a positive integer.
Optionally, the trained first model includes a binary classifier, and the correlation prediction result output by the classifier is either: correlated or uncorrelated. The screening condition is used to indicate that a second sample whose correlation prediction result is "correlated" is taken as an available sample. The computer device determines the second samples satisfying the screening condition as available samples; that is, an available sample also includes a commodity name and category information, and there is a correlation between the two.
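The screening conditions described above (a correlation threshold, or the top-n most correlated second samples) can be sketched as follows; the second samples and their correlation scores are illustrative values, not model outputs.

```python
# Each entry pairs a second sample (commodity name, category information)
# with an illustrative correlation prediction in [0, 1].
scored = [(("phone A", "electronics"), 0.92),
          (("cleanser B", "medicinal material"), 0.15),
          (("tea C", "green tea"), 0.81)]

def screen_by_threshold(scored, threshold=0.7):
    """Keep second samples whose correlation meets the correlation threshold."""
    return [sample for sample, score in scored if score >= threshold]

def screen_top_n(scored, n):
    """Keep the first n second samples with the highest correlation."""
    return [sample for sample, _ in sorted(scored, key=lambda x: -x[1])[:n]]

assert screen_by_threshold(scored) == [("phone A", "electronics"),
                                       ("tea C", "green tea")]
assert screen_top_n(scored, 1) == [("phone A", "electronics")]
```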
The computer device processes the selectable samples, constructs a third sample set for training the second model, and trains the second model using the third sample set. For details of this process, reference is made to the following examples.
Fig. 5 is a schematic diagram of a correlation prediction result determination process according to an exemplary embodiment of the present application.
For any second sample, the second commodity name and the second category information in the second sample are first segmented; a [CLS] is added at the head of the text, and a [SEP] spacer is added at the end of the second commodity name and at the end of the second category information, generating a tag list. An input vector is then determined from the tag list by the embedding layer, and the input vector is encoded by the encoder in the trained first model to obtain the fusion coding feature [CLS]. Finally, the classifier of the trained first model determines the correlation prediction result of the second sample from the fusion coding feature.
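A hedged sketch of the tag-list construction step, with a naive whitespace split standing in for the real word segmentation device; the sample text is illustrative.

```python
def build_tag_list(commodity_name, category_info):
    """[CLS] at the head, a [SEP] spacer after the segmented commodity
    name and after the segmented category information."""
    name_tokens = commodity_name.split()      # stand-in for word segmentation
    category_tokens = category_info.split()
    return ["[CLS]"] + name_tokens + ["[SEP]"] + category_tokens + ["[SEP]"]

tags = build_tag_list("xx mobile phone", "electronics mobile-terminal")
assert tags[0] == "[CLS]"
assert tags.count("[SEP]") == 2
```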
Through the above method, the trained first model can screen the second sample set obtained from merchants, automatically removing second samples in which the commodity name and the category information are unrelated, thereby realizing automatic screening of the second sample set and improving the convenience of screening available samples.
In some embodiments, the computer device trains the second model based on the available samples, resulting in a trained second model, comprising: the computer device determining a third sample set from the available samples; wherein the third sample set includes at least one third sample including text pairs of two different trade names; the computer equipment trains the second model by adopting the third sample set to obtain a trained second model.
The third set of samples includes at least one third sample for training the second model. The third sample includes two trade names, optionally, the two trade names are different. For example, the third sample is expressed in the form of (trade name 1, trade name 2).
Optionally, for any one of the third samples, the two trade names contained in the third sample are selected from at least one available sample. For details of this process, please refer to the following examples.
In some embodiments, the computer device uses the third sample set to supervise the second model, resulting in a trained second model. Alternatively, the training process of the second model belongs to small sample training, so that the amount of required training data is smaller, and the convergence speed is faster.
In some embodiments, the data volume of the first sample set is greater than the data volume of the third sample set.
In some embodiments, the second model includes a first network and a second network, the first network and the second network having at least one same parameter. Optionally, all parameters in the first network and the second network are the same.
The two commodity names in the third sample are processed through the first network and the second network respectively to obtain the outputs corresponding to the two networks. The degree of similarity of the two commodity names is determined by calculating the similarity between the two outputs.
If the similarity degree of the two commodity names is higher, the two commodity names are indicated to correspond to the same category information, and if the similarity degree of the two commodity names is lower, the two commodity names are indicated to respectively correspond to different category information. The computer device obtains a trained second model by adjusting model parameters of the second model.
Two commodity names are selected from the at least one available sample and carried in a third sample, and small-sample training is performed on the second model using the third sample. On the one hand, this helps reduce the number of samples required in the training process of the second model and simplifies the collection of third samples; on the other hand, it helps shorten the time consumed by training the second model.
In other embodiments, at least one available sample may be manually selected from the second set of samples and the third set of samples may be generated using the at least one available sample. The accuracy of at least one available sample obtained by manual screening is high.
Next, description will be made of a method of generating the third sample set by several embodiments.
In some embodiments, the computer device determines a third sample set from the available samples, comprising: the computer equipment selects any two commodity names corresponding to the same category information from the available samples to obtain a third sample belonging to the positive sample; the computer equipment selects any two commodity names corresponding to different category information from the available samples to obtain a third sample belonging to the negative sample; the computer device obtains a third sample set based on at least one third sample belonging to the positive sample and at least one third sample belonging to the negative sample.
In one embodiment, the third samples include a third sample that belongs to a positive sample, and a third sample that belongs to a negative sample; wherein, two commodity names included in the third sample belonging to the positive sample have the same category information, and two commodity names included in the third sample belonging to the negative sample have different category information.
For example, the third sample a includes trade name 1 and trade name 2; the commodity name 1 corresponds to the category information 1, the commodity name 2 corresponds to the category information 2, and if the category information 1 and the category information 2 are the same, the third sample a is described as belonging to the positive sample.
For another example, the third sample B includes trade name 3 and trade name 4; and the commodity name 3 corresponds to the category information 3, the commodity name 4 corresponds to the category information 4, and if the category information 3 and the category information 4 are different, the third sample B is indicated to belong to the negative sample.
If the third sample belonging to the positive sample needs to be constructed, the computer equipment selects any two commodity names corresponding to the same category information from the available samples, and obtains the third sample belonging to the positive sample according to the commodity names corresponding to the same category information.
If the third sample belonging to the negative sample is required to be constructed, the computer equipment selects any two commodity names corresponding to different category information from the available samples, and obtains the third sample belonging to the negative sample according to the commodity names corresponding to the two different category information.
In some embodiments, the third sample includes two commodity names and a positive and negative sample attribute used to characterize the similarity of the two commodity names. Optionally, the representation of the third sample is: (x1, x2, y), where x1 represents one commodity name in the third sample, x2 represents the other commodity name in the third sample, and y is the positive and negative sample attribute used to characterize the similarity between the two commodity names.
Alternatively, y=0 or y=1. For example, for the third sample 1 belonging to the positive sample, y=1, indicating that the two trade names in the third sample 1 are correlated. For another example, for the third sample 2 belonging to the negative sample, y=0, indicating that the two trade names in the third sample 2 are uncorrelated.
In some embodiments, after the computer device screens out the plurality of available samples from the second sample set, the computer device stores the commodity names in the plurality of available samples according to their category information; that is, one storage space stores commodity names with the same category information, and the category information corresponding to commodity names stored in different storage spaces is not identical. A storage space may be understood as a storage entry in a database; the present application does not limit the type of storage space.
If the third sample belonging to the positive sample needs to be constructed, the computer equipment randomly selects two commodity names from a certain storage space, and constructs the third sample belonging to the positive sample through the two commodity names.
If a third sample belonging to the negative sample needs to be constructed, the computer device selects one commodity name from each of two different storage spaces, and constructs a third sample belonging to the negative sample from the two selected commodity names.
Storing the commodity names in the at least one available sample according to category information helps accelerate the construction of third samples. When the number of available samples is large, this speeds up the construction of positive and negative third samples and shortens the time required to obtain the trained second model.
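A possible sketch of this storage-based construction, assuming category-keyed storage spaces and using Python's `random` module for sampling; the storage contents, category names, and label encoding (y=1 positive, y=0 negative) are illustrative.

```python
import random

# Commodity names from the available samples, keyed by category information;
# each value plays the role of one storage space.
storage = {
    "green tea": ["tea A", "tea B", "tea C"],
    "mobile terminal": ["phone A", "phone B"],
}

def make_positive(storage, category, rng):
    """Positive third sample: two distinct names from the same storage space."""
    x1, x2 = rng.sample(storage[category], 2)
    return (x1, x2, 1)

def make_negative(storage, cat1, cat2, rng):
    """Negative third sample: one name from each of two different storage spaces."""
    return (rng.choice(storage[cat1]), rng.choice(storage[cat2]), 0)

rng = random.Random(0)  # seeded for reproducibility
pos = make_positive(storage, "green tea", rng)
neg = make_negative(storage, "green tea", "mobile terminal", rng)
assert pos[2] == 1 and pos[0] != pos[1]
assert neg[2] == 0
```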
Through the above method, the third sample set for training the second model can be obtained using the commodity names and category information collected in the target application program. On the one hand, since the category setting in the target application program is not exactly the same as that in other application programs, the second model cannot be trained directly using data acquired from other application programs. On the other hand, constructing the sample set for training the second model from the screened available samples allows the training process of the second model to better fit the commodity-name naming characteristics of the target application program, improving the accuracy of the category information that the trained second model determines for the commodity name to be identified.
The training method of the second model is described below by way of several examples.
In some embodiments, the computer device trains the second model using the third sample set, resulting in a trained second model, comprising: for any of the third samples, the computer device determines a first characteristic representation of one of the trade names contained in the third sample using a first network comprised by the second model; the computer device determining a second feature representation of another commodity name contained in the third sample using a second network comprised by the second model; the computer device calculates a similarity between the first feature representation and the second feature representation; and the computer equipment trains the second model according to the similarity and the positive and negative sample attributes corresponding to the third sample to obtain a trained second model.
Optionally, the second model is a twin network model or a matching network model. In some embodiments, the second model includes a first network and a second network for extracting feature representations, and a loss calculation layer for calculating similarity.
Taking the first network as an example, the first network may be a CNN network, a long short time memory (Long Short Term Memory, LSTM) network, a BERT network, or any network that can be used for feature extraction of text.
In some embodiments, the first network in the second model and the second network in the second model have the same network structure. Optionally, the first network and the second network share parameters. For example, the first network and the second network are both constructed based on BERT.
In some embodiments, the first network and the second network may be the same network; that is, the same network is used to determine the first feature representation and the second feature representation respectively corresponding to the two commodity names in the third sample.
In some embodiments, the computer device processes the two commodity names respectively through the embedding layer to obtain an input feature 1 and an input feature 2 corresponding to the two commodity names respectively. The computer device obtains a first feature representation by feature encoding the input feature 1 over a first network. And carrying out feature encoding on the input features 2 through a second network to obtain a second feature representation.
Optionally, the method of determining input feature 1 and input feature 2 is similar to the method by which the embedding layer of the first model determines input features above. Taking any one commodity name included in the third sample as an example: first, the computer device segments the commodity name using the word segmentation device to obtain the segmented commodity name; then, through the embedding layer, it determines the text features, segment features (optionally, the segment features may be omitted), and position features corresponding to the commodity name, and adds the text features, segment features, and position features to obtain the input features corresponding to the commodity name.
The input feature 1 is encoded via a first network to obtain a first feature representation, and the input feature 2 is encoded via a second network to obtain a second feature representation. For details of this process, please refer to the above embodiments, and detailed descriptions thereof are omitted.
Fig. 6 is a schematic diagram of a second model working method according to an exemplary embodiment of the present application.
The third sample includes commodity name 1 and commodity name 2. Commodity name 1 and commodity name 2 are respectively processed through the second model to obtain a first feature representation corresponding to commodity name 1 and a second feature representation corresponding to commodity name 2, and the similarity between commodity name 1 and commodity name 2 is calculated according to the first feature representation and the second feature representation.
In some embodiments, the second model includes a twin network therein.
Fig. 7 is a schematic diagram of a twin network architecture provided by an exemplary embodiment of the present application. Optionally, the first network and the second network share parameters in fig. 7. Further, the first network and the second network may be the same network.
The feature representations corresponding to the two commodity names in the third sample are respectively determined through the first network and the second network in the second model, and the similarity between the two commodity names is determined according to their feature representations through a similarity calculation layer in the second model.
The computer device then determines a training loss based on the similarity, and the positive and negative properties of the third sample.
In some embodiments, the computer device uses the cosine similarity between the first feature representation and the second feature representation as the similarity between them.
In some embodiments, the computer device calculates the similarity between the first feature representation and the second feature representation as follows: calculating a first norm of the first feature representation and a second norm of the second feature representation, and calculating an inner product between the first feature representation and the second feature representation; and determining the similarity between the first feature representation and the second feature representation according to the first norm, the second norm and the inner product.
Optionally, the similarity between the first and second feature representations is proportional to the inner product, and inversely proportional to both the first norm and the second norm.
In some embodiments, the similarity between the first and second feature representations is calculated by the following formula:

E_W(x1, x2) = <f_W(x1), f_W(x2)> / (||f_W(x1)|| · ||f_W(x2)||)

wherein f_W(x1) is the first feature representation, f_W(x2) is the second feature representation, E_W(x1, x2) represents the similarity between the first feature representation and the second feature representation, ||f_W(x1)|| represents the first norm, ||f_W(x2)|| represents the second norm, and <f_W(x1), f_W(x2)> represents the inner product between the first feature representation and the second feature representation.
The similarity E_W(x1, x2) between the two commodity names in the third sample is calculated by the above formula. From the properties of the cosine function, E_W(x1, x2) takes values in [-1, 1]: the closer E_W(x1, x2) is to 1, the more similar the first feature representation is to the second feature representation; the closer E_W(x1, x2) is to -1, the greater the difference between the first feature representation and the second feature representation.
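The norm and inner-product computation just described is ordinary cosine similarity; a minimal sketch (the function name and test vectors are illustrative):

```python
import numpy as np

def cosine_similarity(f1: np.ndarray, f2: np.ndarray) -> float:
    """E_W(x1, x2): inner product divided by the product of the two norms."""
    n1 = np.linalg.norm(f1)        # first norm  ||f_W(x1)||
    n2 = np.linalg.norm(f2)        # second norm ||f_W(x2)||
    inner = float(np.dot(f1, f2))  # inner product <f_W(x1), f_W(x2)>
    return inner / (n1 * n2)

a = np.array([1.0, 2.0, 3.0])
print(round(cosine_similarity(a, a), 6))   # 1.0: identical representations
print(round(cosine_similarity(a, -a), 6))  # -1.0: opposite representations
```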
Alternatively, the computer device may also use other similarity calculation methods to calculate the similarity. Other similarity calculations include, but are not limited to, calculating a Euclidean distance between the first feature representation and the second feature representation.
In some embodiments, the computer device trains the second model according to the similarity and the positive and negative sample attributes corresponding to the third sample, and obtains a trained second model, including: and the computer equipment calculates the training loss of the model according to the similarity and the positive and negative sample attributes corresponding to the third sample, and adjusts the second model according to the training loss until the second model converges to obtain a trained second model.
In some embodiments, the training loss is calculated by the following formula:

L(W, (y, x1, x2)^(i)) = y^(i) · L_+(x1^(i), x2^(i)) + (1 − y^(i)) · L_−(x1^(i), x2^(i))

wherein (y, x1, x2)^(i) represents the third sample i, x1^(i) and x2^(i) respectively represent the two commodity names included in the third sample i, L(W, (y, x1, x2)^(i)) represents the training loss corresponding to the third sample i, L_+ represents the calculation formula of the loss function when the third sample i belongs to the positive samples, L_− represents the calculation formula of the loss function when the third sample i belongs to the negative samples, y^(i) represents the positive/negative sample attribute of the third sample i (optionally, y^(i) = 0 or y^(i) = 1), E_W represents the similarity between the two commodity names in the third sample i (on which L_+ and L_− depend), and m represents a similarity threshold (optionally, m is preset).
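The exact branch formulas of the loss are not fully legible in this publication, so the sketch below assumes a common contrastive choice: positive pairs are pulled toward similarity 1, and negative pairs are penalised only above the threshold m. Treat both squared branch terms as assumptions, not the patent's exact loss:

```python
def contrastive_loss(e_w: float, y: int, m: float = 0.5) -> float:
    """y = 1 for a positive third sample, y = 0 for a negative one.
    e_w is the similarity E_W in [-1, 1]; m is a preset similarity threshold.
    The squared branch forms are assumptions for illustration."""
    loss_pos = (1.0 - e_w) ** 2        # applied when y = 1
    loss_neg = max(0.0, e_w - m) ** 2  # applied when y = 0
    return y * loss_pos + (1 - y) * loss_neg

print(contrastive_loss(1.0, 1))      # 0.0: perfectly similar positive pair
print(contrastive_loss(0.1, 0))      # 0.0: negative pair already below m
print(contrastive_loss(0.9, 0) > 0)  # True: negative pair too similar
```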
Fig. 8 is a schematic diagram of a loss function provided by an exemplary embodiment of the present application.
By the above method, small-sample training of the second model is realized, the scale of the training samples required in the training process is reduced, and the time consumed in training the second model is shortened. Compared with merchants or brands manually uploading the category information corresponding to commodity names, the method reduces labor cost and also helps improve the correlation between category information and commodity names, thereby improving the accuracy of the commodities recalled for the search terms provided by users.
The trained second model can be used for determining the category information corresponding to a commodity name to be identified. The computer device thus obtains a trained second model that can compare, through the first network and the second network respectively, the similarity between a comparable commodity name and the commodity name to be identified.
The category information predicted by the trained second model from the commodity name to be identified has higher accuracy. Table 3 shows the effect of the category information predicted by the trained second model for the commodity names to be identified.
Table 3
Assuming there are a plurality of comparable commodity names and known category information respectively corresponding to them (optionally, the plurality of pieces of known category information can cover the types of all existing commodities in the target application program), the computer device uses the trained second model to determine the similarity between the commodity name to be identified and each comparable commodity name, determines the target commodity name with the highest similarity to the commodity name to be identified, and determines the known category information corresponding to the target commodity name as the category information of the commodity to be identified.
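The lookup just described reduces to a nearest-neighbour search over the comparable names. A sketch follows, with a toy character-overlap similarity standing in for the trained second model; all names, categories, and the similarity function are illustrative:

```python
def predict_category(query, comparables, sim):
    """comparables: list of (comparable_name, known_category) pairs.
    Returns the known category of the name most similar to the query."""
    best_name, best_cat = max(comparables, key=lambda pair: sim(query, pair[0]))
    return best_cat, sim(query, best_name)

def toy_sim(a, b):
    """Jaccard overlap of character sets; a stand-in for the second model."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

comparables = [("mens running shoes", "footwear"),
               ("cotton bed sheet", "home textiles")]
cat, score = predict_category("womens running shoes", comparables, toy_sim)
print(cat)  # footwear
```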
As merchants or brands continue to develop new products, new types of commodities may appear, such that the similarity between the commodity name to be identified and every comparable commodity name is low; in this case, the trained second model needs to be trained further.
For example, the computer device obtains commodity name 1 to be identified, and uses the trained second model to determine the similarity between commodity name 1 to be identified and each comparable commodity name.
If every similarity is lower than a similarity threshold k, where k is a positive number less than or equal to 1 (optionally, k is preset), new category information corresponding to commodity name 1 to be identified can be determined manually, and a fourth sample set can be constructed based on the new category information. The fourth sample set comprises at least one fourth sample, and each fourth sample comprises two commodity names, at least one of which is a newly added type commodity name; a newly added type commodity name refers to a commodity name corresponding to a commodity belonging to the new category information.
In some embodiments, the fourth samples include fourth samples belonging to the positive samples and fourth samples belonging to the negative samples; a fourth sample belonging to the positive samples comprises two newly added type commodity names, and a fourth sample belonging to the negative samples comprises one newly added type commodity name and one commodity name corresponding to known category information.
The computer device trains the trained second model with the fourth sample set to obtain an updated second model, and the updated second model replaces the trained second model for determining the category information corresponding to the commodity name to be identified.
By the method, the second model can be continuously adapted to the change of service data (such as the commodity name to be identified), and the correlation between the category information determined by the second model and the commodity name to be identified can be maintained.
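The update trigger described above (every similarity below threshold k signals a new commodity type) can be sketched as follows; `toy_sim` again stands in for the trained second model, and the value of k is illustrative:

```python
def toy_sim(a, b):
    """Character-set overlap; a stand-in for the trained second model."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def needs_new_category(query, comparables, sim, k=0.6):
    """True when every comparable name scores below k, i.e. the query likely
    belongs to new category information and a fourth sample set is needed."""
    return all(sim(query, name) < k for name, _ in comparables)

comparables = [("mens running shoes", "footwear"),
               ("cotton bed sheet", "home textiles")]
print(needs_new_category("vr headset pro", comparables, toy_sim))        # True
print(needs_new_category("womens running shoes", comparables, toy_sim))  # False
```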
The construction process of the first sample set is described below by way of several embodiments.
In some embodiments, a computer device obtains a first set of samples, comprising: capturing a plurality of reference samples by the computer equipment; wherein, the reference sample comprises: reference commodity name and reference category information having correlation; the computer equipment determines a first commodity name and first category information from the same reference sample to obtain a first sample belonging to the positive sample; the computer equipment respectively determines a first commodity name and first category information from different reference samples to obtain a first sample belonging to a negative sample; the computer device obtains a first set of samples from at least one first sample belonging to a positive sample and at least one first sample belonging to a negative sample.
In some embodiments, the computer device requests reference samples from at least one other application program by means of a crawler or the like, obtaining a plurality of reference samples. Optionally, the reference commodity name and the reference category information included in a reference sample have a correlation with each other.
In some embodiments, the computer device determines the first sample that belongs to the positive sample based on the reference commodity name and the reference category information in any one of the reference samples.
In some embodiments, the first sample is recorded in a form such as (first commodity name, first category information, y) or (first category information, first commodity name, y); wherein y is used to characterize the positive/negative attribute of the first sample: y = 1 in a first sample belonging to the positive samples, and y = 0 in a first sample belonging to the negative samples.
The computer device respectively determines a first commodity name and first category information from different reference samples of the plurality of reference samples to obtain a first sample belonging to the negative samples, including: the computer device obtains a plurality of reference commodity names and a plurality of pieces of reference category information from the plurality of reference samples; and the computer device performs random matching among the plurality of reference commodity names and the plurality of pieces of reference category information to obtain a first sample belonging to the negative samples.
For example, the computer device acquires 3 reference samples, namely a reference sample a (reference commodity name a, reference category information a), a reference sample b (reference commodity name b, reference category information b) and a reference sample c (reference commodity name c, reference category information c), and obtains 3 reference commodity names, namely the reference commodity name a, the reference commodity name b and the reference commodity name c, respectively, according to the 3 reference samples; the computer equipment obtains 3 pieces of reference category information, namely reference category information a, reference category information b and reference category information c, according to the 3 pieces of reference samples.
Optionally, the random matching among the plurality of reference commodity names and the plurality of pieces of reference category information may be implemented by the computer device through a random number seed. For example, the computer device numbers the plurality of reference commodity names and the plurality of pieces of reference category information, generates two random numbers through the random number seed, takes the reference commodity name numbered with random number 1 as first commodity name a, determines the reference category information numbered with random number 2 as first category information a, and obtains a first sample A belonging to the negative samples according to first commodity name a and first category information a. The first sample A may be expressed as (first commodity name a, first category information a, 0).
In this way, at least one first sample belonging to the negative samples can be generated by random matching. The computer device composes the first sample set from at least one first sample belonging to the positive samples and at least one first sample belonging to the negative samples.
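The positive/negative construction above can be sketched as follows; the seeded random matching mirrors the random-number-seed approach in the embodiment, while the pair-skipping check is an added safeguard of this sketch:

```python
import random

def build_first_samples(reference_samples, n_neg, seed=0):
    """reference_samples: list of (reference_commodity_name, reference_category) pairs.
    Positives pair name and category from the same reference sample (y = 1);
    negatives randomly match a name with a category from the pool (y = 0)."""
    rng = random.Random(seed)  # random number seed, as in the embodiment
    positives = [(name, cat, 1) for name, cat in reference_samples]
    names = [name for name, _ in reference_samples]
    cats = [cat for _, cat in reference_samples]
    negatives = []
    while len(negatives) < n_neg:
        name, cat = rng.choice(names), rng.choice(cats)
        if (name, cat, 1) not in positives:  # skip accidental true pairs
            negatives.append((name, cat, 0))
    return positives + negatives

samples = build_first_samples(
    [("name a", "category a"), ("name b", "category b"), ("name c", "category c")],
    n_neg=3)
print(len(samples))  # 6
```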
The training process of the first model is described below by way of several embodiments.
In some embodiments, the computer device trains the first model with the first sample set to obtain a trained first model, including: for any first sample, the computer device splices the first commodity name and the first category information contained in the first sample to obtain first splicing information; the computer device determines the input features of the first sample according to the first splicing information, wherein the input features comprise character features of the first splicing information, position features used for representing the position information of each independent word in the first splicing information, and segment features used for representing the segment to which each independent word in the first splicing information belongs; the computer device determines, through an encoder of the first model, fusion coding features of the first splicing information according to the input features, wherein the fusion coding features are used for representing the correlation relation between the first commodity name and the first category information; the computer device determines, through a classifier of the first model, a prediction result of the first sample based on the fusion coding features, wherein the prediction result of the first sample is used for representing the predicted correlation between the first commodity name and the first category information; and the computer device trains the first model according to the prediction result of the first sample and the positive/negative sample attribute corresponding to the first sample to obtain the trained first model.
In some embodiments, the first splice information includes: the first commodity name and the first category information. For example, the first splicing information is the first commodity name+the first category information, or the first splicing information is the first category information+the first commodity name.
Optionally, the computer device may use a word segmentation device to segment the first commodity name and the first category information, and form the segmented first commodity name and the segmented first category information into the first splicing information.
Alternatively, after the computer device obtains the first splicing information, the first splicing information is segmented. The computer device generates a mark list according to the first splicing information, determines the input features corresponding to the first sample according to the mark list through the embedding layer of the first model, processes the input features through the encoder in the first model to obtain the fusion coding features, and processes the fusion coding features through the classifier in the first model to obtain the correlation prediction result for the first sample.
The word segmenter is used to divide text into independent words. An independent word is the smallest unit in the text that cannot be further divided. The computer device divides the first commodity name and the first category information with the word segmenter to obtain the segmented first commodity name and the segmented first category information.
The first trade name after word segmentation and the first category information after word segmentation each comprise at least one independent word. Independent words refer to words having independent meanings independent of context.
The embedding layer of the first model comprises: a mark embedding layer, a segment embedding layer and a position embedding layer. The character features comprise a character feature vector corresponding to each independent word in the first sample.
Optionally, the mark list includes a spacer "[SEP]" and a semantic embedding "[CLS]". The computer device sets a spacer [SEP] at the end of the segmented first commodity name, sets a spacer [SEP] at the end of the segmented first category information, and sets a semantic embedding [CLS] at the head of the mark list to represent the semantic information of the first sample, thereby obtaining the mark list.
The segment embedding layer is used for generating a corresponding segment feature for each independent word in the mark list, wherein the segment feature is used for indicating whether the independent word belongs to the first commodity name or to the first category information.
The position embedding layer is used for determining the position feature corresponding to each independent word in the mark list, wherein the position feature is used for characterizing the position information of the independent word in the mark list.
Input feature = character feature + segment feature + position feature.
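The mark-list construction and the additive combination of the three features can be sketched as follows; the vocabulary, dimensions, and random embedding tables are assumptions made for illustration:

```python
import numpy as np

D = 8  # embedding dimension (assumed)

def build_mark_list(name_tokens, category_tokens):
    """[CLS] at the head; a [SEP] spacer at the end of each segment."""
    return ["[CLS]"] + name_tokens + ["[SEP]"] + category_tokens + ["[SEP]"]

def input_features(tokens, vocab):
    """input feature = character feature + segment feature + position feature"""
    rng = np.random.default_rng(42)
    tok_emb = rng.normal(size=(len(vocab), D))  # character (token) embeddings
    seg_emb = rng.normal(size=(2, D))           # segment 0: name, 1: category
    pos_emb = rng.normal(size=(32, D))          # one vector per position
    first_sep = tokens.index("[SEP]")
    rows = []
    for pos, tok in enumerate(tokens):
        seg = 0 if pos <= first_sep else 1      # which segment the word belongs to
        rows.append(tok_emb[vocab[tok]] + seg_emb[seg] + pos_emb[pos])
    return np.stack(rows)

tokens = build_mark_list(["brand", "shoes"], ["footwear"])
vocab = {t: i for i, t in enumerate(dict.fromkeys(tokens))}
x = input_features(tokens, vocab)
print(tokens[0], x.shape)  # [CLS] (6, 8)
```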
The computer equipment transmits the input characteristics into an encoder of the trained first model, and the fusion coding characteristics corresponding to the first sample are determined through the encoder. For the fusion coding feature generation process, please refer to the above embodiments, and details are not described herein.
In some embodiments, the loss function used in the first model training process is the same as the loss function used in the second model training process, and detailed descriptions thereof are omitted herein.
Although the first samples come from other application programs and the naming of the first commodity names and the first category information is not identical to that of the commodity names and category information in the target application program, the first model judges the correlation between a commodity name and category information according to semantics; therefore, the trained first model can screen available samples from the second sample set more accurately.
In some embodiments, the first model includes an embedding layer, an encoder, and a classifier, and the second model includes a first network, a second network, and a similarity prediction layer. Optionally, the first network and the second network share parameters; the first network and the second network may even be the same network, used for extracting the fused feature vector of a text.
Optionally, the parameters in the embedding layer and the encoder of the trained first model may be used to construct the first network and the second network of the second model. For example, using the embedding layer and the encoder of the trained first model as the first network (or the second network) of the second model helps accelerate convergence during the training of the second model.
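Warm-starting the twin network from the trained first model can be sketched as a simple parameter hand-off; the dictionary layout and placeholder weights are illustrative (a real framework would copy state dicts):

```python
# Trained first model: embedding layer + encoder + classifier (placeholders).
first_model = {"embedding": ["emb-weights"],
               "encoder": ["enc-weights"],
               "classifier": ["cls-weights"]}

# The second model reuses only the embedding layer and the encoder; the
# classifier is dropped. Both branches point at the same object, so the
# first and second networks share parameters.
shared_branch = {k: first_model[k] for k in ("embedding", "encoder")}
second_model = {"first_network": shared_branch, "second_network": shared_branch}

print(second_model["first_network"] is second_model["second_network"])  # True
```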
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Referring to FIG. 9, a block diagram of a training apparatus for a category identification model is shown, provided in one embodiment of the present application. The device has the function of realizing the method example, and the function can be realized by hardware or can be realized by executing corresponding software by hardware. The apparatus may be the computer device described above or may be provided in a computer device. As shown in fig. 9, the apparatus 900 may include: a sample acquisition module 910, a first training module 920, a sample screening module 930, and a second training module 940.
The sample acquisition module 910 is configured to acquire a first sample set, where the first sample set includes at least one first sample, and the first sample includes a text pair of a first commodity name and first category information.
The first training module 920 is configured to train the first model with the first sample set to obtain a trained first model; wherein the first model is used for determining the correlation between commodity names and category information.
The sample screening module 930 is configured to screen the second sample set using the trained first model to determine at least one available sample; wherein the second sample set includes at least one second sample including a text pair of a second commodity name and second category information.
The second training module 940 is configured to train the second model based on the available samples, to obtain a trained second model; the second model is used for determining category information corresponding to the commodity name to be identified.
In some embodiments, the sample screening module 930 is configured to determine, for any of the second samples, a correlation prediction result of the second sample using the trained first model; wherein the correlation prediction result is used for representing correlation between a second commodity name and second category information contained in the second sample; and determining at least one second sample of which the correlation prediction result meets the screening condition as the at least one available sample.
In some embodiments, the second training module 940 includes: a sample set construction unit for determining a third sample set from the available samples; wherein the third sample set comprises at least one third sample comprising text pairs of two different trade names; and the model training unit is used for training the second model by adopting the third sample set to obtain the trained second model.
In some embodiments, the sample set construction unit is configured to pick out any two trade names corresponding to the same category information from the available samples, and obtain a third sample belonging to the positive sample; selecting any two commodity names corresponding to different category information from the available samples to obtain a third sample belonging to the negative sample; and obtaining the third sample set according to at least one third sample belonging to the positive sample and at least one third sample belonging to the negative sample.
In some embodiments, the model training unit is configured to determine, for any one of the third samples, a first feature representation of one commodity name contained in the third sample using a first network included in the second model; determining a second feature representation of another commodity name contained in a third sample using a second network comprised by the second model; calculating a similarity between the first feature representation and the second feature representation; and training the second model according to the similarity and the positive and negative sample attributes corresponding to the third sample to obtain the trained second model.
In some embodiments, the first training module 920 is configured to splice, for any one of the first samples, the first commodity name and the first category information included in the first sample to obtain first splicing information; determine the input features of the first sample according to the first splicing information, wherein the input features comprise character features of the first splicing information, position features used for representing the position information of each independent word in the first splicing information, and segment features used for representing the segment to which each independent word in the first splicing information belongs; determine, through an encoder of the first model, fusion coding features of the first splicing information according to the input features, wherein the fusion coding features are used to characterize the correlation relation between the first commodity name and the first category information; determine, through a classifier of the first model, a prediction result of the first sample based on the fusion coding features, wherein the prediction result of the first sample is used for representing the predicted correlation between the first commodity name and the first category information; and train the first model according to the prediction result of the first sample and the positive/negative sample attribute corresponding to the first sample to obtain the trained first model.
In some embodiments, the sample acquisition module 910 is configured to grasp a plurality of reference samples; wherein the reference sample comprises: reference commodity name and reference category information having correlation; determining the first commodity name and the first category information from the same reference sample to obtain a first sample belonging to a positive sample; respectively determining the first commodity name and the first category information from different reference samples to obtain a first sample belonging to a negative sample; the first sample set is obtained from at least one of the first samples belonging to positive samples and at least one of the first samples belonging to negative samples.
It should be noted that, in the apparatus provided in the foregoing embodiment, when implementing the functions thereof, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be implemented by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
Referring to FIG. 10, a block diagram of a computer device 1000 according to one embodiment of the application is shown.
In general, the computer device 1000 includes: a processor 1001 and a memory 1002.
The processor 1001 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1001 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor; the main processor is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1001 may integrate a GPU (Graphics Processing Unit) for rendering and drawing content to be displayed on the display screen. In some embodiments, the processor 1001 may further include an AI processor for processing computing operations related to machine learning.
Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. Memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1002 is used to store a computer program configured to be executed by one or more processors to implement the training method of the category identification model described above.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is not limiting as to the computer device 1000, and may include more or fewer components than shown, or may combine certain components, or employ a different arrangement of components.
In an exemplary embodiment, a computer readable storage medium is also provided, in which a computer program is stored which, when being executed by a processor, implements the training method of the above category identification model.
Alternatively, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random Access Memory), SSD (Solid State Drive), an optical disk, or the like. The random access memory may include ReRAM (Resistive Random Access Memory) and DRAM (Dynamic Random Access Memory).
In an exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program stored in a computer readable storage medium. The processor of the computer device reads the computer program from the computer readable storage medium, and the processor executes the computer program so that the terminal device executes the training method of the category identification model.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. In addition, the step numbers described herein merely exemplify one possible execution sequence among the steps; in some other embodiments, the steps may be executed out of the numbered order, for example two differently numbered steps may be executed simultaneously, or in an order opposite to that shown, which is not limiting.
It should be noted that, the information, data and signals related to the present application are all authorized by the user or fully authorized by the parties, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions. For example, palm print information, an image to be processed, and the like, which are referred to in the present application, are acquired with sufficient authorization.
The foregoing description of the exemplary embodiments of the application is not intended to limit the application to the particular embodiments disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the application.

Claims (10)

1. A method for training a category recognition model, the method comprising:
obtaining a first sample set, the first sample set comprising at least one first sample, the first sample comprising a text pair of a first commodity name and first category information;
training a first model by using the first sample set to obtain a trained first model; wherein the first model is used for determining a correlation between a commodity name and category information;
screening a second sample set by using the trained first model to determine at least one available sample; wherein the second sample set comprises at least one second sample, the second sample comprising a text pair of a second commodity name and second category information;
and training a second model based on the at least one available sample to obtain a trained second model; wherein the second model is used for determining category information corresponding to a commodity name to be recognized.
2. The method of claim 1, wherein the screening the second sample set using the trained first model to determine at least one available sample comprises:
for any one of the second samples, determining a correlation prediction result of the second sample by using the trained first model; wherein the correlation prediction result is used for characterizing a correlation between the second commodity name and the second category information contained in the second sample;
and determining at least one second sample whose correlation prediction result meets a screening condition as the at least one available sample.
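The screening step of claim 2 can be sketched in a few lines: a trained relevance model scores each (commodity name, category information) text pair, and pairs whose score meets a screening condition are kept as available samples. The scoring function and the fixed threshold below are illustrative assumptions, not part of the claim; a real first model would be a trained neural classifier.

```python
def screen_samples(second_samples, relevance_model, threshold=0.5):
    """Keep second samples whose predicted relevance meets the threshold."""
    available = []
    for name, category in second_samples:
        score = relevance_model(name, category)  # correlation prediction result
        if score >= threshold:                   # screening condition
            available.append((name, category))
    return available

# Toy stand-in for the trained first model: word overlap between
# the commodity name and the category information.
def toy_relevance(name, category):
    name_words, cat_words = set(name.split()), set(category.split())
    return len(name_words & cat_words) / max(len(cat_words), 1)

pairs = [("wireless bluetooth headphones", "headphones"),
         ("cotton summer dress", "laptop accessories")]
kept = screen_samples(pairs, toy_relevance, threshold=0.5)
```

The mismatched pair is dropped because its predicted relevance falls below the screening condition, which is the intended effect of the filtering stage.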
3. The method of claim 1, wherein training the second model based on the available samples results in a trained second model comprising:
determining a third sample set according to the available samples; wherein the third sample set comprises at least one third sample, the third sample comprising a text pair of two different commodity names;
and training the second model by using the third sample set to obtain the trained second model.
4. A method according to claim 3, wherein said determining a third sample set from said available samples comprises:
selecting any two commodity names corresponding to the same category information from the available samples to obtain a third sample belonging to positive samples;
selecting any two commodity names corresponding to different category information from the available samples to obtain a third sample belonging to negative samples;
and obtaining the third sample set according to at least one third sample belonging to positive samples and at least one third sample belonging to negative samples.
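The pair construction of claim 4 amounts to grouping the available samples by category information, pairing names that share a category as positives, and pairing names across categories as negatives. Exhaustive pairing, as in this minimal sketch, is an assumption; the claim only requires selecting "any two" such names.

```python
from itertools import combinations

def build_third_samples(available):
    """Build positive/negative commodity-name pairs from (name, category) samples."""
    by_category = {}
    for name, category in available:
        by_category.setdefault(category, []).append(name)

    # Positives: two commodity names under the same category information.
    positives = [(a, b) for names in by_category.values()
                 for a, b in combinations(names, 2)]

    # Negatives: two commodity names under different category information.
    categories = list(by_category)
    negatives = [(a, b)
                 for c1, c2 in combinations(categories, 2)
                 for a in by_category[c1]
                 for b in by_category[c2]]
    return positives, negatives

samples = [("red apple", "fruit"), ("green apple", "fruit"),
           ("office chair", "furniture")]
pos, neg = build_third_samples(samples)
```

In practice negatives are usually subsampled rather than enumerated exhaustively, since cross-category pairs grow quadratically with the sample count.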
5. A method according to claim 3, wherein training the second model using the third sample set results in the trained second model, comprising:
for any one of the third samples, determining a first feature representation of one commodity name contained in the third sample by using a first network comprised in the second model;
determining a second feature representation of the other commodity name contained in the third sample by using a second network comprised in the second model;
calculating a similarity between the first feature representation and the second feature representation;
and training the second model according to the similarity and the positive/negative sample attribute corresponding to the third sample to obtain the trained second model.
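The twin-network step of claim 5 can be illustrated with a deliberately simplified sketch: each commodity name is mapped to a feature vector (here by a toy bag-of-characters encoder standing in for the first and second networks), their cosine similarity is computed, and a squared-error loss compares that similarity against the pair's positive/negative label (1.0 for positive, 0.0 for negative). The encoder and loss are assumptions; a real second model would use trained neural encoders and a learned objective.

```python
import math

VOCAB = "abcdefghijklmnopqrstuvwxyz "

def encode(name):
    """Toy feature representation: character-count vector over VOCAB."""
    return [name.count(ch) for ch in VOCAB]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pair_loss(name_a, name_b, label):
    """Squared error between the pair's similarity and its sample label."""
    sim = cosine_similarity(encode(name_a), encode(name_b))
    return (sim - label) ** 2

# A genuine positive pair should incur less loss than mismatched names
# labeled as positive.
loss_pos = pair_loss("red apple", "green apple", label=1.0)
loss_neg_as_pos = pair_loss("red apple", "office chair", label=1.0)
```

The training signal thus pushes same-category names toward similar representations and different-category names apart, which is what lets the trained model later assign category information by nearest-neighbor similarity.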
6. The method of claim 1, wherein training the first model using the first sample set to obtain a trained first model comprises:
for any one of the first samples, splicing the first commodity name and the first category information contained in the first sample to obtain first splicing information;
determining input features of the first sample according to the first splicing information; wherein the input features comprise character features of the first splicing information, position features used for characterizing position information of each individual word in the first splicing information, and segment features of the segment to which each individual word belongs in the first splicing information;
determining, by an encoder of the first model, a fusion coding feature of the first splicing information according to the input features; wherein the fusion coding feature is used for characterizing a correlation relationship between the first commodity name and the first category information;
determining, by a classifier of the first model, a prediction result of the first sample based on the fusion coding feature; wherein the prediction result of the first sample is used for characterizing a predicted correlation between the first commodity name and the first category information;
and training the first model according to the prediction result of the first sample and the positive/negative sample attribute corresponding to the first sample to obtain the trained first model.
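The input construction of claim 6 can be sketched as follows: the commodity name and category information are spliced with separator markers, and three parallel feature sequences are built over the result — character (token) features, position features, and segment features marking which span each character came from. The `[CLS]`/`[SEP]` markers and the 0/1 segment scheme mirror BERT-style encoders and are assumptions here, not required by the claim.

```python
def build_input_features(commodity_name, category_info):
    """Splice name and category, then derive character, position, and segment features."""
    tokens = (["[CLS]"] + list(commodity_name) + ["[SEP]"]
              + list(category_info) + ["[SEP]"])
    positions = list(range(len(tokens)))
    # Segment 0 covers [CLS] + the commodity name + the first [SEP];
    # segment 1 covers the category information + the final [SEP].
    boundary = len(commodity_name) + 2
    segments = [0] * boundary + [1] * (len(tokens) - boundary)
    return tokens, positions, segments

tokens, positions, segments = build_input_features("apple", "fruit")
```

The three aligned sequences are what the encoder would sum (after embedding lookup) into a single input representation per character, so that the fusion coding feature can attend across the name/category boundary.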
7. The method of claim 1, wherein the obtaining the first set of samples comprises:
crawling a plurality of reference samples; wherein each reference sample comprises a reference commodity name and reference category information that are correlated;
determining the first commodity name and the first category information from the same reference sample to obtain a first sample belonging to positive samples;
determining the first commodity name and the first category information from different reference samples respectively to obtain a first sample belonging to negative samples;
and obtaining the first sample set according to at least one first sample belonging to positive samples and at least one first sample belonging to negative samples.
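Claim 7's construction can be sketched directly: each crawled reference sample already pairs a commodity name with correlated category information, so positives take both fields from the same reference sample, while negatives pair a name with category information drawn from a different reference sample. The cyclic negative-selection strategy below is one illustrative choice; the claim does not fix how the "different" reference sample is picked.

```python
def build_first_samples(reference_samples):
    """Build labeled (name, category, label) first samples from reference pairs."""
    # Positives: name and category from the same reference sample.
    positives = [(name, category, 1) for name, category in reference_samples]

    # Negatives: name paired with category information from another sample.
    negatives = []
    n = len(reference_samples)
    for i, (name, _) in enumerate(reference_samples):
        # Take category information from the next reference sample
        # (cyclically) so the pair spans two different samples.
        _, other_category = reference_samples[(i + 1) % n]
        negatives.append((name, other_category, 0))
    return positives + negatives

refs = [("red apple", "fruit"), ("office chair", "furniture")]
first_set = build_first_samples(refs)
```

This yields a balanced set of positive and negative text pairs, which is what the first model's binary relevance objective in claim 6 expects.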
8. A training device for a category recognition model, the device comprising:
a sample acquisition module, configured to acquire a first sample set, the first sample set comprising at least one first sample, the first sample comprising a text pair of a first commodity name and first category information;
a first training module, configured to train a first model by using the first sample set to obtain a trained first model; wherein the first model is used for determining a correlation between a commodity name and category information;
a sample screening module, configured to screen a second sample set by using the trained first model to determine at least one available sample; wherein the second sample set comprises at least one second sample, the second sample comprising a text pair of a second commodity name and second category information;
and a second training module, configured to train a second model based on the available samples to obtain a trained second model; wherein the second model is used for determining category information corresponding to a commodity name to be recognized.
9. A computer device comprising a processor and a memory, the memory having stored therein a computer program that is loaded and executed by the processor to implement the method of any of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium has stored therein a computer program, the computer program being loaded and executed by a processor to implement the method of any one of claims 1 to 7.
CN202211364149.9A 2022-11-02 2022-11-02 Training method, device, equipment and storage medium for category identification model Pending CN117009508A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211364149.9A CN117009508A (en) 2022-11-02 2022-11-02 Training method, device, equipment and storage medium for category identification model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211364149.9A CN117009508A (en) 2022-11-02 2022-11-02 Training method, device, equipment and storage medium for category identification model

Publications (1)

Publication Number Publication Date
CN117009508A true CN117009508A (en) 2023-11-07

Family

ID=88567847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211364149.9A Pending CN117009508A (en) 2022-11-02 2022-11-02 Training method, device, equipment and storage medium for category identification model

Country Status (1)

Country Link
CN (1) CN117009508A (en)

Similar Documents

Publication Publication Date Title
US20220222920A1 (en) Content processing method and apparatus, computer device, and storage medium
CN111784455B (en) Article recommendation method and recommendation equipment
WO2022041979A1 (en) Information recommendation model training method and related device
CN111125422B (en) Image classification method, device, electronic equipment and storage medium
CN111680217B (en) Content recommendation method, device, equipment and storage medium
Zhang et al. Organizing books and authors by multilayer SOM
CN111898031B (en) Method and device for obtaining user portrait
CN113158023B (en) Public digital life accurate classification service method based on mixed recommendation algorithm
CN112434151A (en) Patent recommendation method and device, computer equipment and storage medium
CN109145245A (en) Predict method, apparatus, computer equipment and the storage medium of clicking rate
Zhang et al. Multimodal marketing intent analysis for effective targeted advertising
CN112241626A (en) Semantic matching and semantic similarity model training method and device
CN112749330B (en) Information pushing method, device, computer equipment and storage medium
CN113722583A (en) Recommendation method, recommendation model training method and related products
CN111949887A (en) Item recommendation method and device and computer-readable storage medium
Borges et al. On measuring popularity bias in collaborative filtering data
CN112163149A (en) Method and device for recommending messages
Guadarrama et al. Understanding object descriptions in robotics by open-vocabulary object retrieval and detection
CN112148994B (en) Information push effect evaluation method and device, electronic equipment and storage medium
CN115600017A (en) Feature coding model training method and device and media object recommendation method and device
CN115114994A (en) Method and device for determining commodity category information
CN117009508A (en) Training method, device, equipment and storage medium for category identification model
CN113704617A (en) Article recommendation method, system, electronic device and storage medium
Qin et al. Recommender resources based on acquiring user's requirement and exploring user's preference with Word2Vec model in web service
CN116523024B (en) Training method, device, equipment and storage medium of recall model

Legal Events

Date Code Title Description
PB01 Publication