CN117743516A - Language model training method and device, equipment and medium thereof - Google Patents

Language model training method and device, equipment and medium thereof

Info

Publication number
CN117743516A
CN117743516A (application CN202311789826.6A)
Authority
CN
China
Prior art keywords
text
language model
training
supervision
loss value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311789826.6A
Other languages
Chinese (zh)
Inventor
吴培浩 (Wu Peihao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shangyan Network Technology Co ltd
Original Assignee
Guangzhou Shangyan Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shangyan Network Technology Co ltd filed Critical Guangzhou Shangyan Network Technology Co ltd
Priority to CN202311789826.6A priority Critical patent/CN117743516A/en
Publication of CN117743516A publication Critical patent/CN117743516A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The application relates to a language model training method, together with a corresponding device, equipment and medium, in the technical field of electronic commerce. The method comprises the following steps: a single training sample and its supervision label are fetched from a training set, where the supervision label is an original text and the training sample is a cover text obtained by replacing part of the text in the original text; the language model infers a predicted text from the training sample, and a literal consistency loss value between the predicted text and the supervision label is determined; the language model also represents the predicted semantic features of the predicted text and the actual semantic features of the supervision label, and a semantic correlation loss value between the two feature sets is determined; the weight parameters of the language model are updated according to the two loss values, and the remaining training samples and their supervision labels are fetched for iterative training until both loss values meet preset conditions, yielding a language model trained to convergence. The method ensures that the language model can accurately understand the semantics of text.

Description

Language model training method and device, equipment and medium thereof
Technical Field
The present application relates to the field of electronic commerce technologies, and in particular to a language model training method and a corresponding apparatus, computer device and computer readable storage medium.
Background
In the search field of independent-station e-commerce, the commodity information of all online shops built on an independent station is stored in the same Elasticsearch index, and the search performance of each shard is maintained by setting a reasonable shard count. However, as the data volume grows, the index and the number of shards grow, and each shard itself becomes larger. Because of Elasticsearch's distributed search characteristic, a full search over the commodity information of the online shops must query every shard in the index, so as the overall data volume increases, search performance degrades significantly and search time becomes long.
In the conventional technology, the number of index shards is generally increased to reduce the size of a single shard and thus preserve per-shard search performance. However, when the overall data volume is too large, search performance is still noticeably affected and search time remains long, degrading the experience of users of the search service.
In view of these defects of the conventional technology, the applicant has long been engaged in research in the related field in order to solve this problem in the technical field of electronic commerce, and has therefore developed a new approach.
Disclosure of Invention
It is a primary object of the present application to solve at least one of the above problems and provide a language model training method and corresponding apparatus, computer device, computer readable storage medium.
In order to meet the purposes of the application, the application adopts the following technical scheme:
a language model training method provided in accordance with one of the objects of the present application, comprising the steps of:
acquiring a training set, wherein the training set comprises a plurality of training samples and supervision labels thereof, the supervision labels are original texts, and the training samples are covering texts obtained by replacing part of texts in the original texts;
calling a single training sample and a supervision tag thereof in a training set, reasoning out a predicted text according to the training sample by a language model, and determining a literal consistency loss value between the predicted text and the supervision tag;
the language model represents the predicted semantic features of the predicted text and the actual semantic features of the supervision labels, and semantic correlation loss values between the predicted semantic features and the actual semantic features are determined;
and updating the weight parameters of the language model according to the literal consistency loss value and the semantic relevance loss value, calling other training samples in the training set and supervision labels thereof to carry out iterative training until the literal consistency loss value and the semantic relevance loss value meet the preset conditions, and confirming that the language model is trained to a convergence state.
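The two losses above can be sketched in Python. The patent does not specify the exact loss formulas, so cross-entropy over the covered positions (literal consistency) and cosine distance between pooled feature vectors (semantic correlation) are assumed here, and all function names and the `alpha` weighting are illustrative:

```python
import math

def literal_consistency_loss(pred_dists, target_ids):
    """Mean cross-entropy over the covered positions. pred_dists is a list of
    per-position probability lists indexed by token id (assumed loss form)."""
    return -sum(math.log(d[t]) for d, t in zip(pred_dists, target_ids)) / len(target_ids)

def semantic_correlation_loss(pred_vec, label_vec):
    """1 - cosine similarity between the pooled predicted and actual
    semantic feature vectors (assumed loss form)."""
    dot = sum(p * q for p, q in zip(pred_vec, label_vec))
    norm = math.sqrt(sum(p * p for p in pred_vec)) * math.sqrt(sum(q * q for q in label_vec))
    return 1.0 - dot / norm

def total_loss(pred_dists, target_ids, pred_vec, label_vec, alpha=0.5):
    """Weighted combination used to update the model weights; alpha is an
    assumed balancing hyperparameter, not specified by the patent."""
    return (alpha * literal_consistency_loss(pred_dists, target_ids)
            + (1 - alpha) * semantic_correlation_loss(pred_vec, label_vec))
```

In this reading, a training step computes both values on the same sample and backpropagates their weighted sum, which matches the claim that the weight parameters are updated according to both loss values together.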
In a further embodiment, before acquiring the training set, the method comprises the following steps:
acquiring a plurality of e-commerce texts, wherein the e-commerce texts comprise any one or more of similar-commodity matching text, commodity comments, customer service dialogue and commodity shopping-guide text;
performing data cleaning on the plurality of e-commerce texts, and constructing a data set from the cleaned texts;
taking each e-commerce text in the data set as an original text, and replacing a preset proportion of the text in each original text with pre-specified substitute text, so as to obtain a cover text;
the training set is constructed with all the original text and its cover text.
In a further embodiment, the method for reasoning out the predicted text from the training sample by the language model comprises the following steps:
predicting dictionary probability distribution corresponding to all covered positions in the training sample by using the language model;
determining an output text according to dictionary probability distribution corresponding to each covered position by adopting a preset sampling strategy;
and replacing the text currently at each covered position in the training sample with the corresponding output text to obtain the predicted text.
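The steps above can be sketched as follows. The patent does not name the "preset sampling strategy", so greedy decoding and top-k sampling are shown as two plausible choices; both function names are illustrative:

```python
import random

def sample_output(dist, strategy="greedy", k=3, rng=None):
    """Pick one output token for a covered position from its dictionary
    probability distribution (a token -> probability mapping)."""
    if strategy == "greedy":
        return max(dist, key=dist.get)
    # top-k sampling: renormalize over the k most probable tokens
    rng = rng or random.Random()
    top = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]
    r = rng.random() * sum(p for _, p in top)
    for tok, p in top:
        r -= p
        if r <= 0:
            return tok
    return top[-1][0]

def fill_predictions(tokens, positions, outputs):
    """Splice each output token back into its covered position to form
    the predicted text."""
    filled = list(tokens)
    for pos, out in zip(positions, outputs):
        filled[pos] = out
    return filled
```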
In a further embodiment, after the language model is confirmed to be trained to a convergence state, the method comprises the steps of:
acquiring a first fine tuning training set, wherein the first fine tuning training set comprises a plurality of question training samples and answer supervision labels thereof, the question training samples are question texts and related knowledge documents, and the answer supervision labels are answer texts of the question texts in the knowledge documents;
invoking a single question training sample and an answer supervision label thereof in the first fine tuning training set, reasoning out a predicted answer text according to the question training sample by a language model, and determining a position consistency loss value between the predicted answer text and the answer supervision label;
and updating the weight parameters of the language model according to the position consistency loss value, calling other question training samples in the first fine tuning training set and answer supervision labels thereof to carry out iterative training until the position consistency loss value meets the preset condition, and confirming that the language model is fine-tuned to be in a convergence state to serve as a question-answer language model.
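The question-answer fine-tuning above is consistent with extractive QA, where the model predicts the start and end positions of the answer span in the knowledge document. The patent does not define the "position consistency loss"; cross-entropy over the true start/end positions is one plausible reading, and both functions are illustrative:

```python
import math

def position_consistency_loss(start_probs, end_probs, true_start, true_end):
    """Mean cross-entropy over the answer span's start and end positions
    (assumed form of the position consistency loss)."""
    return -(math.log(start_probs[true_start]) + math.log(end_probs[true_end])) / 2

def extract_answer(doc_tokens, start_probs, end_probs):
    """Greedy extractive decoding: take the most probable start position,
    then the most probable end position at or after it."""
    start = max(range(len(start_probs)), key=start_probs.__getitem__)
    end = max(range(start, len(end_probs)), key=end_probs.__getitem__)
    return doc_tokens[start:end + 1]
```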
In a further embodiment, after the language model is confirmed to be fine-tuned to a convergence state as a question-answer language model, the method comprises the steps of:
responding to the answer display event, and acquiring a question text and a related knowledge document thereof;
determining answer text of the question text in the knowledge document by adopting the question-answer language model;
and controlling the current visible region displaying the knowledge document to scroll so as to display the answer text.
In a further embodiment, after the language model is confirmed to be trained to a convergence state, the method comprises the steps of:
acquiring a second fine tuning training set, wherein the second fine tuning training set comprises a plurality of voice training samples and key supervision labels thereof, the voice training samples are voice texts, and the key supervision labels are keywords in the voice texts;
invoking a single voice training sample and a key supervision label thereof in the second fine tuning training set, and reasoning out a predicted keyword according to the voice training sample by a language model to determine a key probability loss value between the predicted keyword and the key supervision label;
and updating the weight parameters of the language model according to the key probability loss value, calling other voice training samples in the second fine tuning training set and key supervision labels thereof to carry out iterative training until the key probability loss value meets the preset condition, and confirming that the language model is fine-tuned to be in a convergence state to be used as a keyword extraction model.
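The keyword-extraction fine-tuning above can be read as per-token binary classification: the model assigns each token a probability of being a keyword. The patent does not define the "key probability loss"; mean binary cross-entropy is assumed here, and the threshold and function names are illustrative:

```python
import math

def key_probability_loss(token_probs, key_flags):
    """Mean binary cross-entropy between per-token keyword probabilities
    and the key supervision labels (1 = keyword, 0 = not). Assumed form."""
    eps = 1e-9  # guard against log(0)
    terms = [-(f * math.log(p + eps) + (1 - f) * math.log(1 - p + eps))
             for p, f in zip(token_probs, key_flags)]
    return sum(terms) / len(terms)

def extract_keywords(tokens, token_probs, threshold=0.5):
    """Keep the tokens whose keyword probability clears the threshold."""
    return [t for t, p in zip(tokens, token_probs) if p >= threshold]
```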
In a further embodiment, after the language model is confirmed to be fine-tuned to a convergence state as a keyword extraction model, the method comprises the steps of:
responding to a voice search event, and acquiring the voice text obtained by recognizing the user's speech with a preset voice model;
determining keywords in the voice text by adopting the keyword extraction model;
and calling a commodity searching interface, and transmitting the keywords to the commodity searching interface so as to drive the commodity searching interface to search out commodity results according to the keywords.
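The voice-search flow above reduces to simple glue between three components. In this sketch, `keyword_model` stands in for the fine-tuned keyword extraction model and `search_api` for the commodity search interface; both callables are hypothetical placeholders, not APIs named by the patent:

```python
def voice_search(speech_text, keyword_model, search_api):
    """Extract keywords from the recognized speech text and forward them
    to a product-search interface, returning its commodity results."""
    keywords = keyword_model(speech_text)
    return search_api(keywords)
```

In deployment, `speech_text` would come from the preset voice model's recognition of the user's speech, as described in the steps above.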
On the other hand, a language model training device provided in accordance with one of the purposes of the present application comprises a training set acquisition module, a first loss value module, a second loss value module and an iterative training module. The training set acquisition module is used for acquiring a training set, wherein the training set comprises a plurality of training samples and their supervision labels, the supervision labels being original texts and the training samples being cover texts obtained by replacing part of the text in the original texts. The first loss value module is used for calling a single training sample and its supervision label in the training set, reasoning out a predicted text from the training sample with the language model, and determining a literal consistency loss value between the predicted text and the supervision label. The second loss value module is used for representing, with the language model, the predicted semantic features of the predicted text and the actual semantic features of the supervision label, and determining a semantic correlation loss value between the predicted and actual semantic features. The iterative training module is used for updating the weight parameters of the language model according to the literal consistency loss value and the semantic correlation loss value, and calling the other training samples and their supervision labels in the training set for iterative training until both loss values meet the preset conditions, whereupon the language model is confirmed to be trained to a convergence state.
In a further embodiment, the device further includes, before the training set acquisition module: the text acquisition sub-module, used for acquiring a plurality of e-commerce texts, wherein the e-commerce texts comprise any one or more of similar-commodity matching text, commodity comments, customer service dialogue and commodity shopping-guide text; the data set construction sub-module, used for performing data cleaning on the plurality of e-commerce texts and constructing a data set from the cleaned texts; the text construction sub-module, used for taking each e-commerce text in the data set as an original text and replacing a preset proportion of the text in each original text with pre-specified substitute text to obtain a cover text; and the training set construction sub-module, used for constructing the training set from all the original texts and their cover texts.
In a further embodiment, the first loss value module includes: the distribution prediction sub-module is used for predicting dictionary probability distribution corresponding to all covered positions in the training sample by the language model; the output text determining submodule is used for determining an output text according to dictionary probability distribution corresponding to each covered position by adopting a preset sampling strategy; and the predictive text determination submodule is used for replacing the current text of the covered position of the output text in the training sample with each output text to obtain the predictive text.
In a further embodiment, the device further includes, after the iterative training module: the first training set acquisition sub-module, used for acquiring a first fine tuning training set, wherein the first fine tuning training set comprises a plurality of question training samples and their answer supervision labels, the question training samples being question texts and related knowledge documents, and the answer supervision labels being the answer texts of the question texts in the knowledge documents; the first training set invoking sub-module, used for invoking a single question training sample and its answer supervision label in the first fine tuning training set, reasoning out a predicted answer text from the question training sample with the language model, and determining a position consistency loss value between the predicted answer text and the answer supervision label; and the first iterative training sub-module, used for updating the weight parameters of the language model according to the position consistency loss value, and calling the other question training samples and their answer supervision labels in the first fine tuning training set for iterative training until the position consistency loss value meets the preset condition, whereupon the language model is confirmed to be fine-tuned to a convergence state and serves as a question-answer language model.
In a further embodiment, the device further includes, after the first iterative training sub-module: the first event response sub-module, used for responding to an answer display event and acquiring a question text and its related knowledge document; the first model reasoning sub-module, used for determining the answer text of the question text in the knowledge document with the question-answer language model; and the region scrolling sub-module, used for controlling the current visible region displaying the knowledge document to scroll so as to display the answer text.
In a further embodiment, the device further includes, after the iterative training module: the second training set acquisition sub-module, used for acquiring a second fine tuning training set, wherein the second fine tuning training set comprises a plurality of voice training samples and their key supervision labels, the voice training samples being voice texts and the key supervision labels being keywords in the voice texts; the second training set calling sub-module, used for calling a single voice training sample and its key supervision label in the second fine tuning training set, reasoning out predicted keywords from the voice training sample with the language model, and determining a key probability loss value between the predicted keywords and the key supervision label; and the second iterative training sub-module, used for updating the weight parameters of the language model according to the key probability loss value, and calling the other voice training samples and their key supervision labels in the second fine tuning training set for iterative training until the key probability loss value meets the preset condition, whereupon the language model is confirmed to be fine-tuned to a convergence state and serves as a keyword extraction model.
In a further embodiment, the device further includes, after the second iterative training sub-module: the second event response sub-module, used for responding to a voice search event and acquiring the voice text obtained by recognizing the user's speech with a preset voice model; the second model reasoning sub-module, used for determining the keywords in the voice text with the keyword extraction model; and the commodity searching sub-module, used for calling a commodity search interface and transmitting the keywords to it, so as to drive the commodity search interface to search out commodity results according to the keywords.
In yet another aspect, a computer device adapted to one of the purposes of the present application is provided, including a central processor and a memory, the central processor being configured to invoke and run a computer program stored in the memory so as to perform the steps of the language model training method described herein.
In yet another aspect, a computer readable storage medium adapted to another object of the present application is provided, storing, in the form of computer readable instructions, a computer program implementing the language model training method, which, when invoked and run by a computer, performs the steps comprised by the method.
The technical solution of the present application has various advantages, including but not limited to the following aspects:
According to the method, a single training sample and its supervision label are fetched from the training set, the language model infers a predicted text from the training sample, and a literal consistency loss value between the predicted text and the supervision label is determined, where the supervision label is an original text and the training sample is a cover text obtained by replacing part of the text in the original text. The language model also represents the predicted semantic features of the predicted text and the actual semantic features of the supervision label, and a semantic correlation loss value between the two feature sets is determined. The weight parameters of the language model are updated according to the two loss values, and the remaining training samples and their supervision labels are fetched for iterative training until both loss values meet the preset conditions, yielding a language model trained to convergence. On the one hand, training the language model on cover text makes the model focus on the replaced parts, and thus on understanding the specific context and contextual semantics during learning. On the other hand, the method models the language model's understanding of literal consistency at a relatively microscopic level and of semantic correlation at a relatively macroscopic level, thereby ensuring that the model can understand the semantics of text accurately.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a network architecture of an exemplary e-commerce platform of the present application;
FIG. 2 is a flow chart of an exemplary embodiment of a language model training method of the present application;
FIG. 3 is a schematic flow chart of pre-constructing training sets in an embodiment of the present application;
FIG. 4 is a schematic flow chart of reasoning out predicted text according to training samples by using a language model in the embodiment of the present application;
FIG. 5 is a flowchart of invoking the training language model of the first fine-tuning training set to a convergence state as a question-answering language model in the embodiment of the present application;
FIG. 6 is a flow chart of an event display in response to answers in an embodiment of the present application;
FIG. 7 is a flowchart of invoking a second fine-tuning training set training language model to a converged state as a keyword extraction model in an embodiment of the present application;
FIG. 8 is a flow chart of responding to a voice search event in an embodiment of the present application;
FIG. 9 is a schematic block diagram of a language model training apparatus of the present application;
fig. 10 is a schematic structural diagram of a computer device used in the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of illustrating the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the network architecture shown in fig. 1, the e-commerce platform 82 is deployed in the internet to provide corresponding services to its users, and the merchant user's device 80 and the consumer user's device 81 of the e-commerce platform 82 are similarly connected to the internet to use the services provided by the e-commerce platform.
The exemplary e-commerce platform 82 provides matching of supply and demand for products and/or services to the public by means of an internet infrastructure, in the e-commerce platform 82, the products and/or services are provided as merchandise information, and for simplicity of description, the concepts of merchandise, products, etc. are used in this application to refer to the products and/or services in the e-commerce platform 82, specifically, physical products, digital products, tickets, service subscriptions, other off-line fulfillment services, etc.
In practice, entities of all kinds can access the e-commerce platform 82 under a user identity and use the various online services provided by the e-commerce platform 82 to participate in the commercial activities it enables. These entities may be natural persons, legal persons, social organizations, etc. Corresponding to the merchant and consumer roles in commerce, the e-commerce platform 82 has two broad categories of users: merchant users and consumer users. Online services can be used in the e-commerce platform 82 under the identity of a merchant user, or under the identity of a consumer, including the real or potential consumers of a merchant user. In actual business activities, the same entity can act both as a merchant user and as a consumer user, so the two identities should be understood flexibly.
The infrastructure for deploying the e-commerce platform 82 mainly comprises a background architecture and front-end equipment, wherein the background architecture runs various online services through a service cluster, and the service functions of the background architecture are enriched and perfected by middleware or front-end services facing a platform side, services facing a consumer, services facing a merchant and the like; the head-end equipment primarily encompasses the terminal equipment that the user uses to access the e-commerce platform 82 as a client, including but not limited to various mobile terminals, personal computers, point-of-sale devices, and the like. For example, a merchant user may enter merchandise information for his online store through his terminal device 80, or generate his merchandise information using an open interface of the e-commerce platform; the consumer user can access the web page of the online store realized by the electronic commerce platform 82 through the terminal device 81 thereof, and trigger the shopping flow through the shopping keys provided on the web page, and call various online services provided by the electronic commerce platform 82 in the shopping flow, thereby realizing the purpose of purchasing orders.
In some embodiments, the e-commerce platform 82 may be implemented by a processing facility including a processor and memory that stores a set of instructions that, when executed, cause the e-commerce platform 82 to perform the e-commerce and support functions referred to herein. The processing facility may be part of a server, client, network infrastructure, mobile computing platform, cloud computing platform, fixed computing platform, or other computing platform, and may provide the electronic components of the e-commerce platform 82, merchant devices, payment gateways, application developers, marketing channels, transport providers, client devices, point-of-sale devices, and the like.
The e-commerce platform 82 may be implemented as online services such as cloud computing services, software as a service (SaaS), infrastructure as a service (IaaS), platform as a service (PaaS), desktop as a service (DaaS), hosted software as a service, mobile back end as a service (MBaaS), information technology management as a service (ITMaaS), and the like. In some embodiments, the various features of the e-commerce platform 82 may be implemented so as to operate on a variety of platforms and operating systems; for example, for an online store, the administrator user may enjoy the same or similar functionality whether on iOS, Android, HarmonyOS, the web, etc.
The e-commerce platform 82 may implement its respective independent station for each merchant to run its respective online store, providing the merchant with a respective instance of the commerce management engine for the merchant to establish, maintain, and run one or more of its online stores in one or more independent stations. The business management engine instance can be used for content management, task automation and data management of one or more online stores, and various specific business processes of the online stores can be configured through interfaces or built-in components and the like to support the realization of business activities. The independent station is an infrastructure of the e-commerce platform 82 with cross-border service functionality, and merchants can maintain their online stores more centrally and autonomously based on the independent station. The stand-alone stations typically have merchant-specific domain names and memory space, with relative independence between the different stand-alone stations, and the e-commerce platform 82 may provide standardized or personalized technical support for a vast array of stand-alone stations, so that merchant users may customize their own adaptive commerce management engine instances and use such commerce management engine instances to maintain one or more online stores owned by them.
The online store may implement background configuration and maintenance by the merchant user logging in his business management engine instance with an administrator identity, which, in support of various online services provided by the infrastructure of the e-commerce platform 82, may configure various functions in his online store, review various data, etc., e.g., the merchant user may manage various aspects of his online store, such as viewing recent activities of the online store, updating online store inventory, managing orders, recent access activities, total order activities, etc.; the merchant user may also view more detailed information about the business and visitors to the merchant's online store by retrieving reports or metrics, such as sales summaries showing the merchant's overall business, specific sales and participation data for the active sales marketing channel, etc.
The e-commerce platform 82 may provide a communications facility and associated merchant interface for providing electronic communications and marketing, such as utilizing an electronic message aggregation facility to collect and analyze communications interactions between merchants, consumers, merchant devices, customer devices, point-of-sale devices, etc., to aggregate and analyze communications, such as for increasing the potential to provide product sales, etc. For example, a consumer may have problems with the product, which may create a dialogue between the consumer and the merchant (or an automated processor-based proxy on behalf of the merchant), where the communication facility is responsible for interacting and providing the merchant with an analysis of how to increase sales probabilities.
In some embodiments, an application program suitable for being installed to a terminal device may be provided to serve access requirements of different users, so that various users can access the e-commerce platform 82 in the terminal device through running the application program, for example, a merchant background module of an online store in the e-commerce platform 82, and in the process of implementing the business activity through the functions, the e-commerce platform 82 may implement various functions related to supporting implementation of the business activity as middleware or online service and open corresponding interfaces, and then implant a tool kit corresponding to the interface access function into the application program to implement function expansion and task implementation. The commerce management engine may include a series of basic functions and expose those functions through APIs to online service and/or application calls that use the corresponding functions by remotely calling the corresponding APIs.
Under the support of the various components of the commerce management engine instance, the e-commerce platform 82 may provide online shopping functionality, enabling merchants to establish contact with customers in a flexible and transparent manner. Consumer users may purchase items online, create merchandise orders, provide delivery addresses for the items in the merchandise orders, and complete payment confirmation of the merchandise orders. The merchant may then review and fulfill or cancel the order. The audit component carried by the business management engine instance may enforce compliant use of the business process to ensure that the order is suitable for fulfillment prior to actual fulfillment. For example, orders can sometimes be fraudulent and require verification (e.g., identification card checking), or a payment method may require the merchant to wait to ensure funds are received before fulfillment, as a guard against such risk. Order risk assessments may be generated by fraud detection tools, submitted by third parties through an order risk API, or the like. Before fulfillment, the merchant may need to acquire payment information, or wait to receive it, in order to mark the order as paid before preparing to deliver the product. Corresponding checks can be made for cases such as these. The audit flow may be implemented by a fulfillment component.
Merchants can review and adjust jobs and trigger related fulfillment services by way of fulfillment components, such as: a manual fulfillment service, used when a merchant picks and packages a product in a box, purchases a shipping label and enters its tracking number, or simply marks an item as fulfilled; a custom fulfillment service, which may define notification emails to send; an API fulfillment service, which may trigger a third-party application to create a fulfillment record at the third party; a legacy fulfillment service, which may trigger custom API calls from the business management engine to a third party; and a gift card fulfillment service, which may provide generation of a gift card number and activation of the gift card. Merchants may print shipping slips using an order printer application. The fulfillment process may be performed when the items are packaged in boxes and ready for shipment, through tracking, delivery, and verification by the consumer.
It can be seen that the services provided by the e-commerce platform expand around products as their core: the corresponding commodity data are the basic data of the e-commerce platform, commodity information is provided through the commodity data, and mining and utilization of the commodity data are the basis for realizing various technical services. User transaction data and commodity data of the e-commerce platform provide basic services for the operation of a data processing system. Therefore, the data processing system can run in any one or more servers of the e-commerce platform's cluster, so as to realize various functions by utilizing the various commodity data provided by the e-commerce platform.
A language model training method of the present application may be programmed as a computer program product, deployed and executed in a client or a server, for example, in an exemplary application scenario of the present application, may be deployed and executed in a server of an e-commerce customer service platform, thereby performing the method by accessing an interface that is opened after the computer program product is executed, and performing man-machine interaction with a process of the computer program product through a graphical user interface.
Referring to fig. 2, in an exemplary embodiment, the language model training method of the present application includes the following steps:
Step S1100, acquiring a training set, wherein the training set comprises a plurality of training samples and supervision labels thereof, the supervision labels are original texts, and the training samples are covering texts obtained by replacing part of texts in the original texts;
the training set is prepared in advance and is used for training the language model to convergence. In order to ensure that the language model fully understands specific terms, words and expression habits in the e-commerce field, a plurality of e-commerce texts can be collected, data cleaning is then carried out on these texts, and a data set is constructed from the e-commerce texts after data cleaning. The e-commerce text can be any one or more of commodity description text, commodity comment text, commodity search keywords, customer service chat record text, commodity shopping guide record text, text in knowledge base documents of a help center, and the like. The data cleaning includes any one or more of de-duplication, converting uppercase to lowercase, removing stop words, removing special symbols and unnecessary punctuation, removing overly short and/or garbled text, and the like.
Each e-commerce text of the data set is taken as an original text. Each original text is segmented into tokens, and then a part of the tokens, in a preset proportion of the original text, is replaced with pre-specified-probability text to obtain a masking text. The preset proportion can be set by a person skilled in the art as required; the recommended preset proportion is 15%. Of the replaced tokens, 80% are replaced with the [MASK] character, 10% are replaced with a token randomly selected from a preset vocabulary of the language model, and 10% are kept as the original token.
Taking each covering text as a training sample, taking the original text corresponding to each training sample as a supervision label of the training sample, and forming a training set by using all the training samples and the supervision labels thereof.
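The masking rule described above (a 15% selection ratio, with selected tokens replaced by [MASK] 80% of the time, by a random vocabulary token 10% of the time, and left unchanged 10% of the time) can be sketched as follows. This is a minimal illustrative sketch; the function and variable names are not from the application.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_ratio=0.15, seed=None):
    """Select roughly `mask_ratio` of the tokens; of the selected ones,
    replace 80% with [MASK], 10% with a random vocabulary token,
    and keep 10% as the original token."""
    rng = random.Random(seed)
    masked = list(tokens)
    for i in range(len(masked)):
        if rng.random() < mask_ratio:   # token hits the preset proportion
            r = rng.random()
            if r < 0.8:
                masked[i] = MASK
            elif r < 0.9:
                masked[i] = rng.choice(vocab)
            # else: keep the original token in place
    return masked
```

Each masked sequence then becomes a training sample, with the unmodified token sequence serving as its supervision label.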
Step S1200, calling a single training sample in the training set and its supervision label, and having the language model infer a predicted text from the training sample, to determine a literal consistency loss value between the predicted text and the supervision label;
the language model may be one implementation of a neural network model applicable to the NLP (natural language processing) field using Bert, GPT, T, etc.
In one embodiment, the language model is implemented using GPT. The training sample is segmented by a word segmentation algorithm to obtain its word segmentation result, namely a word segmentation sequence; the sequence is input into the language model, embedding representation processing is performed on it by the word element embedding representation layer and the position embedding representation layer in the language model, correspondingly determining a word element embedding vector (Token Embedding) and a position embedding vector (Position Embedding) for each token in the sequence, and the two embedding vectors are added to obtain the input vector of the token. The input vector corresponding to each token in the sequence is input to the stacked multi-layer feature extraction modules in the language model; the number of layers can be set by a person skilled in the art according to requirements, for example, 96 layers, and each feature extraction module comprises, connected in sequence, a masked multi-head attention layer, a residual connection and normalization layer, a fully connected layer, and another residual connection and normalization layer. The text feature vector corresponding to each token in the sequence output by the last feature extraction module is obtained. Further, a linear transformation is performed on these text feature vectors through a feedforward neural network layer, the dictionary probability distribution corresponding to each position in the predicted text is obtained through a Softmax layer, and a preset sampling strategy is adopted to determine, from the dictionary probability distribution for each position, the output text at that position; these together form the predicted text.
The positions in the predicted text are the same as the positions in the training sample, and the inference of the dictionary probability distribution for each position in the predicted text is based on the tokens at all positions before that position in the training sample. The dimension of each dictionary probability distribution is the total number of tokens in the preset dictionary of the language model.
The sampling strategies include temperature sampling, greedy sampling, beam search sampling, top-k sampling, top-p sampling, repetition penalty sampling, etc.; one skilled in the art may use any one of them or a combination of more than one. The recommended sampling strategy is to use top-p sampling in combination with temperature sampling.
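The recommended combination of top-p (nucleus) sampling with temperature sampling over a single dictionary probability distribution can be sketched as follows; this is an illustrative implementation under common definitions of these strategies, not code from the application:

```python
import numpy as np

def sample_top_p(logits, p=0.9, temperature=1.0, rng=None):
    """Temperature-scale the logits, then sample from the smallest set of
    tokens whose cumulative probability reaches p (the nucleus)."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())     # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]           # tokens by descending probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1      # minimal nucleus size
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()    # renormalize inside the nucleus
    return int(rng.choice(keep, p=kept))
```

Lower temperatures sharpen the distribution toward greedy behaviour; smaller p shrinks the candidate set, trading diversity for reliability.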
The word segmentation algorithm can be BPE, WordPiece, jieba, THULAC, HanLP, LTP, NLTK, etc.; one skilled in the art can flexibly select an implementation according to requirements.
Word segmentation is performed on the supervision tag using the word segmentation algorithm to obtain its word segmentation result, namely a word segmentation sequence; One-Hot Encoding is then applied to each token in the sequence to determine the one-hot encoding distribution corresponding to each token. The dimension of each one-hot encoding distribution is the total number of tokens in the preset dictionary of the language model.
A loss value between the dictionary probability distribution corresponding to each output text, i.e., each token in the predicted text, and the one-hot encoding distribution of the corresponding token in the supervision tag is calculated using a cross entropy loss function or a mean square error loss function, and taken as the literal consistency loss value.
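For the cross-entropy case, the literal consistency loss reduces to the mean negative log probability that each position's dictionary distribution assigns to the supervision token (the index where its one-hot encoding is 1). A minimal illustrative sketch:

```python
import numpy as np

def literal_consistency_loss(pred_dists, label_ids):
    """Mean cross-entropy between per-position dictionary probability
    distributions and the one-hot encodings of the supervision tokens.
    `pred_dists` is (positions, vocab); `label_ids` are token indices."""
    pred = np.asarray(pred_dists, dtype=np.float64)
    idx = np.arange(len(label_ids))
    # Cross-entropy with a one-hot target keeps only the true-token term.
    return float(-np.log(pred[idx, label_ids] + 1e-12).mean())
```

A perfect prediction (probability 1 on every supervision token) yields a loss of approximately zero; any mass placed elsewhere increases the loss.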
Step S1300, the language model shows the predicted semantic features of the predicted text and the actual semantic features of the supervision labels, and semantic correlation loss values between the predicted semantic features and the actual semantic features are determined;
The method comprises the steps of performing word segmentation on a predicted text by adopting a word segmentation algorithm to obtain a word segmentation result of the predicted text, namely a word segmentation sequence, inputting the word segmentation sequence into a language model, performing embedded representation processing on the word segmentation sequence by a word element embedded representation layer and a position embedded representation layer in the language model, correspondingly determining a word element embedded vector and a position embedded vector of each word segment in the word segmentation sequence, and adding the two embedded vectors to obtain an input vector of the word segment. The input vector corresponding to each word in the word segmentation sequence is input to a feature extraction module of a stacked multilayer in the language model, a text feature vector corresponding to each word in the word segmentation sequence output by the last feature extraction module is obtained, the text feature vector of the predicted text is formed, and it is easy to understand that the text feature vector of the predicted text is a predicted semantic feature of the vectorized representation of the predicted text.
The supervision tag is segmented by a word segmentation algorithm to obtain its word segmentation result, namely a word segmentation sequence; the sequence is input into the language model, embedding representation processing is performed on it by the word element embedding representation layer and the position embedding representation layer in the language model, the word element embedding vector and the position embedding vector of each token in the sequence are correspondingly determined, and the two embedding vectors are added to obtain the input vector of the token. The input vector corresponding to each token in the sequence is input to the stacked multi-layer feature extraction modules in the language model, and the text feature vector corresponding to each token output by the last feature extraction module is obtained, forming the text feature vector of the supervision tag. It is easy to understand that the text feature vector of the supervision tag is a vectorized representation of the actual semantic features of the supervision tag.
Calculating the similarity between the text feature vector of the predicted text and the text feature vector of the supervision label by adopting a vector similarity algorithm, wherein the similarity is used for quantitatively representing the semantic correlation degree between the predicted text and the supervision label, and the higher the similarity is, the higher the semantic correlation degree is, and the lower the corresponding semantic difference degree is; the lower the similarity, the lower the semantic relatedness degree, and the higher the corresponding semantic difference degree, thereby taking the negative value of the similarity as the semantic relatedness loss value.
The vector similarity algorithm can be any one of cosine similarity algorithm, euclidean distance algorithm, pearson correlation coefficient algorithm, jacquard coefficient algorithm and the like.
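Taking cosine similarity as the vector similarity algorithm, the semantic relevance loss is simply the negated cosine between the two text feature vectors, so that higher semantic relevance produces a lower (more negative) loss. An illustrative sketch:

```python
import numpy as np

def semantic_relevance_loss(pred_vec, label_vec):
    """Negative cosine similarity between the predicted-text feature
    vector and the supervision-label feature vector."""
    a = np.asarray(pred_vec, dtype=np.float64)
    b = np.asarray(label_vec, dtype=np.float64)
    cos = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return float(-cos)
```

Identical directions give a loss near -1 (maximum relevance), orthogonal vectors give 0, so minimizing this loss pulls the predicted semantics toward the label semantics.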
And step S1400, updating the weight parameters of the language model according to the literal consistency loss value and the semantic relevance loss value, and calling other training samples in the training set and their supervision labels for iterative training, until the literal consistency loss value and the semantic relevance loss value meet preset conditions, confirming that the language model is trained to a convergence state.
In one embodiment, respective weights are set for the literal consistency loss value and the semantic relevance loss value; the two loss values are multiplied by their respective weights and added to calculate an iteration loss value, wherein the sum of the two weights is 1, and the specific values of the two weights can be set by a person skilled in the art as required.
When the iteration loss value is smaller than or equal to a preset threshold value, the language model is trained to a convergence state, and language model training can be terminated; when the iteration loss value is larger than a preset threshold value, the language model is indicated to be not converged, gradient updating is carried out on the language model according to the iteration loss value, the weight parameters of each link of the language model are corrected through back propagation to enable the language model to further approach convergence, and then other training samples and supervision labels thereof are continuously called to carry out iterative training on the language model until the language model is trained to a convergence state. The preset threshold may be set as desired by one skilled in the art based on the disclosure herein.
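The weighted combination and threshold test described above can be sketched as follows; the default weight and threshold are illustrative placeholders, to be set as required:

```python
def iterative_loss(literal_loss, semantic_loss, w_literal=0.7):
    """Weighted sum of the two loss values; the two weights sum to 1.
    The 0.7 default is an illustrative placeholder, not a prescribed value."""
    return w_literal * literal_loss + (1.0 - w_literal) * semantic_loss

def converged(iter_loss, threshold=0.05):
    """Training may terminate once the iteration loss is at or below
    the preset threshold (also an illustrative placeholder)."""
    return iter_loss <= threshold
```

When `converged` returns False, the iteration loss is backpropagated to update the weight parameters, and the next training sample and supervision label are called for another iteration.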
As can be appreciated from the exemplary embodiments of the present application, the technical solution of the present application has various advantages, including but not limited to the following aspects:
according to the method, a single training sample in a training set and its supervision label are called, a language model infers a predicted text from the training sample, and a literal consistency loss value between the predicted text and the supervision label is determined, where the supervision label is an original text and the training sample is a masking text obtained by replacing part of the text in the original text; the predicted semantic features of the predicted text and the actual semantic features of the supervision label are represented by the language model, and a semantic relevance loss value between the two features is determined; the weight parameters of the language model are updated according to the two loss values, and other training samples in the training set and their supervision labels are called for iterative training until the two loss values meet preset conditions, obtaining a language model trained to convergence. In one aspect, training the language model with masking text makes the model focus more on the replaced parts, and thus focus more on understanding the specific context and contextual semantics during learning. In another aspect, the language model is supervised to learn literal consistency at a relatively microscopic level and semantic relevance at a relatively macroscopic level, thereby ensuring that the model can accurately understand the semantics of text.
Referring to fig. 3, in a further embodiment, before step S1100, the training set is acquired, the method includes the following steps:
step S1000, acquiring a plurality of e-commerce texts, wherein the e-commerce services include any one or more of similar commodity matching, commodity comments, customer service and commodity shopping guide;
to ensure that the language model adequately understands specific terms, words and expression habits of the e-commerce field, a plurality of e-commerce texts are collected. For the e-commerce service being similar commodity matching, the corresponding e-commerce text is text on which similar commodity matching is based, usually commodity description text, which can be any one or more of commodity titles, commodity labels, commodity categories, product attributes, commodity detail texts, and the like; for commodity comments, the corresponding e-commerce text is evaluation text provided by users who purchased a commodity to evaluate it; for customer service, the corresponding e-commerce text is chat record text between users consulting customer service and the customer service, comprising the users' question texts and the customer service's reply texts; and for commodity shopping guide, the corresponding e-commerce text is chat record text between a shopping guide directing a user's purchase and the user, comprising the user's question texts and the shopping guide's reply texts.
Step S1010, performing data cleaning on the plurality of e-commerce texts, and building a data set from the e-commerce texts after data cleaning;
the data cleaning includes any one or more of de-duplication, converting uppercase to lowercase, removing stop words, removing special symbols and unnecessary punctuation, removing overly short and/or garbled text, and the like. Removing special symbols and unnecessary punctuation marks can be realized through regular expressions, character string matching, filtering and other techniques. For removing stop words, an open-source stop word list can be loaded, and text in the e-commerce texts that appears in the stop word list is regarded as stop words; the open-source stop word list can be the Harbin Institute of Technology stop word list, the Baidu stop word list, the Sichuan University Machine Intelligence Laboratory stop word list, and the like. All the e-commerce texts after data cleaning form the data set.
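A minimal illustrative sketch of the cleaning pipeline (de-duplication, lowercasing, punctuation stripping via a regular expression, stop word removal, and dropping overly short texts); the stop word set is supplied by the caller, e.g. loaded from one of the open-source lists above:

```python
import re

def clean_corpus(texts, stop_words=frozenset(), min_len=5):
    """De-duplicate, lowercase, strip special symbols/punctuation,
    drop stop words, and discard texts shorter than `min_len` characters."""
    seen, cleaned = set(), []
    for text in texts:
        t = text.lower()
        t = re.sub(r"[^\w\s]", " ", t)  # special symbols & punctuation -> space
        tokens = [w for w in t.split() if w not in stop_words]
        t = " ".join(tokens)
        if len(t) >= min_len and t not in seen:  # drop short and duplicate texts
            seen.add(t)
            cleaned.append(t)
    return cleaned
```

The surviving texts form the data set from which original texts, and in turn masking texts, are produced.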
Step S1020, each business text in the data set is used as an original text, and a part of texts with preset proportions in the original text are replaced by pre-specified probability texts, so that a covering text is obtained;
in order to replace a part of the text in a preset proportion of each original text (the preset proportion can be set by a person skilled in the art as required, for example, 15%), in one embodiment, for each original text, a word segmentation algorithm is adopted to segment the original text, obtaining its word segmentation result, namely a word segmentation sequence. The probability that each token in the sequence is replaced is set to the preset proportion, which to a certain extent is equivalent to replacing the part of the text in the preset proportion of the sequence. When a token hits the preset proportion, it is replaced with the [MASK] character with 80% probability, replaced with a token randomly selected from a preset vocabulary of the language model with 10% probability, and kept as the original token with 10% probability; when a token does not hit the preset proportion, it is not replaced. Thereby, a masking text corresponding to the original text is obtained.
Step S1030, constructing a training set by using all original texts and covering texts thereof.
Taking each covering text as a training sample, taking the original text corresponding to each training sample as a supervision label of the training sample, and forming a training set by using all the training samples and the supervision labels thereof.
In this embodiment, a process of constructing a training set is disclosed, which can ensure that a language model trained to a convergence state on this training set can fully understand specific terms, words and expression habits in the e-commerce field.
Referring to fig. 4, in a further embodiment, step S1200, reasoning out a predicted text from a language model according to a training sample includes the following steps:
step S1210, predicting dictionary probability distribution corresponding to all covered positions in the training sample by the language model;
in one embodiment, the language model is implemented using BERT. The training sample is segmented by a word segmentation algorithm to obtain its word segmentation result, namely a word segmentation sequence; the sequence is input into the language model, and embedding representation processing is performed on it by the word element embedding representation layer, the position embedding representation layer and the paragraph embedding representation layer in the language model, correspondingly determining a word element embedding vector (Token Embedding), a position embedding vector (Position Embedding) and a paragraph embedding vector (Segment Embedding) for each token in the sequence; the three embedding vectors are then added to obtain the input vector of the token. The input vector for each token in the sequence is input to the multi-layer stacked Transformer Encoders in the language model; the number of layers can be set by one skilled in the art as desired, e.g., 12 layers or 24 layers. For the first Transformer Encoder, the input vector corresponding to each token undergoes multi-head attention calculation when passing through the multi-head attention layer, so that each token is self-attention weighted across different dimensions, obtaining a corresponding weighted vector. The input vector corresponding to each token is added to its weighted vector through a residual connection to obtain a corresponding first modified vector. A 0-mean, 1-variance normalization operation is performed on the first modified vector of each token when passing through the normalization layer, obtaining a corresponding first normalized vector.
When passing through the two linear layers, the first layer performs a linear transformation on the first normalized vector corresponding to each token, mapping it to a feature space of larger dimension, and a ReLU then introduces nonlinearity; the last layer performs a linear transformation mapping each vector back to the feature space of the original dimension, obtaining a scaled vector corresponding to each token. The first normalized vector corresponding to each token is added to its scaled vector through a residual connection to obtain a corresponding second modified vector. A 0-mean, 1-variance normalization operation is performed on the second modified vector of each token when passing through the normalization layer, obtaining the text feature vector corresponding to each token. Accordingly, the word segmentation sequence passes through the multi-layer stacked Transformer Encoders, and the text feature vector corresponding to each token in the sequence is output by the last Transformer Encoder. Further, a linear transformation is performed on these text feature vectors through a feedforward neural network layer, and then the dictionary probability distribution corresponding to each covered position is obtained through a Softmax layer.
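The core computations of the encoder block described above, self-attention followed by a residual connection and 0-mean 1-variance normalization, can be sketched in NumPy as follows. For brevity this sketch uses identity Q/K/V projections and a single head; a real Transformer Encoder learns separate projection matrices per head:

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention over token vectors X
    of shape (tokens, dim); identity Q/K/V projections for brevity."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                 # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X                            # attention-weighted mixture

def add_and_norm(X, sub_out, eps=1e-6):
    """Residual connection followed by 0-mean, 1-variance normalization."""
    Y = X + sub_out
    mu = Y.mean(axis=-1, keepdims=True)
    sigma = Y.std(axis=-1, keepdims=True)
    return (Y - mu) / (sigma + eps)
```

Stacking this pattern (attention, add-and-norm, feedforward, add-and-norm) layer by layer yields the text feature vectors consumed by the output head.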
It may be appreciated that, for the word segmentation sequence of any input text fed into the language model, the model can infer from the sequence, and the text feature vectors corresponding to each token output by the last Transformer Encoder form the text feature vector of the input text, which is a vectorized representation of the semantic features of the input text.
Step S1220, determining an output text according to dictionary probability distribution corresponding to each covered position by adopting a preset sampling strategy;
and the sampling strategy is greedy sampling, namely, determining a word element identifier corresponding to the maximum probability in dictionary probability distribution for the dictionary probability distribution corresponding to each covered position, and selecting a word element corresponding to the word element identifier in a preset dictionary of the language model as an output text corresponding to the covered position.
Step S1230, each output text is replaced with the current text of the covered position of the output text in the training sample, and a predicted text is obtained.
And replacing the output text with the current text of the covered position in the training sample according to the corresponding covered position of each output text in the training sample, thereby obtaining the predicted text.
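Steps S1220 and S1230 under greedy sampling, choosing the argmax token for each covered position and splicing it back into the training sample, can be sketched as follows (names are illustrative):

```python
import numpy as np

def fill_masked_positions(tokens, masked_positions, dists, vocab):
    """Greedy decoding: for each covered position, take the vocabulary
    token with maximum probability in that position's dictionary
    distribution, and replace the current text at that position."""
    out = list(tokens)
    for pos, dist in zip(masked_positions, dists):
        out[pos] = vocab[int(np.argmax(dist))]
    return out
```

The returned sequence is the predicted text, which is then compared against the supervision label.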
In this embodiment, the dictionary probability distribution corresponding to all covered positions in the training sample is inferred by the language model according to the training sample, the output text corresponding to each dictionary probability distribution is determined by adopting a sampling strategy, and the current text of the covered position in the training sample is replaced correspondingly according to each output text, so as to obtain the predicted text.
Referring to fig. 5, in a further embodiment, after step S1400 of confirming that the language model is trained to the convergence state, the method includes the following steps:
step S1500, a first fine tuning training set is obtained, wherein the first fine tuning training set comprises a plurality of question training samples and answer supervision labels thereof, the question training samples are question texts and related knowledge documents, and the answer supervision labels are answer texts of the question texts in the knowledge documents;
and preparing the first fine tuning training set in advance for fine tuning the training language model to convergence.
It will be appreciated that e-commerce platforms typically provide functionality such as a help center, through which users of the e-commerce platform can enter question text and submit it to a server of the e-commerce platform. After receiving the question text, the server can retrieve knowledge documents related to the question text by invoking a search engine and push them to the user. Further, the user may review the knowledge documents to obtain answer text for the question text. Thus, a plurality of question texts submitted by users of the e-commerce platform are collected, together with the knowledge documents retrieved by the search engine for each question text.
And taking each question text and the knowledge document thereof as a single question training sample, and manually determining the answer text of the question text of each question training sample in the knowledge document thereof as an answer supervision label of the question training sample, so that a first fine tuning training set is formed by all the question training samples and the answer supervision labels thereof.
Step S1510, invoking a single question training sample and an answer supervision label thereof in the first fine tuning training set, and reasoning out a predicted answer text by a language model according to the question training sample to determine a position consistency loss value between the predicted answer text and the answer supervision label;
in one embodiment, the language model is implemented using BERT, and two sets of randomly initialized vectors are introduced: a start feature vector and an end feature vector. Word segmentation is performed on the question training sample using a word segmentation algorithm to obtain its word segmentation result, namely a word segmentation sequence, which is input into the language model for forward inference, obtaining the text feature vector corresponding to each token in the sequence output by the last-layer Transformer Encoder. The inner product of the start feature vector with the text feature vector of each token belonging to the knowledge document in the sequence correspondingly yields a start position vector for each such token, and the inner product of the end feature vector with the text feature vector of each token belonging to the knowledge document correspondingly yields an end position vector for each such token. Further, a linear transformation is performed on the start position vector of each token through a feedforward neural network layer, and the start position distribution of the answer text of the question training sample within its knowledge document is obtained through a Softmax layer; likewise, a linear transformation is performed on the end position vector of each token through the feedforward neural network layer, and the end position distribution of the answer text within the knowledge document is obtained through the Softmax layer.
And the dimension of the initial position distribution and the final position distribution is the total number of the word segmentation in the knowledge document.
Greedy sampling is adopted to determine the position of maximum probability in the start position distribution and the position of maximum probability in the end position distribution, and the corresponding portion of text is selected from the knowledge document of the question training sample, according to these start and end positions, to serve as the predicted answer text.
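The forward computation just described, i.e. inner products with the start/end feature vectors, a Softmax over document positions, and greedy selection of the maximum-probability positions, can be sketched as follows. This is a minimal illustration with stand-in random features; the hidden size and document length are assumptions, and the feedforward linear layer is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, doc_len = 768, 5            # BERT hidden size and document token count (assumed)

token_feats = rng.normal(size=(doc_len, hidden))   # last encoder layer's token feature vectors (stand-ins)
start_vec = rng.normal(size=hidden)                # randomly initialized start feature vector
end_vec = rng.normal(size=hidden)                  # randomly initialized end feature vector

# Inner product of each token's feature vector with the start/end vector
# gives one score per document position
start_scores = token_feats @ start_vec
end_scores = token_feats @ end_vec

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Softmax over document positions: the start/end position distributions,
# each of dimension equal to the number of tokens in the document
start_dist = softmax(start_scores)
end_dist = softmax(end_scores)

# Greedy sampling: the position of maximum probability in each distribution
start_pos, end_pos = int(start_dist.argmax()), int(end_dist.argmax())
```

The tokens between `start_pos` and `end_pos` inclusive would then be read out of the knowledge document as the predicted answer text.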
A word segmentation algorithm is applied to the answer supervision label to obtain its word segmentation result, namely a word segmentation sequence. The position in the knowledge document of the question training sample of the first token in this sequence is determined as the actual start position, and the position of the last token as the actual end position; One-Hot Encoding is then applied to the actual start position and the actual end position respectively, determining a start one-hot encoding distribution and an end one-hot encoding distribution. The dimension of both the start one-hot encoding distribution and the end one-hot encoding distribution equals the total number of tokens in the knowledge document.
A cross entropy loss function or a mean squared error loss function is adopted to calculate the loss values between the start position distribution and the start one-hot encoding distribution, and between the end position distribution and the end one-hot encoding distribution, and these are taken together as the position consistency loss value.
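A small numerical sketch of the position consistency loss, using the cross entropy option above; the distributions and positions below are hypothetical illustration values, not real model outputs:

```python
import numpy as np

doc_len = 5
# Predicted start/end position distributions (illustrative softmax outputs)
start_dist = np.array([0.60, 0.20, 0.10, 0.05, 0.05])
end_dist = np.array([0.05, 0.10, 0.55, 0.20, 0.10])

# One-hot encodings of the actual start/end positions of the answer supervision label
actual_start, actual_end = 0, 2
start_onehot = np.eye(doc_len)[actual_start]
end_onehot = np.eye(doc_len)[actual_end]

# Cross entropy between each position distribution and its one-hot target;
# the sum serves as the position consistency loss value
start_loss = -(start_onehot * np.log(start_dist)).sum()
end_loss = -(end_onehot * np.log(end_dist)).sum()
position_consistency_loss = start_loss + end_loss
```

Because the targets are one-hot, each cross entropy term reduces to the negative log-probability assigned to the actual position, so the loss shrinks as the model concentrates probability on the labeled start and end positions.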
Step S1520, updating the weight parameters of the language model according to the position consistency loss value, and calling other question training samples in the first fine tuning training set and answer supervision labels thereof to carry out iterative training until the position consistency loss value meets the preset condition, and confirming that the language model is fine-tuned to be in a convergence state as a question-answering language model.
When the position consistency loss value is smaller than or equal to a preset threshold value, the language model has been fine-tuned to a convergence state, and fine-tuning of the language model is terminated. When the position consistency loss value is larger than the preset threshold value, the language model has not yet converged: a gradient update is performed on the language model according to the position consistency loss value, and the weight parameters throughout the language model are corrected through back propagation so that it further approaches convergence. Then, other question training samples and their answer supervision labels are invoked to continue iterative fine-tuning until the language model reaches the convergence state, whereupon it serves as the question-answering language model. The preset threshold may be set as desired by one skilled in the art based on the disclosure herein.
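The threshold-based stopping rule above can be sketched with a deliberately tiny one-parameter model. The learning rate, threshold, and squared-error loss here are purely illustrative stand-ins for the position consistency loss and the model's real weight parameters:

```python
# Minimal sketch of threshold-based fine-tuning: keep updating the weight until
# the per-sample loss meets the preset condition (loss <= threshold).
def fine_tune(w, samples, labels, lr=0.1, threshold=1e-3, max_steps=1000):
    for _ in range(max_steps):
        converged = True
        for x, y in zip(samples, labels):
            pred = w * x
            loss = (pred - y) ** 2          # squared-error loss for this sample
            if loss > threshold:            # loss above preset threshold: not yet converged
                converged = False
                grad = 2 * (pred - y) * x   # gradient of the loss w.r.t. the weight
                w -= lr * grad              # correct the weight parameter
        if converged:                       # every sample's loss met the preset condition
            break
    return w

# Fit w so that w * x approximates y = 2x; converges toward w = 2
w = fine_tune(0.0, [1.0, 2.0], [2.0, 4.0])
```

In the patent's setting, the same structure applies: compute the position consistency loss per question training sample, back-propagate while it exceeds the threshold, and stop once it falls to or below the threshold.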
In this embodiment, the first fine-tuning training set is invoked to fine-tune the language model to a convergence state so that it serves as a question-answering language model, which thereby learns the ability to determine answer text from a knowledge document related to a question text according to that question text, ensuring that the answer text accurately answers the question text.
Referring to fig. 6, in a further embodiment, step S1520, after confirming that the language model is fine-tuned to a convergence state, includes the following steps:
step S1521, responding to the answer display event, and acquiring a question text and a related knowledge document thereof;
using the help center function of the e-commerce platform, the user may enter a question text and submit it to the server. The server receives the question text, calls a search engine to retrieve a knowledge document related to it, and pushes the knowledge document to the user. When the user receives and consults the knowledge document, an input box pops up on the visual page currently displaying the document, allowing the user to enter a new question text or reuse the question text previously entered in the help center, and the corresponding question text and knowledge document are submitted to the server. The server receives the question text and the knowledge document in response to an answer display event.
Step S1522, determining answer text of the question text in the knowledge document by adopting the question-answer language model;
a word segmentation algorithm is applied to the question text and the knowledge document to obtain their word segmentation result, namely a word segmentation sequence, which is input into the question-answering language model for forward inference to obtain the start position distribution and the end position distribution output by the model. Greedy sampling is adopted to determine the position of maximum probability in the start position distribution and the position of maximum probability in the end position distribution, and the corresponding portion of text is selected from the knowledge document according to these start and end positions to serve as the answer text.
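The greedy span selection at inference time comes down to a few lines; the tokens and position distributions below are hypothetical:

```python
# Select the answer tokens from the knowledge document given the greedy
# (maximum-probability) start and end positions.
doc_tokens = ["the", "refund", "takes", "seven", "days"]   # hypothetical tokenized knowledge document
start_dist = [0.10, 0.60, 0.10, 0.10, 0.10]                # illustrative start position distribution
end_dist = [0.05, 0.05, 0.10, 0.20, 0.60]                  # illustrative end position distribution

start_pos = start_dist.index(max(start_dist))   # greedy: position of maximum probability
end_pos = end_dist.index(max(end_dist))

# The tokens from start_pos through end_pos inclusive form the answer text
answer_text = " ".join(doc_tokens[start_pos:end_pos + 1])
print(answer_text)   # → "refund takes seven days"
```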
Step S1523, the current visual area displaying the knowledge document is controlled to scroll to display answer text.
The position of the answer text within the visual page currently displaying the knowledge document is determined and taken as the target position. The current visible area of the visual page is controlled to scroll to the target position so that the answer text is displayed in the visible area.
In this embodiment, by responding to the answer display event and adopting the question-answering language model to determine the answer text of the question text in the knowledge document, the answer text is displayed in the visible area, so that the user can accurately locate the answer to the question within the knowledge document, greatly saving the user's time and improving the user experience.
Referring to fig. 7, in a further embodiment, after step S1400 of confirming that the language model is trained to the convergence state, the method includes the following steps:
step S1600, acquiring a second fine tuning training set, wherein the second fine tuning training set comprises a plurality of voice training samples and key supervision labels thereof, the voice training samples are voice texts, and the key supervision labels are keywords in the voice texts;
The second fine-tuning training set is prepared in advance for fine-tuning the language model to convergence.
It can be understood that the e-commerce platform can provide commodity search and/or commodity shopping guide functions. While using these functions, a user can describe, by voice, the commodity he or she wants, and the corresponding voice is submitted to the server, which converts it into voice text. In this way, a plurality of voice texts converted from user voice input during commodity search and/or commodity shopping guide are collected, and for each voice text the keywords therein are determined manually.
And respectively taking each voice text as a single voice training sample, and labeling keywords in each voice training sample as key supervision labels of the voice training samples. And forming a second fine tuning training set by using all the voice training samples and the key supervision labels thereof.
Step S1610, calling a single voice training sample and a key supervision label thereof in a second fine tuning training set, and reasoning out a predicted keyword according to the voice training sample by a language model to determine a key probability loss value between the predicted keyword and the key supervision label;
in one embodiment, the language model is implemented with BERT. A word segmentation algorithm is applied to the voice training sample to obtain its word segmentation result, namely a word segmentation sequence, which is input into the language model for forward inference to obtain the text feature vector corresponding to each token in the sequence as output by the last Transformer Encoder layer. Further, a linear transformation is applied to each token's text feature vector through a feedforward neural network layer, and a Softmax layer then yields a keyword probability distribution for each token; the dimension of each keyword probability distribution is 2, comprising the probability of belonging to the keyword category and the probability of not belonging to it. Greedy sampling determines the category of maximum probability in each token's keyword probability distribution: if it is the keyword category, the token is a predicted keyword; otherwise, the token is not a predicted keyword.
A word segmentation algorithm is applied to the key supervision label to obtain its word segmentation result, namely a word segmentation sequence, and One-Hot Encoding is applied to determine the key one-hot encoding distribution corresponding to each token in that sequence; each key one-hot encoding distribution has dimension 2, with the probability of belonging to the keyword category being 1 and the probability of not belonging being 0. For all remaining tokens in the word segmentation sequence of the voice training sample, i.e. those not in the word segmentation sequence of the key supervision label, the same non-key one-hot encoding distribution is set; each non-key one-hot encoding distribution has dimension 2, with the probability of belonging to the keyword category being 0 and the probability of not belonging being 1.
A cross entropy loss function or a mean squared error loss function is adopted to calculate, for each token in the word segmentation sequence of the voice training sample according to the one-to-one positional correspondence, the loss between its keyword probability distribution and either the key one-hot encoding distribution (for tokens in the word segmentation sequence of the key supervision label) or the non-key one-hot encoding distribution (for all other tokens); the result serves as the key probability loss value. As an illustrative example for ease of understanding, the keyword probability distributions corresponding to the tokens of the voice training sample are Position1_Token: [0.1, 0.9], Position2_Token: [0.8, 0.2], Position3_Token: [0.3, 0.7], Position4_Token: [0.25, 0.75], Position5_Token: [0.35, 0.65]; the key one-hot encoding distribution for the token of the key supervision label is Position2_Token: [1, 0]; and the non-key one-hot encoding distributions for the remaining tokens are Position1_Token: [0, 1], Position3_Token: [0, 1], Position4_Token: [0, 1], Position5_Token: [0, 1].
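Averaging per-token cross entropy against the one-hot targets, in the spirit of the worked example above (the numbers below are illustrative, not real model outputs), might look like:

```python
import math

# Per-token keyword probability distributions, dimension 2 = [P(keyword), P(not keyword)]
pred = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.25, 0.75], [0.35, 0.65]]

# One-hot targets: only the token at Position2 belongs to the keyword category
target = [[0, 1], [1, 0], [0, 1], [0, 1], [0, 1]]

# Cross entropy per token against its one-hot target reduces to the negative
# log-probability of the labeled category; the mean over tokens is taken
# here as the key probability loss value
per_token = [-sum(t * math.log(p) for t, p in zip(tgt, prd) if t)
             for tgt, prd in zip(target, pred)]
key_probability_loss = sum(per_token) / len(per_token)
```

Whether the per-token losses are averaged or summed is an assumption of this sketch; either convention yields a loss that decreases as the predicted distributions approach their one-hot targets.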
Step S1620, updating the weight parameters of the language model according to the key probability loss value, and calling other voice training samples in the second fine tuning training set and key supervision labels thereof to carry out iterative training until the key probability loss value meets the preset condition, and confirming that the language model is fine-tuned to be in a convergence state as a keyword extraction model.
When the key probability loss value is smaller than or equal to a preset threshold value, the language model has been fine-tuned to a convergence state, and fine-tuning of the language model is terminated. When the key probability loss value is larger than the preset threshold value, the language model has not yet converged: a gradient update is performed on the language model according to the key probability loss value, and the weight parameters throughout the language model are corrected through back propagation so that it further approaches convergence. Then, other voice training samples and their key supervision labels are invoked to continue iterative fine-tuning until the language model reaches the convergence state, whereupon it serves as the keyword extraction model. The preset threshold may be set as desired by one skilled in the art based on the disclosure herein.
In this embodiment, the second fine-tuning training set is invoked to fine-tune the language model to a convergence state so that it serves as a keyword extraction model, which thereby acquires the ability to determine the keywords in a voice text, ensuring the accuracy and reliability of the keywords the model determines.
Referring to fig. 8, in a further embodiment, step S1620, after confirming that the language model is fine-tuned to be in a convergence state and is used as a keyword extraction model, includes the following steps:
step S1621, responding to the voice searching event, and obtaining a voice text obtained by recognizing the voice of the user by adopting a preset voice model;
the user of the e-commerce platform can describe the demand of the user for the wanted commodity in a voice mode by using the commodity searching and/or commodity shopping guiding functions, and then the correspondingly generated voice is submitted to the server as user voice.
The server responds to the voice search event, receives the user voice, and invokes a voice model trained in advance to convergence to recognize the voice text corresponding to the user voice. The voice model, trained to a convergence state in advance so as to possess the ability to convert voice into the corresponding text, may adopt an open-source model such as Whisper, or may be trained by those skilled in the art as needed.
Step S1622, determining keywords in the voice text by adopting the keyword extraction model;
a word segmentation algorithm is applied to the voice text to obtain its word segmentation result, namely a word segmentation sequence, which is input into the keyword extraction model for forward inference to obtain the text feature vector corresponding to each token in the sequence as output by the last Transformer Encoder layer. Further, a linear transformation is applied to each token's text feature vector through a feedforward neural network layer, a Softmax layer yields the keyword probability distribution for each token, and greedy sampling determines the category of maximum probability in each distribution: if it is the keyword category, the token is a keyword; otherwise, the token is not a keyword.
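The per-token greedy decision can be sketched as follows; the tokens and probabilities are hypothetical:

```python
# A token is a keyword when the keyword-category probability is the larger
# of the two entries in its distribution [P(keyword), P(not keyword)].
tokens = ["red", "running", "shoes", "please"]          # hypothetical tokenized voice text
probs = [[0.7, 0.3], [0.2, 0.8], [0.9, 0.1], [0.1, 0.9]]

keywords = [t for t, p in zip(tokens, probs) if p[0] > p[1]]
print(keywords)   # → ['red', 'shoes']
```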
Step S1623, calling a commodity searching interface, and transmitting the keywords to the commodity searching interface so as to drive the commodity searching interface to search out commodity results according to the keywords.
The commodity search interface can be realized by adopting a search engine or can be realized by the person skilled in the art according to the requirements.
After receiving the keywords, the commodity search interface returns a corresponding commodity result according to them. The commodity result can comprise a commodity description text and/or a commodity picture of the corresponding commodity, the commodity picture displaying the commodity or describing its selling points in picture form. The commodity result is then submitted to the server.
And after receiving the commodity result, the server pushes the commodity result to the user.
In this embodiment, by responding to the voice search event, obtaining the voice text recognized from the user voice by the voice model, and then determining the keywords in the voice text with the keyword extraction model, the commodity search interface is called to search out commodity results according to the keywords; the keywords in the voice text can thus be determined efficiently and accurately, the user need not distill the keywords personally, and the user experience is improved.
Referring to fig. 9, a language model training apparatus provided for one of the purposes of the present application, as a functional implementation of the language model training method of the present application, includes a training set obtaining module 1100, a first loss value module 1200, a second loss value module 1300, and an iterative training module 1400. The training set obtaining module 1100 is configured to obtain a training set, the training set comprising a plurality of training samples and their supervision labels, where the supervision label is an original text and the training sample is a covering text obtained by replacing part of the text in the original text; the first loss value module 1200 is configured to invoke a single training sample in the training set and its supervision label, and infer a predicted text from the training sample with the language model, so as to determine a literal consistency loss value between the predicted text and the supervision label; the second loss value module 1300 is configured to represent, by the language model, the predicted semantic features of the predicted text and the actual semantic features of the supervision label, and determine a semantic correlation loss value between the predicted semantic features and the actual semantic features; the iterative training module 1400 is configured to update the weight parameters of the language model according to the literal consistency loss value and the semantic correlation loss value, and invoke other training samples in the training set and their supervision labels for iterative training until the two loss values meet a preset condition, confirming that the language model is trained to a convergence state.
In a further embodiment, before the training set obtaining module 1100, the apparatus includes: a text acquisition sub-module, configured to acquire a plurality of e-commerce texts, where the e-commerce business comprises any one or more of similar commodity matching, commodity comments, customer service, and commodity shopping guide; a data set construction sub-module, configured to perform data cleaning on the plurality of e-commerce texts and construct a data set from the cleaned e-commerce texts; a text construction sub-module, configured to take each e-commerce text in the data set as an original text and replace a preset proportion of the text in the original text according to a pre-specified probability, obtaining a covering text; and a training set construction sub-module, configured to construct the training set from all original texts and their covering texts.
In a further embodiment, the first loss value module 1200 includes: the distribution prediction sub-module is used for predicting dictionary probability distribution corresponding to all covered positions in the training sample by the language model; the output text determining submodule is used for determining an output text according to dictionary probability distribution corresponding to each covered position by adopting a preset sampling strategy; and the predictive text determination submodule is used for replacing the current text of the covered position of the output text in the training sample with each output text to obtain the predictive text.
In a further embodiment, after the iterative training module 1400, it includes: the first training set acquisition sub-module is used for acquiring a first fine tuning training set, wherein the first fine tuning training set comprises a plurality of question training samples and answer supervision labels thereof, the question training samples are question texts and related knowledge documents, and the answer supervision labels are answer texts of the question texts in the knowledge documents; the first training set invoking sub-module is used for invoking a single question training sample and an answer supervision label thereof in the first fine tuning training set, reasoning out a predicted answer text according to the question training sample by the language model, and determining a position consistency loss value between the predicted answer text and the answer supervision label; and the first iterative training sub-module is used for updating the weight parameters of the language model according to the position consistency loss value, calling other question training samples in the first fine tuning training set and answer supervision labels thereof to carry out iterative training until the position consistency loss value meets the preset condition, and confirming that the language model is fine-tuned to be in a convergence state to be used as a question-answering language model.
In a further embodiment, after the first iterative training sub-module, the method includes: the first event response sub-module is used for responding to the answer display event and acquiring a question text and a related knowledge document thereof; the first model reasoning sub-module is used for determining answer text of the question text in the knowledge document by adopting the question-answer language model; and the region scrolling sub-module is used for controlling the current visible region displaying the knowledge document to scroll to display answer text.
In a further embodiment, after the iterative training module 1400, it includes: the second training set acquisition sub-module is used for acquiring a second fine tuning training set, wherein the second fine tuning training set comprises a plurality of voice training samples and key supervision labels thereof, the voice training samples are voice texts, and the key supervision labels are keywords in the voice texts; the second training set calling sub-module is used for calling a single voice training sample and a key supervision label thereof in the second fine tuning training set, reasoning out predicted keywords according to the voice training sample by the language model, and determining a key probability loss value between the predicted keywords and the key supervision label; and the second iterative training sub-module is used for updating the weight parameters of the language model according to the key probability loss value, calling other voice training samples in the second fine tuning training set and the key supervision labels thereof to carry out iterative training until the key probability loss value meets the preset condition, and confirming that the language model is fine-tuned to be trained to a convergence state as a keyword extraction model.
In a further embodiment, after the second iterative training sub-module, the apparatus further comprises: the second event response sub-module is used for responding to the voice search event and acquiring a voice text obtained by recognizing the voice of the user by adopting a preset voice model; the second model reasoning sub-module is used for determining keywords in the voice text by adopting the keyword extraction model; and the commodity searching sub-module is used for calling a commodity searching interface and transmitting the keywords to the commodity searching interface so as to drive the commodity searching interface to search out commodity results according to the keywords.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. As shown in fig. 10, the internal structure of the computer device is schematically shown. The computer device includes a processor, a computer readable storage medium, a memory, and a network interface connected by a system bus. The computer readable storage medium of the computer device stores an operating system, a database and computer readable instructions, the database can store a control information sequence, and the computer readable instructions when executed by a processor can enable the processor to realize a language model training method. The processor of the computer device is used to provide computing and control capabilities, supporting the operation of the entire computer device. The memory of the computer device may have stored therein computer readable instructions that, when executed by the processor, cause the processor to perform the language model training method of the present application. The network interface of the computer device is for communicating with a terminal connection. It will be appreciated by those skilled in the art that the structure shown in fig. 10 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
The processor in this embodiment is configured to execute the specific functions of each module and its sub-modules in fig. 9, and the memory stores the program codes and various types of data required for executing those modules or sub-modules. The network interface is used for data transmission with the user terminal or the server. The memory in this embodiment stores the program codes and data required for executing all modules/sub-modules in the language model training apparatus of the present application, and the server can call them to execute the functions of those sub-modules.
The present application also provides a storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the language model training method of any embodiment of the present application.
Those skilled in the art will appreciate that implementing all or part of the above-described methods of embodiments of the present application may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed, may comprise the steps of embodiments of the methods described above. The storage medium may be a computer readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a random access Memory (Random Access Memory, RAM).
In summary, the method and the device ensure that the language model can accurately understand the semantics of the text, and further fine tune the language model to be trained to a convergence state, so that the language model is suitable for various application scenes, and service capable of improving user experience is provided.
Those of skill in the art will appreciate that the various operations, methods, steps in the flows, actions, schemes, and alternatives discussed in the present application may be alternated, altered, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and flows discussed in this application may also be alternated, altered, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in the prior art that are also present in the various operations, methods, and flows disclosed in this application may likewise be alternated, altered, rearranged, decomposed, combined, or deleted.
The foregoing is only a partial embodiment of the present application, and it should be noted that, for a person skilled in the art, several improvements and modifications can be made without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (10)

1. A method for training a language model, comprising the steps of:
acquiring a training set, wherein the training set comprises a plurality of training samples and supervision labels thereof, the supervision labels are original texts, and the training samples are covering texts obtained by replacing part of texts in the original texts;
calling a single training sample and a supervision tag thereof in a training set, reasoning out a predicted text according to the training sample by a language model, and determining a literal consistency loss value between the predicted text and the supervision tag;
representing, by the language model, the predicted semantic features of the predicted text and the actual semantic features of the supervision label, and determining a semantic correlation loss value between the predicted semantic features and the actual semantic features;
and updating the weight parameters of the language model according to the literal consistency loss value and the semantic relevance loss value, calling other training samples in the training set and supervision labels thereof to carry out iterative training until the literal consistency loss value and the semantic relevance loss value meet the preset conditions, and confirming that the language model is trained to a convergence state.
2. The language model training method of claim 1, comprising the steps of, prior to obtaining the training set:
Acquiring a plurality of e-commerce texts, wherein the e-commerce business comprises any one or more of similar commodity matching, commodity comments, customer service and commodity shopping guide;
data cleaning is carried out on the plurality of e-commerce texts, and a data set is constructed from the cleaned e-commerce texts;
each e-commerce text in the data set is taken as an original text, and a preset proportion of the text in the original text is replaced according to a pre-specified probability, so that a covering text is obtained;
the training set is constructed with all the original text and its cover text.
3. The language model training method of claim 1, wherein inferring the predicted text from the training sample with the language model comprises the steps of:
predicting, with the language model, the vocabulary probability distribution corresponding to each masked position in the training sample;
determining an output text for each masked position from its vocabulary probability distribution by means of a preset sampling strategy;
and replacing the current text at each masked position in the training sample with the corresponding output text to obtain the predicted text.
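The fill-in step of claim 3 can be sketched as follows. Greedy (argmax) selection stands in for the preset sampling strategy, which the claim leaves open, and the token-list representation of the sample is an assumption.

```python
import numpy as np

def fill_masked_positions(tokens, distributions, vocab, mask_token="[MASK]"):
    """Replace each masked position with the token chosen from its vocabulary
    probability distribution; distributions are given in left-to-right order
    of the masked positions."""
    dist_iter = iter(distributions)
    out = []
    for tok in tokens:
        if tok == mask_token:
            probs = next(dist_iter)
            out.append(vocab[int(np.argmax(probs))])  # greedy stand-in for the sampling strategy
        else:
            out.append(tok)
    return out
```

A stochastic strategy (e.g. top-k or nucleus sampling over `probs`) would slot into the same place as the argmax.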
4. The language model training method of claim 1, comprising, after confirming that the language model is trained to a convergence state, the steps of:
acquiring a first fine-tuning training set, wherein the first fine-tuning training set comprises a plurality of question training samples and answer supervision labels thereof, the question training samples being question texts and their related knowledge documents, and the answer supervision labels being the answer texts of the question texts within the knowledge documents;
invoking a single question training sample and its answer supervision label in the first fine-tuning training set, inferring a predicted answer text from the question training sample with the language model, and determining a position consistency loss value between the predicted answer text and the answer supervision label;
and updating the weight parameters of the language model according to the position consistency loss value, and invoking other question training samples in the first fine-tuning training set and their answer supervision labels for iterative training until the position consistency loss value meets a preset condition, whereupon the language model is confirmed as fine-tuned to a convergence state and serves as a question-answering language model.
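One plausible reading of the "position consistency loss" in claim 4 is the extractive-QA objective: cross-entropy over the answer's start and end positions in the knowledge document. That reading, and the averaging of the two terms, are assumptions, not the claim's stated formula.

```python
import numpy as np

def _ce(logits, target):
    """Cross-entropy of a single categorical prediction over document positions."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[target])

def position_consistency_loss(start_logits, end_logits, start_true, end_true):
    """Average cross-entropy over the answer span's start and end positions
    (an assumed concrete form of the claim's position consistency loss)."""
    return 0.5 * (_ce(start_logits, start_true) + _ce(end_logits, end_true))
```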
5. The language model training method of claim 4, comprising, after confirming that the language model is fine-tuned to a convergence state as the question-answering language model, the steps of:
responding to an answer display event by acquiring a question text and its related knowledge document;
determining, with the question-answering language model, the answer text of the question text within the knowledge document;
and controlling the current visible area displaying the knowledge document to scroll so as to display the answer text.
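The scrolling step of claim 5 amounts to mapping the answer's location in the document to a viewport position. The helper below is hypothetical — a fixed-width text layout with the listed line height and line width is assumed purely for illustration.

```python
def scroll_offset_for_answer(document, answer, chars_per_line=80, line_height_px=20):
    """Map the answer's character offset in the knowledge document to a vertical
    scroll position; all layout parameters are hypothetical."""
    idx = document.find(answer)
    if idx < 0:
        return 0  # answer not found: leave the viewport at the top
    line = idx // chars_per_line
    return line * line_height_px
```

In a real renderer the offset would come from the layout engine rather than from a fixed-width estimate, but the control flow — locate the answer, then scroll the visible area to it — is the same.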
6. The language model training method of claim 1, comprising, after confirming that the language model is trained to a convergence state, the steps of:
acquiring a second fine-tuning training set, wherein the second fine-tuning training set comprises a plurality of voice training samples and key supervision labels thereof, the voice training samples being voice texts and the key supervision labels being the keywords in the voice texts;
invoking a single voice training sample and its key supervision label in the second fine-tuning training set, inferring predicted keywords from the voice training sample with the language model, and determining a key probability loss value between the predicted keywords and the key supervision label;
and updating the weight parameters of the language model according to the key probability loss value, and invoking other voice training samples in the second fine-tuning training set and their key supervision labels for iterative training until the key probability loss value meets a preset condition, whereupon the language model is confirmed as fine-tuned to a convergence state and serves as a keyword extraction model.
7. The language model training method of claim 6, comprising, after confirming that the language model is fine-tuned to a convergence state as the keyword extraction model, the steps of:
responding to a voice search event by acquiring a voice text obtained by recognizing the user's speech with a preset speech model;
determining, with the keyword extraction model, the keywords in the voice text;
and invoking a commodity search interface and passing the keywords to it, so as to drive the commodity search interface to retrieve commodity results according to the keywords.
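The voice-search flow of claim 7 chains three components: the preset speech model, the fine-tuned keyword extraction model, and the commodity search interface. The sketch below treats all three as caller-supplied callables, since their concrete APIs are not specified in the claim.

```python
def voice_search(audio, speech_model, keyword_model, search_interface):
    """Chain the claim's three steps: speech recognition, keyword extraction,
    and the commodity search call. All three callables are stand-ins."""
    voice_text = speech_model(audio)        # recognize the user's speech
    keywords = keyword_model(voice_text)    # extract keywords from the voice text
    return search_interface(keywords)       # drive the search with the keywords
```

Keeping the three stages as injected dependencies also makes the pipeline easy to test with stubs, as below.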
8. A language model training apparatus, comprising:
a training set acquisition module configured to acquire a training set, wherein the training set comprises a plurality of training samples and supervision labels thereof, the supervision labels being original texts and the training samples being masked texts obtained by replacing part of the text in the original texts;
a first loss value module configured to invoke a single training sample and its supervision label in the training set, infer a predicted text from the training sample with the language model, and determine a literal consistency loss value between the predicted text and the supervision label;
a second loss value module configured to represent, with the language model, the predicted semantic features of the predicted text and the actual semantic features of the supervision label, and determine a semantic relevance loss value between the predicted semantic features and the actual semantic features;
and an iterative training module configured to update the weight parameters of the language model according to the literal consistency loss value and the semantic relevance loss value, and to invoke other training samples in the training set and their supervision labels for iterative training until the literal consistency loss value and the semantic relevance loss value meet preset conditions, whereupon the language model is confirmed as trained to a convergence state.
9. A computer device comprising a central processor and a memory, wherein the central processor is configured to invoke a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, storing, in the form of computer-readable instructions, a computer program implementing the method according to any one of claims 1 to 7, wherein the computer program, when invoked by a computer, performs the steps of the corresponding method.
CN202311789826.6A 2023-12-22 2023-12-22 Language model training method and device, equipment and medium thereof Pending CN117743516A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311789826.6A CN117743516A (en) 2023-12-22 2023-12-22 Language model training method and device, equipment and medium thereof


Publications (1)

Publication Number Publication Date
CN117743516A true CN117743516A (en) 2024-03-22

Family

ID=90279318



Similar Documents

Publication Publication Date Title
US11334635B2 (en) Domain specific natural language understanding of customer intent in self-help
EP3717984B1 (en) Method and apparatus for providing personalized self-help experience
CN109992763A (en) Language marks processing method, system, electronic equipment and computer-readable medium
US11720615B2 (en) Self-executing protocol generation from natural language text
US20220100963A1 (en) Event extraction from documents with co-reference
US10755332B2 (en) Multi-perceptual similarity detection and resolution
US11645526B2 (en) Learning neuro-symbolic multi-hop reasoning rules over text
CN117057873A (en) Commodity description content generation method, device, equipment and medium thereof
CN115605896A (en) System and method for product recommendation and integration language modeling
CN117251547A (en) User question response method and device, equipment and medium thereof
US20220100967A1 (en) Lifecycle management for customized natural language processing
EP4222635A1 (en) Lifecycle management for customized natural language processing
CN111324738B (en) Method and system for determining text label
EP4283496A1 (en) Techniques for automatic filling of an input form to generate a listing
WO2020242383A1 (en) Conversational diaglogue system and method
CN117743516A (en) Language model training method and device, equipment and medium thereof
CN115309905A (en) Advertisement text generation method, device, equipment and medium
CN112328899B (en) Information processing method, information processing apparatus, storage medium, and electronic device
Palm End-to-end information extraction from business documents.
CN113592315A (en) Method and device for processing dispute order
CN117910580A (en) E-commerce customer service question and answer method and device, equipment and medium thereof
CN117910565A (en) Reply text generation method, device, equipment and medium thereof
Nguyen et al. Gain more with less: Extracting information from business documents with small data
CN117910566A (en) Reply knowledge generation method and device, equipment and medium thereof
Sarkar et al. Sentiment analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination