CN112560499B - Pre-training method and device for semantic representation model, electronic equipment and storage medium - Google Patents

Pre-training method and device for semantic representation model, electronic equipment and storage medium

Info

Publication number
CN112560499B
CN112560499B CN202011463938.9A CN202011463938A CN112560499B CN 112560499 B CN112560499 B CN 112560499B CN 202011463938 A CN202011463938 A CN 202011463938A CN 112560499 B CN112560499 B CN 112560499B
Authority
CN
China
Prior art keywords
fragment
semantic
sequence
ith
disordered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011463938.9A
Other languages
Chinese (zh)
Other versions
CN112560499A (en
Inventor
丁思宇
王硕寰
尚骏远
孙宇
田浩
吴华
王海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011463938.9A priority Critical patent/CN112560499B/en
Publication of CN112560499A publication Critical patent/CN112560499A/en
Application granted granted Critical
Publication of CN112560499B publication Critical patent/CN112560499B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a pre-training method and apparatus for a semantic representation model, an electronic device and a storage medium, and relates to artificial intelligence technical fields such as deep learning and Natural Language Processing (NLP). The specific implementation scheme is as follows: acquiring a disordered fragment sequence of a sample text and the original ordering order, in the sample text, of the N fragments in the disordered fragment sequence; for the ith fragment in the disordered fragment sequence, inputting the semantic fusion vector of the (i-1)th fragment in the disordered fragment sequence and the ith fragment into the semantic representation model to obtain the semantic fusion vector of the ith fragment; inputting the semantic fusion vector of the Nth fragment into a prediction model to generate a predicted ordering order of the N fragments in the sample text; and pre-training the semantic representation model and the prediction model according to the original ordering order and the predicted ordering order. In this way, the whole sample text can be processed, the global information of the sample text is learned, and the processing efficiency of the semantic representation model is improved.

Description

Pre-training method and device for semantic representation model, electronic equipment and storage medium
Technical Field
The application relates to the field of computer technology, in particular to artificial intelligence technical fields such as deep learning and Natural Language Processing (NLP), and especially to a pre-training method and apparatus for a semantic representation model, an electronic device and a storage medium.
Background
The existing pre-training method for semantic representation models mainly extracts N natural sentences from a chapter to construct a text whose total length is smaller than 512 characters, divides the text into several fragments, randomly shuffles the fragments, inputs them into an ERNIE 2.0 model, and adjusts the model coefficients based on the fragment-order prediction result to achieve pre-training.
In this method, the input is limited to text with a total length smaller than 512 characters, so the pre-trained model can only learn local information of the chapter, and the processing efficiency is poor.
Disclosure of Invention
The disclosure provides a pre-training method, apparatus, device and storage medium for a semantic representation model.
According to an aspect of the present disclosure, there is provided a pre-training method for a semantic representation model, including: acquiring a disordered fragment sequence of a sample text and an original ordering order, in the sample text, of N fragments in the disordered fragment sequence, wherein N is a positive integer; for the ith fragment in the disordered fragment sequence, inputting the semantic fusion vector of the (i-1)th fragment in the disordered fragment sequence and the ith fragment into a semantic representation model to obtain the semantic fusion vector of the ith fragment, wherein i is a positive integer less than or equal to N, and repeating this step until the semantic fusion vector of the Nth fragment is obtained; inputting the semantic fusion vector of the Nth fragment into a prediction model to generate a predicted ordering order of the N fragments in the sample text; and pre-training the semantic representation model and the prediction model according to the original ordering order and the predicted ordering order.
According to another aspect of the present disclosure, there is provided a pre-training apparatus for a semantic representation model, including: an acquisition module, configured to acquire a disordered fragment sequence of a sample text and an original ordering order, in the sample text, of N fragments in the disordered fragment sequence, wherein N is a positive integer; a first input module, configured to, for the ith fragment in the disordered fragment sequence, input the semantic fusion vector of the (i-1)th fragment in the disordered fragment sequence and the ith fragment into a semantic representation model to obtain the semantic fusion vector of the ith fragment, wherein i is a positive integer less than or equal to N, and repeat this step until the semantic fusion vector of the Nth fragment is obtained; a second input module, configured to input the semantic fusion vector of the Nth fragment into a prediction model to generate a predicted ordering order of the N fragments in the sample text; and a pre-training module, configured to pre-train the semantic representation model and the prediction model according to the original ordering order and the predicted ordering order.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a pre-training method of the semantic representation model as described above.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform a pre-training method of a semantic representation model as described above.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a pre-training method of a semantic representation model as described above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of a semantic representation model+prediction model;
FIG. 3 is a schematic diagram according to a second embodiment of the present application;
FIG. 4 is a schematic diagram according to a third embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing a pre-training method for a semantic representation model according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The following describes a pre-training method, a pre-training device, an electronic device and a storage medium of a semantic representation model according to an embodiment of the application with reference to the accompanying drawings.
Fig. 1 is a schematic diagram according to a first embodiment of the present application. It should be noted that, the execution body in the embodiment of the present application is a pre-training device of a semantic representation model, and the pre-training device of the semantic representation model may specifically be a hardware device, or software in the hardware device, etc.
As shown in fig. 1, the specific implementation process of the pre-training method of the semantic representation model is as follows:
Step 101, acquiring a disordered fragment sequence of a sample text and the original ordering order, in the sample text, of N fragments in the disordered fragment sequence, wherein N is a positive integer.
In the embodiment of the application, the sample text may be any text acquired in any manner. The disordered fragment sequence of the sample text may be acquired manually or automatically. In order to improve the efficiency of acquiring the disordered fragment sequence of the sample text and reduce the acquisition cost, the pre-training device of the semantic representation model may perform step 101 by, for example: acquiring the sample text; dividing the sample text into N fragments; performing out-of-order processing on the N fragments to generate the disordered fragment sequence of the sample text; and acquiring the original ordering order, in the sample text, of the N fragments in the disordered fragment sequence.
In the embodiment of the present application, the fragments may be obtained, for example, by dividing the text by sentence or by paragraph. In order to reduce the number of fragments obtained by division and ensure the semantic integrity of each fragment, a fragment may include at least one sentence. Correspondingly, the pre-training device of the semantic representation model may divide the sample text into N fragments, for example, according to the sentence-ending symbols in the sample text and the character limit of the semantic representation model.
In the embodiment of the present application, when the number of characters in the sample text is large, for example greater than a preset multiple of the character limit, in order to ensure that the number of characters in each divided fragment is less than or equal to the character limit, a piece of text may first be intercepted from the sample text according to the character limit, and it is judged whether the end of this piece of text is a sentence-ending symbol. If so, the piece of text is determined to be a fragment; if not, the last sentence-ending symbol in the piece of text is located, and all the text up to and including that symbol is combined into a fragment. Then, starting from that ending symbol, the next piece of text is intercepted from the sample text according to the character limit, and the processing is repeated until all characters in the sample text have been processed.
In the embodiment of the present application, when the number of characters in the sample text is small, for example less than or equal to the preset multiple of the character limit, the sample text may be segmented directly according to the sentence-ending symbols in the sample text to obtain N fragments.
In the embodiment of the present application, the original ordering order, in the sample text, of the N fragments in the disordered fragment sequence may refer to the sequence numbers of the N fragments in the sample text. For example, when the sample text includes 4 fragments A, B, C, D and the disordered fragment sequence is B, C, A, D, the original ordering order of the N fragments in the sample text is 1, 2, 0, 3.
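As an illustration of step 101, the following Python sketch splits a sample text at sentence-ending symbols under a character limit, shuffles the resulting fragments, and records their original ordering order. It is a minimal sketch under assumed conventions, not code from the patent; the function names and the 512-character limit are assumptions.

import random
import re

def split_into_fragments(sample_text, char_limit=512):
    # Split at sentence-ending symbols so that each fragment stays within char_limit.
    sentences = [s for s in re.split(r'(?<=[。！？.!?])', sample_text) if s]
    fragments, current = [], ''
    for sentence in sentences:
        if current and len(current) + len(sentence) > char_limit:
            fragments.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        fragments.append(current)
    return fragments

def make_training_sample(sample_text, char_limit=512):
    fragments = split_into_fragments(sample_text, char_limit)
    indexed = list(enumerate(fragments))           # (original position, fragment)
    random.shuffle(indexed)                        # out-of-order processing
    original_order = [idx for idx, _ in indexed]   # e.g. [1, 2, 0, 3] for B, C, A, D
    disordered_fragments = [frag for _, frag in indexed]
    return disordered_fragments, original_order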
Step 102, for the ith fragment in the disordered fragment sequence, inputting the semantic fusion vector of the (i-1)th fragment in the disordered fragment sequence and the ith fragment into the semantic representation model to obtain the semantic fusion vector of the ith fragment, wherein i is a positive integer less than or equal to N, and repeating this step until the semantic fusion vector of the Nth fragment is obtained.
In the embodiment of the application, the semantic fusion vector of the (i-1)th fragment refers to the vector obtained by fusing the semantic vector of the (i-1)th fragment in the disordered fragment sequence with the semantic vectors of all fragments before the (i-1)th fragment in the disordered fragment sequence. For example, taking i as 4, the semantic fusion vector of the 3rd fragment is the vector obtained by fusing the semantic vectors of the 1st, 2nd and 3rd fragments in the disordered fragment sequence.
In the embodiment of the present application, the semantic representation model may obtain the semantic fusion vector of the ith fragment by, for example, obtaining the semantic vector of the ith fragment and fusing it with the semantic fusion vector of the (i-1)th fragment, so that a semantic fusion vector aggregating the semantic vectors of multiple fragments is obtained and the global information of the text sample is extracted. The semantic representation model may be, for example, a semantic representation model based on the Transformer-XL architecture. For the reordering task, the Transformer-XL based semantic representation model sequentially processes the first N-1 fragments in the disordered fragment sequence to obtain the semantic fusion vector of the (N-1)th fragment; when the last fragment in the disordered fragment sequence is input into the semantic representation model, the semantic fusion vector of the Nth fragment is obtained, from which the ordering order of the N fragments in the sample text can then be predicted.
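The recurrent fusion of step 102 can be sketched in Python/PyTorch as follows. This is a minimal illustration, not the patent's implementation: the patent specifies a Transformer-XL based semantic representation model, while the single encoder layer and the linear fusion below are hypothetical stand-ins chosen only to show how the semantic vector of the ith fragment is fused with the semantic fusion vector of the (i-1)th fragment.

import torch
import torch.nn as nn

class RecurrentSemanticFusion(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        # Placeholder encoder; the patent's Transformer-XL based semantic
        # representation model would take the place of this single layer.
        self.encoder = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=12, batch_first=True)
        self.fuse = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, fragment_embeddings):
        # fragment_embeddings: list of [1, seq_len_i, hidden] tensors,
        # one per fragment, in the order of the disordered fragment sequence.
        fusion = None
        for emb in fragment_embeddings:
            semantic = self.encoder(emb).mean(dim=1)    # [1, hidden] semantic vector of the ith fragment
            if fusion is None:
                fusion = semantic                       # first fragment: its semantic vector is its fusion vector
            else:
                fusion = torch.tanh(self.fuse(torch.cat([fusion, semantic], dim=-1)))
        return fusion                                   # semantic fusion vector of the Nth fragment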
Step 103, inputting the semantic fusion vector of the Nth fragment into a prediction model to generate a predicted ordering order of the N fragments in the sample text.
In this embodiment of the present application, the prediction model may be, for example, a classification model. The classification model enumerates all possible ordering orders of the N fragments, predicts the probability of each ordering order, and determines the ordering order with the largest probability as the predicted ordering order of the N fragments in the sample text. A schematic diagram of the semantic representation model and the prediction model is shown in fig. 2.
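A sketch of such a classification-style prediction head is given below; it assumes a small N so that all N! orderings can be enumerated, and the class and parameter names are hypothetical rather than taken from the patent.

import itertools
import torch
import torch.nn as nn

class OrderPredictionHead(nn.Module):
    def __init__(self, hidden_size=768, num_fragments=4):
        super().__init__()
        # One class per possible ordering order of the N fragments (N! classes).
        self.permutations = list(itertools.permutations(range(num_fragments)))
        self.classifier = nn.Linear(hidden_size, len(self.permutations))

    def forward(self, fusion_vector):
        logits = self.classifier(fusion_vector)                 # [1, N!]
        probs = torch.softmax(logits, dim=-1)
        predicted_order = list(self.permutations[int(probs.argmax(dim=-1))])
        return logits, predicted_order                          # logits for training, order for inspection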
Step 104, pre-training the semantic representation model and the prediction model according to the original ordering order and the predicted ordering order.
In the embodiment of the present application, the pre-training device of the semantic representation model may calculate a loss function value according to the original ordering order, the predicted ordering order and a preset loss function; adjust the parameters of the semantic representation model and the prediction model simultaneously according to the loss function value; and use multiple sample texts to adjust the parameters of the semantic representation model and the prediction model multiple times, thereby completing the pre-training of the semantic representation model.
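Assuming the two sketches above, a single training step could look like the following. The cross-entropy loss stands in for the patent's unspecified preset loss function, and the optimizer is assumed to be built over the parameters of both models, e.g. torch.optim.Adam(list(fusion_model.parameters()) + list(prediction_head.parameters())).

import torch
import torch.nn as nn

def pretraining_step(fusion_model, prediction_head, optimizer, fragment_embeddings, original_order):
    optimizer.zero_grad()
    fusion_vector = fusion_model(fragment_embeddings)            # semantic fusion vector of the Nth fragment
    logits, _ = prediction_head(fusion_vector)
    target = torch.tensor([prediction_head.permutations.index(tuple(original_order))])
    loss = nn.functional.cross_entropy(logits, target)           # preset loss function (assumed)
    loss.backward()
    optimizer.step()                                             # adjusts both models' parameters simultaneously
    return loss.item()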
In summary, a disordered fragment sequence of a sample text and the original ordering order, in the sample text, of N fragments in the disordered fragment sequence are acquired, where N is a positive integer; for the ith fragment in the disordered fragment sequence, the semantic fusion vector of the (i-1)th fragment and the ith fragment are input into the semantic representation model to obtain the semantic fusion vector of the ith fragment, where i is a positive integer less than or equal to N, and this step is repeated until the semantic fusion vector of the Nth fragment is obtained; the semantic fusion vector of the Nth fragment is input into a prediction model to generate a predicted ordering order of the N fragments in the sample text; and the semantic representation model and the prediction model are pre-trained according to the original ordering order and the predicted ordering order. In this way, the whole sample text can be processed instead of only selected sentences from it, so the global information of the sample text can be learned and the processing efficiency of the semantic representation model is improved.
Fig. 3 is a schematic diagram according to a second embodiment of the present application. It should be noted that, the execution body in the embodiment of the present application is a pre-training device of a semantic representation model, and the pre-training device of the semantic representation model may specifically be a hardware device, or software in the hardware device, etc.
As shown in fig. 3, the specific implementation process of the pre-training method of the semantic representation model is as follows:
step 301, obtaining an out-of-order segment sequence of a sample text, and an original ordering order of N segments in the out-of-order segment sequence in the sample text, where N is a positive integer.
Step 302, judging whether the number of characters of the sample text exceeds the limit number of the semantic representation model; if yes, go to step 304, and if not, go to step 303.
And 303, inputting the disordered segment sequence into the semantic representation model when the number of characters does not exceed the limit number of the semantic representation model, and obtaining the semantic fusion vector of the Nth segment in the disordered segment sequence.
In the embodiment of the application, when the number of characters of the sample text does not exceed the limit number of the semantic representation model, the disordered fragment sequence can be input directly into the semantic representation model, so that the semantic representation model processes the fragments in the disordered fragment sequence in parallel to obtain the semantic fusion vector of the Nth fragment in the disordered fragment sequence, which improves the processing efficiency of the semantic representation model on the fragments.
Step 304, when the number of characters exceeds the limit number of the semantic representation model, judging, for the ith fragment in the disordered fragment sequence, whether the ith fragment is the first fragment in the disordered fragment sequence; if so, executing step 305, and if not, executing step 306.
Step 305, inputting the ith fragment into the semantic representation model, obtaining the semantic vector of the ith fragment, and determining the semantic vector of the ith fragment as the semantic fusion vector of the ith fragment.
In the embodiment of the application, if the ith fragment is the first fragment in the disordered fragment sequence, since no other fragments exist before the first fragment, the semantic vector of the ith fragment can be directly determined as the semantic fusion vector of the ith fragment, so that the fusion processing process of the vectors is reduced, and the calculated amount is reduced.
Step 306, inputting the semantic fusion vector of the (i-1)th fragment in the disordered fragment sequence and the ith fragment into the semantic representation model to obtain the semantic fusion vector of the ith fragment, wherein i is a positive integer less than or equal to N; and then jumping back to step 304.
Step 307, repeating steps 304 to 306 until the semantic fusion vector of the nth segment is obtained.
Step 308, inputting the semantic fusion vector of the nth segment into the prediction model to generate a prediction ordering order of the N segments in the sample text.
Step 309, pre-training the semantic representation model and the predictive model according to the original ordering order and the predictive ordering order.
In the embodiment of the present application, the detailed descriptions of step 301, step 308 and step 309 may refer to the embodiment shown in fig. 1, and will not be described in detail herein.
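The control flow of steps 302 to 307 can be summarized by the following simplified Python sketch, given for illustration only; encode_all, encode_one and fuse are hypothetical methods of the semantic representation model, not names from the patent.

def encode_disordered_sequence(model, fragments, char_limit=512):
    total_chars = sum(len(f) for f in fragments)
    if total_chars <= char_limit:
        # Step 303: the whole disordered fragment sequence fits, encode it in one parallel pass.
        return model.encode_all(fragments)
    # Steps 304-307: process the fragments one by one with recurrent fusion.
    fusion = None
    for fragment in fragments:
        semantic = model.encode_one(fragment)
        # Step 305: the first fragment's semantic vector is its own fusion vector.
        # Step 306: otherwise fuse with the (i-1)th fragment's fusion vector.
        fusion = semantic if fusion is None else model.fuse(fusion, semantic)
    return fusion  # semantic fusion vector of the Nth fragment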
In summary, a disordered fragment sequence of a sample text and the original ordering order, in the sample text, of N fragments in the disordered fragment sequence are acquired, where N is a positive integer; it is judged whether the number of characters of the sample text exceeds the limit number of the semantic representation model; when the number of characters does not exceed the limit number, the disordered fragment sequence is input into the semantic representation model to obtain the semantic fusion vector of the Nth fragment in the disordered fragment sequence; when the number of characters exceeds the limit number, then for the ith fragment in the disordered fragment sequence, if the ith fragment is the first fragment, its semantic vector is used as its semantic fusion vector, and if the ith fragment is not the first fragment, the semantic fusion vector of the (i-1)th fragment and the ith fragment are input into the semantic representation model to obtain the semantic fusion vector of the ith fragment; the semantic fusion vector of the Nth fragment is input into a prediction model to generate a predicted ordering order of the N fragments in the sample text; and the semantic representation model and the prediction model are pre-trained according to the original ordering order and the predicted ordering order. In this way, the whole sample text can be processed instead of only selected sentences from it, so the global information of the sample text can be learned and the processing efficiency of the semantic representation model is improved.
In order to achieve the above embodiments, the embodiments of the present application further provide a pre-training device for a semantic representation model.
Fig. 4 is a schematic diagram according to a third embodiment of the present application. As shown in fig. 4, the pre-training apparatus 400 of the semantic representation model includes: an acquisition module 410, a first input module 420, a second input module 430, and a pre-training module 440.
The obtaining module 410 is configured to obtain an disordered segment sequence of a sample text, and an original ordering order of N segments in the disordered segment sequence in the sample text, where N is a positive integer;
a first input module 420, configured to, for the ith fragment in the disordered fragment sequence, input the semantic fusion vector of the (i-1)th fragment in the disordered fragment sequence and the ith fragment into a semantic representation model to obtain the semantic fusion vector of the ith fragment, where i is a positive integer less than or equal to N, and repeat this step until the semantic fusion vector of the Nth fragment is obtained;
a second input module 430, configured to input the semantic fusion vector of the nth segment into a prediction model to generate a predicted ordering order of the N segments in the sample text;
a pre-training module 440, configured to pre-train the semantic representation model and the prediction model according to the original ordering order and the predicted ordering order.
As one possible implementation manner of the embodiment of the present application, the obtaining module 410 is specifically configured to obtain the sample text; dividing the sample text into N fragments; carrying out disorder treatment on the N fragments to generate a disorder fragment sequence of the sample text; and acquiring the original ordering order of N fragments in the sample text in the disordered fragment sequence.
As one possible implementation manner of the embodiments of the present application, the segment includes: at least one sentence; the obtaining module 410 is specifically configured to segment the sample text according to the sentence ending symbol in the sample text and the limited number of characters of the semantic representation model, so as to obtain N segments.
As a possible implementation manner of the embodiment of the present application, the apparatus further includes: the first judging module and the third input module; the first judging module is used for judging whether the number of characters of the sample text exceeds the limit number of the semantic representation model; the third input module is configured to input the disordered segment sequence into the semantic representation model when the number of characters does not exceed the limit number of the semantic representation model, and obtain a semantic fusion vector of an nth segment in the disordered segment sequence.
As a possible implementation manner of the embodiment of the present application, the apparatus further includes: a second judging module; the second judging module is used for judging whether the ith fragment is the first fragment in the disordered fragment sequence; the first input module is further configured to input the ith fragment into a semantic representation model when the ith fragment is a first fragment in the disordered sequence of fragments, and obtain a semantic vector of the ith fragment; and determining the semantic vector of the ith fragment as a semantic fusion vector of the ith fragment.
As a possible implementation manner of the embodiment of the present application, the manner in which the semantic representation model obtains the semantic fusion vector of the ith segment is to obtain the semantic vector of the ith segment; and carrying out fusion processing on the semantic vector of the ith fragment and the semantic fusion vector of the i-1 th fragment to obtain the semantic fusion vector of the ith fragment.
In summary, a disordered fragment sequence of a sample text and the original ordering order, in the sample text, of N fragments in the disordered fragment sequence are acquired, where N is a positive integer; for the ith fragment in the disordered fragment sequence, the semantic fusion vector of the (i-1)th fragment and the ith fragment are input into the semantic representation model to obtain the semantic fusion vector of the ith fragment, where i is a positive integer less than or equal to N, and this step is repeated until the semantic fusion vector of the Nth fragment is obtained; the semantic fusion vector of the Nth fragment is input into a prediction model to generate a predicted ordering order of the N fragments in the sample text; and the semantic representation model and the prediction model are pre-trained according to the original ordering order and the predicted ordering order. In this way, the whole sample text can be processed instead of only selected sentences from it, so the global information of the sample text can be learned and the processing efficiency of the semantic representation model is improved.
According to embodiments of the present application, there is also provided an electronic device, a readable storage medium and a computer program product.
As shown in fig. 5, a block diagram of an electronic device is provided for a pre-training method of a semantic representation model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, memory 502, and interfaces for connecting components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the electronic device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in fig. 5.
Memory 502 is a non-transitory computer readable storage medium provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a pre-training method for the semantic representation model provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the pre-training method of the semantic representation model provided by the present application.
The memory 502 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the acquisition module 410, the first input module 420, the second input module 430, and the pre-training module 440 shown in fig. 4) corresponding to a pre-training method of a semantic representation model in an embodiment of the present application. The processor 501 executes various functional applications of the server and data processing, i.e., a pre-training method implementing the semantic representation model in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 502.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created from the use of pre-trained electronic devices of the semantic representation model, and the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 may optionally include memory remotely located with respect to processor 501, which may be connected to the pre-trained electronic device of the semantic representation model via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the pre-training method of the semantic representation model may further include: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus or otherwise, for example in fig. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for pre-training the semantic representation model, and may be, for example, a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or other input devices. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service ("Virtual Private Server" or simply "VPS") are overcome. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (12)

1. A pre-training method of a semantic representation model, comprising:
acquiring a disordered fragment sequence of a sample text and an original ordering sequence of N fragments in the disordered fragment sequence in the sample text, wherein N is a positive integer;
for the ith fragment in the disordered fragment sequence, inputting a semantic fusion vector of the (i-1)th fragment in the disordered fragment sequence and the ith fragment into a semantic representation model to obtain the semantic fusion vector of the ith fragment, wherein i is a positive integer less than or equal to N, and repeating this step until the semantic fusion vector of the Nth fragment is obtained;
inputting the semantic fusion vector of the Nth fragment into a prediction model to generate a prediction ordering sequence of the N fragments in the sample text; and
pre-training the semantic representation model and the prediction model according to the original ordering sequence and the prediction ordering sequence;
the semantic expression model obtains the semantic fusion vector of the ith fragment by obtaining the semantic vector of the ith fragment; and carrying out fusion processing on the semantic vector of the ith fragment and the semantic fusion vector of the i-1 th fragment to obtain the semantic fusion vector of the ith fragment.
2. The pre-training method of a semantic representation model according to claim 1, wherein the obtaining the unordered sequence of segments of the sample text and the original ordering order of N segments in the unordered sequence of segments in the sample text comprises:
acquiring the sample text;
dividing the sample text into N fragments;
carrying out disorder treatment on the N fragments to generate a disorder fragment sequence of the sample text; and
and acquiring the original ordering order of N fragments in the sample text in the disordered fragment sequence.
3. The pre-training method of a semantic representation model according to claim 2, wherein the segments comprise: at least one sentence;
the step of dividing the sample text into N segments includes:
and dividing the sample text into N fragments according to the sentence ending symbol in the sample text and the limited character quantity of the semantic representation model.
4. The pre-training method of a semantic representation model according to claim 1, wherein, before inputting, for the ith fragment in the disordered fragment sequence, the semantic fusion vector of the (i-1)th fragment in the disordered fragment sequence and the ith fragment into the semantic representation model to obtain the semantic fusion vector of the ith fragment, the method further comprises:
judging whether the number of characters of the sample text exceeds the limit number of the semantic representation model;
and when the number of characters does not exceed the limit number of the semantic representation model, inputting the disordered segment sequence into the semantic representation model, and obtaining a semantic fusion vector of an N-th segment in the disordered segment sequence.
5. The pre-training method of a semantic representation model according to claim 1, wherein, before inputting, for the ith fragment in the disordered fragment sequence, the semantic fusion vector of the (i-1)th fragment in the disordered fragment sequence and the ith fragment into the semantic representation model to obtain the semantic fusion vector of the ith fragment, the method further comprises:
judging whether the ith fragment is the first fragment in the disordered fragment sequence;
when the ith fragment is the first fragment in the disordered fragment sequence, inputting the ith fragment into a semantic representation model to obtain a semantic vector of the ith fragment;
and determining the semantic vector of the ith fragment as a semantic fusion vector of the ith fragment.
6. A pre-training apparatus for a semantic representation model, comprising:
the acquisition module is used for acquiring a disordered fragment sequence of the sample text and an original ordering sequence of N fragments in the disordered fragment sequence in the sample text, wherein N is a positive integer;
the first input module is used for, for the ith fragment in the disordered fragment sequence, inputting the semantic fusion vector of the (i-1)th fragment in the disordered fragment sequence and the ith fragment into a semantic representation model to acquire the semantic fusion vector of the ith fragment, wherein i is a positive integer less than or equal to N, and repeating this step until the semantic fusion vector of the Nth fragment is acquired;
the second input module is used for inputting the semantic fusion vector of the Nth fragment into a prediction model to generate a prediction ordering sequence of the N fragments in the sample text;
the pre-training module is used for pre-training the semantic representation model and the prediction model according to the original ordering sequence and the prediction ordering sequence;
the semantic expression model obtains the semantic fusion vector of the ith fragment by obtaining the semantic vector of the ith fragment; and carrying out fusion processing on the semantic vector of the ith fragment and the semantic fusion vector of the i-1 th fragment to obtain the semantic fusion vector of the ith fragment.
7. The pre-training device of a semantic representation model according to claim 6, wherein the acquisition module is specifically configured to,
acquiring the sample text;
dividing the sample text into N fragments;
carrying out disorder treatment on the N fragments to generate a disorder fragment sequence of the sample text; and
and acquiring the original ordering order of N fragments in the sample text in the disordered fragment sequence.
8. The pre-training apparatus of a semantic representation model according to claim 7, wherein the segments comprise: at least one sentence;
the acquisition module is particularly adapted to the fact that,
and dividing the sample text into N fragments according to the sentence ending symbol in the sample text and the limited character quantity of the semantic representation model.
9. The pre-training apparatus of a semantic representation model according to claim 6, wherein the apparatus further comprises: the first judging module and the third input module;
the first judging module is used for judging whether the number of characters of the sample text exceeds the limit number of the semantic representation model;
the third input module is configured to input the disordered segment sequence into the semantic representation model when the number of characters does not exceed the limit number of the semantic representation model, and obtain a semantic fusion vector of an nth segment in the disordered segment sequence.
10. The pre-training apparatus of a semantic representation model according to claim 6, wherein said apparatus further comprises: a second judging module;
the second judging module is used for judging whether the ith fragment is the first fragment in the disordered fragment sequence;
the first input module is further configured to input the ith fragment into a semantic representation model when the ith fragment is a first fragment in the disordered sequence of fragments, and obtain a semantic vector of the ith fragment; and determining the semantic vector of the ith fragment as a semantic fusion vector of the ith fragment.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202011463938.9A 2020-12-11 2020-12-11 Pre-training method and device for semantic representation model, electronic equipment and storage medium Active CN112560499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011463938.9A CN112560499B (en) 2020-12-11 2020-12-11 Pre-training method and device for semantic representation model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011463938.9A CN112560499B (en) 2020-12-11 2020-12-11 Pre-training method and device for semantic representation model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112560499A CN112560499A (en) 2021-03-26
CN112560499B true CN112560499B (en) 2024-01-09

Family

ID=75063132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011463938.9A Active CN112560499B (en) 2020-12-11 2020-12-11 Pre-training method and device for semantic representation model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112560499B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361712B (en) * 2021-06-30 2023-07-21 北京百度网讯科技有限公司 Training method of feature determination model, semantic analysis method, semantic analysis device and electronic equipment
CN113807102B (en) * 2021-08-20 2022-11-01 北京百度网讯科技有限公司 Method, device, equipment and computer storage medium for establishing semantic representation model
CN113903329B (en) * 2021-09-08 2022-08-23 北京百度网讯科技有限公司 Voice processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035917A (en) * 2014-06-10 2014-09-10 复旦大学 Knowledge graph management method and system based on semantic space mapping
CN109299272A (en) * 2018-10-31 2019-02-01 北京国信云服科技有限公司 A kind of large information capacity document representation method for neural network input
CN111401077A (en) * 2020-06-02 2020-07-10 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment
CN111950291A (en) * 2020-06-22 2020-11-17 北京百度网讯科技有限公司 Semantic representation model generation method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280462A (en) * 2017-12-11 2018-07-13 北京三快在线科技有限公司 A kind of model training method and device, electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035917A (en) * 2014-06-10 2014-09-10 复旦大学 Knowledge graph management method and system based on semantic space mapping
CN109299272A (en) * 2018-10-31 2019-02-01 北京国信云服科技有限公司 A kind of large information capacity document representation method for neural network input
CN111401077A (en) * 2020-06-02 2020-07-10 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment
CN111950291A (en) * 2020-06-22 2020-11-17 北京百度网讯科技有限公司 Semantic representation model generation method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A cross-lingual sentence semantic similarity computation model based on local and global semantic fusion; Li Xia et al.; Journal of Chinese Information Processing; Vol. 33, No. 6; 18-26 *

Also Published As

Publication number Publication date
CN112560499A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN111539514B (en) Method and apparatus for generating a structure of a neural network
CN112560499B (en) Pre-training method and device for semantic representation model, electronic equipment and storage medium
CN111079442B (en) Vectorization representation method and device of document and computer equipment
KR102645185B1 (en) Method, apparatus, electronic device, program and readable storage medium for creating a label marking model
CN111144115B (en) Pre-training language model acquisition method, device, electronic equipment and storage medium
CN111859982B (en) Language model training method and device, electronic equipment and readable storage medium
CN111708922A (en) Model generation method and device for representing heterogeneous graph nodes
CN112036509A (en) Method and apparatus for training image recognition models
CN111859997B (en) Model training method and device in machine translation, electronic equipment and storage medium
CN111079945B (en) End-to-end model training method and device
CN111753914A (en) Model optimization method and device, electronic equipment and storage medium
CN111709252B (en) Model improvement method and device based on pre-trained semantic model
CN111563593B (en) Training method and device for neural network model
CN111539224B (en) Pruning method and device of semantic understanding model, electronic equipment and storage medium
JP2021108115A (en) Method and device for training machine reading comprehension model, electronic apparatus, and storage medium
CN111241810B (en) Punctuation prediction method and punctuation prediction device
CN110717340B (en) Recommendation method, recommendation device, electronic equipment and storage medium
EP3896595A1 (en) Text key information extracting method, apparatus, electronic device, storage medium, and computer program product
CN111127191B (en) Risk assessment method and risk assessment device
CN111859907B (en) Text error correction method and device, electronic equipment and storage medium
CN114492788A (en) Method and device for training deep learning model, electronic equipment and storage medium
CN111967591A (en) Neural network automatic pruning method and device and electronic equipment
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN112232089B (en) Pre-training method, device and storage medium of semantic representation model
CN112270169B (en) Method and device for predicting dialogue roles, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant