WO2021143206A1 - 单语句自然语言处理方法、装置、计算机设备及可读存储介质 - Google Patents

单语句自然语言处理方法、装置、计算机设备及可读存储介质 Download PDF

Info

Publication number
WO2021143206A1
WO2021143206A1 (PCT/CN2020/118735)
Authority
WO
WIPO (PCT)
Prior art keywords
preset
target
encoding
single sentence
external information
Prior art date
Application number
PCT/CN2020/118735
Other languages
English (en)
French (fr)
Inventor
阮鸿涛
郑立颖
徐亮
阮晓雯
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021143206A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/169Annotation, e.g. comment data or footnotes

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a Bert-based single-sentence natural language processing method, device, computer equipment, and computer-readable storage medium.
  • BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language representation model. It performs preliminary processing and preliminary feature extraction on raw natural language corpora to generate language representations, so that a variety of downstream natural language tasks can use these representations for natural language processing.
  • The input layer of the BERT pre-trained language model is the superposition of three input layers: word embedding, position embedding, and sentence segmentation embedding.
  • The word embedding input layer carries the representation vector of each word;
  • the position embedding input layer carries the position of each word in the sentence;
  • the sentence segmentation embedding input layer distinguishes between different sentences.
  • By superimposing these input layers and combining the masked word prediction task with the next sentence prediction task, BERT is trained into a pre-trained model that is shared across a variety of downstream tasks.
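  • To make the three-input-layer structure concrete, the following minimal PyTorch-style sketch (an illustration only, not the patent's implementation; the vocabulary size, maximum length, and hidden size are assumed values) sums word, position, and sentence segmentation embeddings the way the original BERT input layer does:

```python
import torch
import torch.nn as nn

class BertStyleInputLayer(nn.Module):
    """Original BERT-style input: word + position + segment embeddings, summed."""

    def __init__(self, vocab_size=21128, max_len=512, num_segments=2, hidden=768):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden)   # representation vector of each token
        self.pos_emb = nn.Embedding(max_len, hidden)        # position of each token in the sentence
        self.seg_emb = nn.Embedding(num_segments, hidden)   # distinguishes sentence A from sentence B

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return (self.word_emb(token_ids)
                + self.pos_emb(positions).unsqueeze(0)
                + self.seg_emb(segment_ids))
```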
  • This application provides a Bert-based single-sentence natural language processing method, device, computer equipment, and computer-readable storage medium, which can solve the problem in conventional approaches that the fixed input scheme of BERT leads to low accuracy in downstream natural language task processing.
  • In a first aspect, this application provides a Bert-based single-sentence natural language processing method.
  • The method includes: inputting a target single sentence into a preset target Bert model, where the target Bert model is constructed by replacing the sentence segmentation embedding input layer contained in the Bert model with a preset external information encoding input layer; the external information encoding input layer is an input layer preset to extract the preset external information contained in the target single sentence;
  • the external information is preset information in the target single sentence that contributes to the corresponding natural language processing task;
  • the target single sentence is the target object on which the natural language processing task performs speech-semantic processing in order to obtain a speech-semantic result;
  • the preset information includes word segmentation dependency relations and part-of-speech tagging information;
  • preprocessing the target single sentence according to the preset target Bert model to obtain a target vector corresponding to the target single sentence, where the target vector contains the corresponding external information encoding obtained for the target single sentence through the preset external information encoding input layer, and the external information encoding is a word segmentation dependency encoding or a part-of-speech tagging information encoding;
  • inputting the target vector into a preset natural language processing model; and performing speech-semantic processing on the target vector according to the preset natural language processing model to obtain a speech-semantic processing result corresponding to the single sentence.
  • In a second aspect, this application also provides a Bert-based single-sentence natural language processing device that adopts a preset target Bert model, where the target Bert model is constructed by replacing the sentence segmentation embedding input layer contained in the Bert model with a preset external information encoding input layer; the external information encoding input layer is an input layer preset to extract the preset external information contained in the target single sentence; the external information is preset information contained in the target single sentence that contributes to the natural language processing task corresponding to the target single sentence; the target single sentence is the target object on which that natural language processing task performs speech-semantic processing in order to obtain a speech-semantic result; and the preset information includes word segmentation dependency relations and part-of-speech tagging information.
  • The device includes: a first input unit for inputting the target single sentence into the preset target Bert model; a preprocessing unit, configured to preprocess the target single sentence according to the preset target Bert model to obtain a target vector corresponding to the target single sentence, the target vector containing the corresponding external information encoding obtained for the target single sentence through the preset external information encoding input layer, where the external information encoding is a word segmentation dependency encoding or a part-of-speech tagging information encoding; a second input unit for inputting the target vector into a preset natural language processing model; and a processing unit, configured to perform speech-semantic processing on the target vector according to the preset natural language processing model to obtain a speech-semantic processing result corresponding to the single sentence.
  • In a third aspect, this application also provides a computer device, which includes a memory and a processor; the memory stores a computer program, and the processor performs the following steps when running the computer program: inputting a target single sentence into a preset target Bert model, where the target Bert model is constructed by replacing the sentence segmentation embedding input layer contained in the Bert model with a preset external information encoding input layer, the external information encoding input layer being an input layer preset to extract the preset external information contained in the target single sentence;
  • the external information is preset information in the target single sentence that contributes to the corresponding natural language processing task;
  • the target single sentence is the target object on which the natural language processing task performs speech-semantic processing in order to obtain a speech-semantic result;
  • the preset information includes word segmentation dependency relations and part-of-speech tagging information; preprocessing the target single sentence according to the preset target Bert model to obtain a target vector corresponding to the target single sentence, where the target vector contains the corresponding external information encoding obtained for the target single sentence through the preset external information encoding input layer, and the external information encoding is a word segmentation dependency encoding or a part-of-speech tagging information encoding; inputting the target vector into a preset natural language processing model; and performing speech-semantic processing on the target vector according to the preset natural language processing model to obtain a speech-semantic processing result corresponding to the single sentence.
  • In a fourth aspect, this application also provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, it causes the processor to perform the following steps: inputting a target single sentence into a preset target Bert model, where the target Bert model is constructed by replacing the sentence segmentation embedding input layer contained in the Bert model with a preset external information encoding input layer, the external information encoding input layer being an input layer preset to extract the preset external information contained in the target single sentence, the external information being preset information in the target single sentence that contributes to the corresponding natural language processing task, and the target single sentence being the target object on which the natural language processing task performs speech-semantic processing in order to obtain a speech-semantic result;
  • the preset information includes word segmentation dependency relations and part-of-speech tagging information; preprocessing the target single sentence according to the preset target Bert model to obtain a target vector corresponding to the target single sentence, where the target vector contains the corresponding external information encoding obtained for the target single sentence through the preset external information encoding input layer, and the external information encoding is a word segmentation dependency encoding or a part-of-speech tagging information encoding; inputting the target vector into a preset natural language processing model; and performing speech-semantic processing on the target vector according to the preset natural language processing model to obtain a speech-semantic processing result corresponding to the single sentence.
  • Because the preset target Bert model is constructed by replacing the sentence segmentation embedding input layer contained in the Bert model with a preset external information encoding input layer, where the external information encoding input layer is an input layer preset to extract the preset external information contained in the target single sentence and the external information is preset information contained in the target single sentence that contributes to the speech-semantic processing task corresponding to the target single sentence,
  • the effective external information in the target single sentence is transmitted to the downstream natural language processing model through the replaced preset external information encoding input layer. This effectively enhances the downstream model's ability to capture the information of the target single sentence, and improves the accuracy and quality of speech-semantic processing, thereby enhancing the speech-semantic processing effect of the downstream natural language processing model.
  • FIG. 1 is a schematic flowchart of a Bert-based single sentence natural language processing method provided by an embodiment of the application
  • FIG. 2 is a schematic diagram of a sub-process in the Bert-based single sentence natural language processing method provided by an embodiment of the application;
  • FIG. 3 is a schematic diagram of another sub-flow of the Bert-based natural language processing method for a single sentence according to an embodiment of the application;
  • Fig. 4 is a schematic block diagram of a Bert-based single-sentence natural language processing device provided by an embodiment of the application.
  • Fig. 5 is a schematic block diagram of a computer device provided by an embodiment of the application.
  • A single sentence, also called a simple sentence or short sentence, is a language unit that can independently express complete semantics, such as a word, a phrase, or one sentence. It is encountered especially in interactive speech, where speech recognition requires natural language processing.
  • Single-sentence natural language processing arises frequently in the scenarios included in smart city construction, such as smart government affairs, smart city management, smart communities, smart security, smart logistics, smart medical care, smart education, smart environmental protection, and smart transportation, where intelligent machines interact with people.
  • For example, when handling business through a smart customer service system, the interaction mostly takes a question-and-answer form, so these scenarios all interact through single sentences.
  • During natural language processing, sentence text error recognition or sentence emotion classification may be involved, so that intelligent machines can interact with people for the purpose of communication or business handling.
  • FIG. 1 is a schematic flowchart of a Bert-based single-sentence natural language processing method provided by an embodiment of the application. As shown in Figure 1, the method includes the following steps S101-S104:
  • In a scenario where natural language processing is used for speech-semantic tasks, there is generally a front-end voice input device through which the user provides speech-semantic input, such as a microphone device or the microphone component of a smart phone, so that the user can input by voice.
  • The user sends speech through the voice input device;
  • the microphone device or smart phone receives the target single-sentence speech input by the user and sends it to a back end that performs natural language processing, such as a back-end server, so that the target single-sentence speech can be processed, the intent of the user's speech understood, and the corresponding preset response taken.
  • When performing natural language processing on the received speech, the original speech is generally preprocessed first, for example by using the Bert model to preprocess the natural language and obtain a preprocessing result, and the preprocessing result is then input into the preset natural language processing task model corresponding to that natural language processing task for task processing.
  • When the Bert model is used to pre-train sentences, a downstream task aimed at a target single sentence (for example, a speech-semantic processing task such as sentence text error recognition or sentence emotion classification) has no distinction between different sentences within one input, so the sentence segmentation embedding input layer contained in the original Bert model becomes a redundant input layer. At the same time, information in the target single sentence that is useful to the specific downstream task cannot be transmitted to the downstream natural language processing task model through only the word embedding and position embedding contained in the original Bert model, and the sentence segmentation embedding input layer cannot serve to input additional information either. For example, in a sentence text error recognition task, the dependency information between word segments helps the recognition task, but it cannot be transmitted to the downstream task processing model through BERT's fixed input scheme, which reduces the accuracy of downstream natural language task processing.
  • Therefore, in this embodiment of the application, a preset target Bert model is constructed by replacing the sentence segmentation embedding input layer contained in the initial Bert model with a preset external information encoding input layer, where the external information encoding input layer is an input layer preset to extract the preset external information contained in the target single sentence, the external information is preset information contained in the target single sentence that contributes to the natural language processing task corresponding to the target single sentence, and
  • the target single sentence is the target object on which that natural language processing task performs speech-semantic processing in order to obtain a speech-semantic result.
  • For example, in a sentence text error recognition task, the preset information may be the dependency relations between word segments; the preset information includes word segmentation dependency relations and part-of-speech tagging information, so the input layer of the original Bert model is transformed.
  • The preset target Bert model is constructed so that, while the two input layers of word embedding and position embedding contained in the original Bert model are retained, the sentence segmentation embedding input layer is replaced with a preset external information encoding input layer, for example a word segmentation dependency encoding layer or a part-of-speech tagging information encoding layer, to obtain the preset target Bert model.
  • At the same time, the encoding ids of [CLS], [SEP], and [PAD] in the external information encoding input layer are all set to 0. Using the parameters of the Bert pre-trained model and the speech-semantic data of the downstream target natural language processing task,
  • the model is fine-tuned to obtain a target Bert model suited to the speech-semantic target task of that natural language processing, so that effective preset external information can be transmitted downstream through the replaced preset external information encoding input layer to the task processing model.
  • As another example, in downstream tasks such as wrong-sentence recognition, word segmentation, part-of-speech information, and syntactic structure play an important role.
  • Through the improved target Bert model, the word segmentation dependency relations and part-of-speech tagging information of the sentence can be obtained via the replaced preset external information encoding input layer.
  • In scenarios where the amount of training data for the downstream task is small, this effectively enhances the downstream natural language processing model's ability to capture target single-sentence information, thereby enhancing its processing effect and improving the accuracy and quality of natural language processing.
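  • A minimal sketch of the modified input layer described above, under the same illustrative assumptions as the previous snippet (the number of distinct external-information ids is likewise an assumed value): the sentence segmentation embedding is swapped for an embedding over external-information ids, and [CLS], [SEP], and [PAD] positions are simply given id 0.

```python
import torch
import torch.nn as nn

class TargetBertInputLayer(nn.Module):
    """Target Bert input: the segment embedding is replaced by an external-information embedding."""

    def __init__(self, vocab_size=21128, max_len=512, num_external_ids=128, hidden=768):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden)
        self.pos_emb = nn.Embedding(max_len, hidden)
        self.ext_emb = nn.Embedding(num_external_ids, hidden)  # dependency or BIES/POS ids; 0 for [CLS]/[SEP]/[PAD]

    def forward(self, token_ids, external_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return (self.word_emb(token_ids)
                + self.pos_emb(positions).unsqueeze(0)
                + self.ext_emb(external_ids))
```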
  • It should be noted that, in this embodiment, the target single sentence is first preprocessed by the preset target Bert model to obtain the target vector corresponding to the preprocessing result, and the target vector is then input into the preset natural language processing model for speech-semantic processing to obtain the speech-semantic processing result.
  • Relative to the preset target Bert model, the natural language processing model therefore sits downstream of it and is the downstream natural language processing model.
  • Using the preset target Bert model obtained by improving the original Bert model, the target single sentence is acquired and input into the preset target Bert model for preprocessing, thereby obtaining the target vector corresponding to the target single sentence.
  • Because the sentence segmentation embedding input layer of the initial Bert model is replaced with the preset external information encoding input layer, and because the Bert model itself produces as many outputs as it has inputs, the target vector output by the preset target Bert model includes the external information encoding of the target single sentence obtained through the preset external information encoding input layer, where the external information encoding is a word segmentation dependency encoding or a part-of-speech tagging information encoding.
  • Specifically, after the target single sentence is preprocessed with the improved preset target Bert model, and since the sentence segmentation embedding input layer in the original Bert model has been replaced with the preset external information encoding input layer to obtain the preset target Bert model,
  • the target vector output by the target Bert model contains the external information encoding of the single sentence, for example a word segmentation dependency encoding or a part-of-speech tagging information encoding. The target vector is input into the preset natural language processing model, and the downstream natural language processing task model of the preset target Bert model then performs natural language processing on the target vector.
  • When performing natural language processing, the downstream task model can fully combine the preset external information encoding, which effectively enhances its ability to capture target single-sentence information, so as to obtain the speech-semantic processing result corresponding to the target single sentence. This improves the effect of the natural language processing task model on speech semantics and improves its efficiency in natural language processing.
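  • As a purely illustrative sketch of such a downstream model (the pooling choice, label count, and class meaning are assumptions, not the patent's specification), a classification head over the target vectors output by the target Bert model might look like this:

```python
import torch
import torch.nn as nn

class DownstreamSentenceClassifier(nn.Module):
    """Hypothetical downstream task model, e.g. sentence text error recognition or emotion classification."""

    def __init__(self, hidden=768, num_labels=2):
        super().__init__()
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, target_vectors):
        # target_vectors: (batch, seq_len, hidden) output by the preset target Bert model,
        # already carrying the external information encoding injected at the input layer.
        pooled = target_vectors[:, 0]        # use the [CLS] position as the sentence representation
        return self.classifier(pooled)        # logits over the task labels
```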
  • Further, since the embodiments of this application involve single-sentence natural language processing, and in smart city construction many application scenarios involve interactive processes such as question and answer with people, in which single-sentence natural language processing occurs frequently,
  • the embodiments of this application can be applied to smart government affairs, smart city management, smart communities, smart security, smart logistics, smart medical care, smart education, smart environmental protection, and smart transportation scenarios, thereby promoting the construction of smart cities.
  • In this embodiment of the application, a target single sentence is input into a preset target Bert model that is constructed by replacing the sentence segmentation embedding input layer contained in the Bert model with a preset external information encoding input layer;
  • the preset target Bert model preprocesses the target single sentence to obtain a target vector corresponding to the target single sentence, the target vector containing the corresponding external information encoding obtained for the target single sentence through the preset external information encoding input layer;
  • the target vector is input into a preset natural language processing model, and speech-semantic processing is performed on the target vector according to the preset natural language processing model to obtain the speech-semantic processing result corresponding to the single sentence.
  • Because the preset target Bert model is constructed by replacing the sentence segmentation embedding input layer contained in the Bert model with a preset external information encoding input layer, where the external information encoding input layer is an input layer preset to extract the preset external information contained in the target single sentence and the external information is preset information contained in the target single sentence that contributes to the speech-semantic processing task corresponding to the target single sentence,
  • for the natural language processing corresponding to the speech-semantic task, and especially for the speech-semantic processing performed by the natural language processing model downstream of the target single sentence, the effective external information in the target single sentence is transmitted to the downstream natural language processing model through the replaced preset external information encoding input layer. This effectively enhances the downstream model's ability to capture target single-sentence information, and improves the accuracy and quality of speech-semantic processing, thereby enhancing the speech-semantic processing effect of the downstream natural language processing model.
  • FIG. 2 is a schematic diagram of a sub-process in the Bert-based single-sentence natural language processing method provided by an embodiment of the application.
  • In this embodiment, the step of preprocessing the target single sentence according to the preset target Bert model to obtain the target vector corresponding to the target single sentence includes: S201, using a first preset language tool to segment the target single sentence into the several phrases it contains; S202, using a second preset language tool to tag each phrase with its part of speech to obtain the part-of-speech tagging information corresponding to the phrase,
  • where the part-of-speech tagging information includes the phrase and the part of speech corresponding to the phrase; and S203, encoding all the phrases and their corresponding part-of-speech tagging information with a preset encoding method to obtain the external information encoding contained in the target single sentence.
  • the first preset language tool and the second preset language tool may be language tools supporting corresponding functions such as Stanford CoreNLP or HanLP.
  • the preset coding methods include word segmentation dependency coding and part-of-speech tagging information coding.
  • Specifically, language tools such as Stanford CoreNLP or HanLP support NLP tasks including tokenization, shallow parsing (sentence and character chunking), word segmentation, sentence splitting, chunking, part-of-speech tagging, named entity recognition, and grammar parsing.
  • The input target single sentence can therefore be segmented with the preset language tool to obtain a phrase division, after which the phrases are tagged with their parts of speech: the first preset language tool segments the target single sentence into the phrases it contains, and the second preset language tool tags each phrase to obtain the part-of-speech tagging information corresponding to the phrase, which includes the phrase and its corresponding part of speech.
  • Finally, based on all the phrases and their corresponding part-of-speech tagging information, encoding is performed with the preset encoding method to obtain the external information encoding contained in the target single sentence.
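  • As a stand-in for the unnamed segmentation and tagging tools (the embodiment suggests Stanford CoreNLP or HanLP; the jieba library is used here purely for illustration), a minimal sketch of the segment-then-tag step might be:

```python
import jieba.posseg as pseg

def segment_and_tag(sentence):
    """Return (phrase, part-of-speech tag) pairs for a single target sentence."""
    return [(word, flag) for word, flag in pseg.cut(sentence)]

# Example (tags depend on the tool's tag set):
# segment_and_tag("今天天气很好") -> [('今天', 't'), ('天气', 'n'), ('很', 'd'), ('好', 'a')]
```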
  • The target Bert model adopted in this embodiment is obtained from the Bert pre-trained language model by replacing the sentence segmentation embedding input layer with an external information encoding input layer.
  • Compared with the original Bert model, the target Bert model drops the sentence segmentation embedding input layer that is redundant for single-sentence tasks, so that the effective external information in the target single sentence (such as word segmentation dependency relations or part-of-speech tagging information) is transmitted through the replaced preset external information encoding input layer to the downstream natural language processing model, which improves the effect of its speech-semantic processing.
  • Figure 3 is a schematic diagram of another sub-process of the Bert-based natural language processing method for a single sentence provided by an embodiment of the application.
  • In this embodiment, the external information encoding is a word segmentation dependency encoding, and
  • the step of encoding all the phrases and their corresponding part-of-speech tagging information with a preset encoding method to obtain the external information encoding contained in the target single sentence
  • includes: S301, using a third preset language tool to perform dependency analysis on the phrases and the part-of-speech tagging information to obtain a dependency tree; and S302, encoding the dependency tree with a preset dependency encoding method to obtain the external information encoding contained in the target single sentence.
  • A word segmentation dependency relation uses the dependency relations between the words in a sentence to express the syntactic structure information of the words (such as subject-predicate, verb-object, and attribute-head relations) and uses a tree structure to express the structure of the whole sentence (such as subject-predicate-object and attribute-adverbial-complement structures).
  • Dependency parsing (DP) reveals the syntactic structure by analyzing the dependency relations between the components within a language unit, that is, it identifies grammatical components such as the subject, predicate, and object, and the attribute, adverbial, and complement, and analyzes the relations between these components.
  • The third preset language tool may be a language tool that supports the corresponding functions, such as Stanford CoreNLP or HanLP; it may be the same as or different from the first and second preset language tools, which is not limited here.
  • After segmenting the target single sentence with the first preset language tool and tagging each phrase with the second preset language tool, the obtained word segments and their corresponding part-of-speech tagging results are input into the third preset language tool, which performs dependency analysis to obtain the dependency relations of the input target single sentence and form the dependency tree information of the input sentence.
  • In a dependency relation, each phrase in the sentence has one and only one head phrase on which it depends, and the two form a dependency pair; for the root node of the dependency tree, its head phrase is taken to be root, with corresponding position 0.
  • The preset dependency encoding method is a preset relative dependency position encoding method or a preset absolute dependency position encoding method.
  • Encoding the dependency tree of the input target single sentence includes the following two encoding methods: 1) relative dependency position encoding: the sentence is encoded with the phrase index of the head phrase in the dependency relation, so if the i-th phrase depends on the j-th phrase, the input encoding id of every character in the i-th phrase is recorded as j; 2) absolute dependency position encoding: the sentence is encoded with the position, in the whole sentence, of the first character of the head phrase in the dependency relation, so if the i-th phrase depends on the j-th phrase, the input encoding id of every character in the i-th phrase is recorded as the position of the first character of the j-th phrase in the whole sentence.
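  • A minimal sketch of these two encodings, assuming the phrases are given in sentence order, the head of each phrase is given as a 1-based phrase index, and 0 marks the root (whose head is the virtual root at position 0):

```python
def dependency_position_codes(phrases, heads, relative=True):
    """Per-character input ids for a single sentence's dependency tree.

    phrases: phrases of the sentence, in order.
    heads:   heads[i] is the 1-based index of the phrase that phrases[i] depends on (0 = root).
    relative=True  -> relative dependency position encoding (id = index of the head phrase).
    relative=False -> absolute dependency position encoding (id = sentence position of the
                      head phrase's first character).
    """
    first_char_pos, pos = [], 1
    for p in phrases:                       # 1-based sentence position of each phrase's first character
        first_char_pos.append(pos)
        pos += len(p)

    codes = []
    for i, phrase in enumerate(phrases):
        j = heads[i]
        code = j if relative else (first_char_pos[j - 1] if j > 0 else 0)
        codes.extend([code] * len(phrase))  # every character of the i-th phrase gets the same id
    return codes

# e.g. phrases = ["这个", "苹果", "很", "甜"], heads = [2, 4, 4, 0]
# relative codes: [2, 2, 4, 4, 4, 0];  absolute codes: [3, 3, 6, 6, 6, 0]
```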
  • In one embodiment, the external information encoding is a part-of-speech tagging information encoding, and the step of encoding all the phrases and their corresponding part-of-speech tagging information with a preset encoding method to obtain the
  • external information encoding contained in the target single sentence includes: encoding the part-of-speech tagging information corresponding to each phrase with a preset BIES tagging method to obtain the 4 codes corresponding to each piece of part-of-speech tagging information, and
  • encoding the K pieces of part-of-speech tagging information to obtain 4K codes, so as to obtain the external information encoding contained in the target single sentence, where K is a natural number.
  • Specifically, the part-of-speech tagging information is encoded with BIES tagging, where B marks the beginning character of a phrase, I a middle character, E the ending character, and S a single character that forms a phrase by itself.
  • Each character is assigned an encoding id: the preset BIES tagging method gives each piece of part-of-speech tagging information 4 codes, so the part-of-speech tags corresponding to K pieces of part-of-speech tagging information yield 4K encoding ids, thereby giving the external information encoding contained in the target single sentence, where the ids can be counted from 1 and K is a natural number.
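  • A minimal sketch of this encoding, assuming the part-of-speech tag set and the id layout described above (B/I/E/S enumerated per tag, ids starting at 1, with 0 reserved for [CLS]/[SEP]/[PAD]):

```python
def bies_pos_codes(phrases, pos_tags, tagset):
    """Per-character BIES + part-of-speech ids for a single sentence.

    Each of the K tags in `tagset` gets 4 ids (B, I, E, S), so ids run from 1 to 4K;
    id 0 stays free for [CLS]/[SEP]/[PAD].
    """
    roles = ["B", "I", "E", "S"]
    code_of = {(tag, role): 4 * k + r + 1
               for k, tag in enumerate(tagset)
               for r, role in enumerate(roles)}

    codes = []
    for phrase, tag in zip(phrases, pos_tags):
        if len(phrase) == 1:
            codes.append(code_of[(tag, "S")])                        # single-character phrase
        else:
            codes.append(code_of[(tag, "B")])                        # beginning character
            codes.extend([code_of[(tag, "I")]] * (len(phrase) - 2))  # middle characters
            codes.append(code_of[(tag, "E")])                        # ending character
    return codes

# e.g. bies_pos_codes(["今天", "天气", "很", "好"], ["t", "n", "d", "a"], tagset=["t", "n", "d", "a"])
# -> [1, 3, 5, 7, 12, 16]   (B_t, E_t, B_n, E_n, S_d, S_a)
```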
  • It should be noted that the Bert-based single-sentence natural language processing methods described in the above embodiments can recombine the technical features of different embodiments as needed to obtain combined implementations, all of which remain within the scope of protection claimed by this application.
  • FIG. 4 is a schematic block diagram of a Bert-based single-sentence natural language processing apparatus provided by an embodiment of the present application.
  • an embodiment of the present application also provides a Bert-based single-sentence natural language processing device.
  • As shown in Fig. 4, the Bert-based single-sentence natural language processing device includes units for executing the above Bert-based single-sentence natural language processing method, and
  • the device can be configured in a computer device.
  • Specifically, the Bert-based single-sentence natural language processing device 400 adopts a preset target Bert model.
  • The target Bert model is constructed by replacing the sentence segmentation embedding input layer contained in the Bert model with a preset external information encoding input layer, where the external information encoding input layer is an input layer preset to extract the preset external information contained in a target single sentence, the external information is
  • preset information contained in the target single sentence that contributes to the natural language processing task corresponding to the target single sentence, and the target single sentence is the target object on which that natural language processing task performs speech-semantic processing in order to obtain a speech-semantic result.
  • The preset information includes word segmentation dependency relations and part-of-speech tagging information.
  • the Bert-based single sentence natural language processing device 400 includes a first input unit 401, a preprocessing unit 402, a second input unit 403, and processing Unit 404.
  • the first input unit 401 is configured to input the target single sentence into the preset target Bert model;
  • the preprocessing unit 402 is configured to preprocess the target single sentence according to the preset target Bert model to obtain a target vector corresponding to the target single sentence, the target vector containing the corresponding external information encoding obtained for the target single sentence through the preset external information encoding input layer, where the external information encoding is a word segmentation dependency encoding or a part-of-speech tagging information encoding;
  • the second input unit 403 is used for inputting the target vector into a preset natural language processing model;
  • the processing unit 404 is used for performing speech-semantic processing on the target vector according to the preset natural language processing model to obtain the speech-semantic processing result corresponding to the single sentence.
  • the preprocessing unit 402 includes: a word segmentation subunit, configured to use a first preset language tool to segment the target single sentence to obtain several phrases contained in the target single sentence;
  • a tagging subunit, used to tag each phrase with a second preset language tool to obtain the part-of-speech tagging information corresponding to the phrase, the part-of-speech tagging information including the phrase and its corresponding part of speech; and
  • an encoding subunit, used to encode all the phrases and their corresponding part-of-speech tagging information with a preset encoding method to obtain the external information encoding contained in the target single sentence.
  • In one embodiment, the external information encoding is a word segmentation dependency encoding, and
  • the encoding subunit includes: an analysis subunit, used to perform dependency analysis on the phrases and the part-of-speech tagging information with a third preset language tool to obtain a dependency tree; and a dependency encoding subunit, used to encode the dependency tree with a preset dependency encoding method to obtain the external information encoding contained in the target single sentence.
  • The preset dependency encoding method is a preset relative dependency position encoding method or a preset absolute dependency position encoding method.
  • In one embodiment, the external information encoding is a part-of-speech tagging information encoding, and
  • the encoding subunit includes:
  • a tagging sub-subunit, used to encode the part-of-speech tagging information corresponding to each phrase with the preset BIES tagging method to obtain the 4 codes corresponding to each piece of part-of-speech tagging information; and
  • an encoding sub-subunit, used to encode the K pieces of part-of-speech tagging information to obtain 4K codes, thereby obtaining the external information encoding contained in the target single sentence, where K is a natural number.
  • It should be noted that those skilled in the art can clearly understand that, for the specific implementation process of the above Bert-based single-sentence natural language processing device and of each unit, reference can be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
  • Meanwhile, the division and connection of the units in the Bert-based single-sentence natural language processing device are only for illustration;
  • in other embodiments, the device can be divided into different units as needed, and its units can adopt different connection orders and modes, to complete all or part of the functions of the above Bert-based single-sentence natural language processing device.
  • the above-mentioned Bert-based single-sentence natural language processing apparatus can be implemented in the form of a computer program, and the computer program can be run on the computer device as shown in FIG. 5.
  • FIG. 5 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • The computer device 500 may be a computer device such as a desktop computer or a server, or may be a component or part of another device.
  • the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032; when the computer program 5032 is executed,
  • it can cause the processor 502 to execute the above Bert-based single-sentence natural language processing method.
  • the processor 502 is used to provide calculation and control capabilities to support the operation of the entire computer device 500.
  • The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502,
  • it can cause the processor 502 to execute the above Bert-based single-sentence natural language processing method.
  • the network interface 505 is used for network communication with other devices.
  • Those skilled in the art can understand that the structure shown in Fig. 5 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • the computer device may only include a memory and a processor. In such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 5, and will not be repeated here.
  • the processor 502 is configured to run a computer program 5032 stored in a memory to implement the Bert-based single-sentence natural language processing method described in the embodiment of the present application.
  • the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium; the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, it causes the processor to execute the steps of the Bert-based single-sentence natural language processing method described in the above embodiments.
  • The storage medium is a physical, non-transitory storage medium that can store computer programs, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.

Abstract

A Bert-based single-sentence natural language processing method, device, computer equipment, and computer-readable storage medium, belonging to the field of artificial intelligence technology. The method includes: inputting a target single sentence into a preset target Bert model to preprocess the target single sentence, the target Bert model being constructed by replacing the sentence segmentation embedding input layer contained in the Bert model with a preset external information encoding input layer, so as to obtain a target vector corresponding to the target single sentence, where the target vector contains the external information encoding of the target single sentence obtained through the preset external information encoding input layer; and then performing speech-semantic processing on the target vector with a preset natural language processing model to obtain the speech-semantic processing result corresponding to the target single sentence.

Description

单语句自然语言处理方法、装置、计算机设备及可读存储介质
本申请要求于2020年07月16日提交中国专利局、申请号为202010688324.4、申请名称为“单语句自然语言处理方法、装置、计算机设备及可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,尤其涉及一种基于Bert的单语句自然语言处理方法、装置、计算机设备及计算机可读存储介质。
背景技术
BERT的英文全称为Bidirectional Encoder Representation from Transformers,是一个预训练的语言表征模型,是对原始自然语言语料进行初步处理,初步进行特征提取,从而能够生成语言表征,以便各种各样的下游自然语言任务采用该语言表征进行自然语言处理。
BERT预训练语言模型的输入层由词嵌入、位置嵌入及语句分割嵌入三种输入层叠加而成。词嵌入输入层代表单词的表示向量,位置嵌入输入层代表语句中每个词的位置信息,语句分割嵌入输入层代表了对不同语句的区分。BERT通过叠加输入层的形式,结合遮蔽词预测任务和下一句语句预测任务,训练得到了一个在多种下游任务上通用的预训练模型。但是,发明人发现,针对单语句的下游任务,无法通过BERT既定输入方式将单语句中的一些有用信息输入至下游任务模型,降低了下游自然语言任务处理的准确性。
发明内容
本申请提供了一种基于Bert的单语句自然语言处理方法、装置、计算机设备及计算机可读存储介质,能够解决传统技术中由于BERT既定输入方式导致下游自然语言任务处理的准确性较低的问题。
第一方面,本申请提供了一种基于Bert的单语句自然语言处理方法,所述方法包括:将目标单语句输入预设的目标Bert模型,所述目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,其中,所述外部信息编码输入层为实现提取所述目标单语句中所包含的预设的外部信息而预设的输入层,所述外部信息为所述目标单语句中对所对应自然语言处理任务起作用的预设信息,所述目标单语句为所述自然语言处理任务为获得语音语义结果而对所述目标单语句进行语音语义处理的目标对象,所述预设信息包括分词依存关系及词性标注信息;根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量,所述目标向量中包含所述目标单语句通过所述预设外部信息编码输入层而得到的对应外部信息编码,其中,所述外部信息编码为分词依存关系编码或者词性标注信息编码;将所述目标向量输入至预设自然语言处理模型;根据所述预设自然语言处理模型对所述目标向量进行语音语义处理,得到所述单语句所对应的语音语义处理结果。
第二方面,本申请还提供了一种基于Bert的单语句自然语言处理装置,所述装置中采用了预设的目标Bert模型,所述目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,其中,所述外部信息编码输入层为实现提取目标单语句中所包含的预设的外部信息而预设的输入层,所述外部信息为所述目标单语句中所包含的对所述目标单语句所对应的自然语言处理任务起作用的预设信息,所述目标单语句为所述自然语言处理任务为获得语音语义结果而对所述目标单语句进行语音语义处理的目标对象,所述预设信息包括分词依存关系及词性标注信息,包括:第一输入单元,用于将所述目标单语句输入所述预设的目标Bert模型;预处理单元,用于根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量,所述目标向量中包含所述目标单语句通过所述预设外部信息编码输入层而得到的对应外部信息编码,其中,所述外部信息编码为分词依存关系编码或者词性标注信息编码;第二输入单元,用于将所述目标向量输入至预设自然语言处理模型;处理单元,用于根据所述预设自然语言处理模型对所述 目标向量进行语音语义处理,得到所述单语句所对应的语音语义处理结果。
第三方面,本申请还提供了一种计算机设备,其包括存储器及处理器,所述存储器上存储有计算机程序,所述处理器运行所述计算机程序时以执行如下步骤:将目标单语句输入预设的目标Bert模型,所述目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,其中,所述外部信息编码输入层为实现提取所述目标单语句中所包含的预设的外部信息而预设的输入层,所述外部信息为所述目标单语句中对所对应自然语言处理任务起作用的预设信息,所述目标单语句为所述自然语言处理任务为获得语音语义结果而对所述目标单语句进行语音语义处理的目标对象,所述预设信息包括分词依存关系及词性标注信息;根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量,所述目标向量中包含所述目标单语句通过所述预设外部信息编码输入层而得到的对应外部信息编码,其中,所述外部信息编码为分词依存关系编码或者词性标注信息编码;将所述目标向量输入至预设自然语言处理模型;根据所述预设自然语言处理模型对所述目标向量进行语音语义处理,得到所述单语句所对应的语音语义处理结果。
第四方面,本申请还提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时使所述处理器实现如下步骤:将目标单语句输入预设的目标Bert模型,所述目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,其中,所述外部信息编码输入层为实现提取所述目标单语句中所包含的预设的外部信息而预设的输入层,所述外部信息为所述目标单语句中对所对应自然语言处理任务起作用的预设信息,所述目标单语句为所述自然语言处理任务为获得语音语义结果而对所述目标单语句进行语音语义处理的目标对象,所述预设信息包括分词依存关系及词性标注信息;根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量,所述目标向量中包含所述目标单语句通过所述预设外部信息编码输入层而得到的对应外部信息编码,其中,所述外部信息编码为分词依存关系编码或者词性标注信息编码;将所述目标向量输入至预设自然语言处理模型;根据所述预设自然语言处理模型对所述目标向量进行语音语义处理,得到所述单语句所对应的语音语义处理结果。
本申请由于所述预设的目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,其中,所述外部信息编码输入层为实现提取目标单语句中所包含的预设外部信息而预设的输入层,所述外部信息为所述目标单语句中所包含的对所述目标单语句所对应的语音语义处理任务起作用的预设信息,针对语音语义任务所对应的自然语言处理,尤其针对目标单语句下游的自然语言处理模型所进行的语音语义处理任务,通过将目标单语句中有效的外部信息通过替换后的预设外部信息编码输入层传输至下游自然语言处理模型,可以有效增强下游自然语言处理模型抓取目标单语句信息的能力,能够提升语音语义处理的准确性和处理质量,从而增强下游自然语言处理模型的语音语义处理效果。
附图说明
为了更清楚地说明本申请实施例技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例提供的基于Bert的单语句自然语言处理方法的一个流程示意图;
图2为本申请实施例提供的基于Bert的单语句自然语言处理方法中一个子流程的示意图;
图3为本申请实施例提供的基于Bert的单语句自然语言处理方法的另一个子流程示意图;
图4为本申请实施例提供的基于Bert的单语句自然语言处理装置的一个示意性框图;以及
图5为本申请实施例提供的计算机设备的示意性框图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
单语句,又可以称为简单语句或者简短语句,为能够独立表达完整语义的语言单元,例如为一个词、一个短语或者一句话,尤其在交互式语音中,需要进行语音识别的自然语言处理中,会更多的遇到对单语句的自然语言处理,比如智慧城市建设中包含的智慧政务、智慧城管、智慧社区、智慧安防、智慧物流、智慧医疗、智慧教育、智慧环保及智慧交通等场景中,需要采用智能机器设备与人进行交互,用户与智能机器设备进行交互时均会涉及到单语句,再比如,在通过智能客服办理业务的过程中,由于更多的会涉及到问答形式,这些场景均会通过单语句进行交互,在对自然语言处理过程中,会涉及到语句文本错误识别或者语句情感分类,以实现智能机器设备与人进行交互以达到沟通或者办理业务的目的。
请参阅图1,图1为本申请实施例提供的基于Bert的单语句自然语言处理方法的一个流程示意图。如图1所示,该方法包括以下步骤S101-S104:
S101、将目标单语句输入预设的目标Bert模型,所述目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,其中,所述外部信息编码输入层为实现提取所述目标单语句中所包含的预设的外部信息而预设的输入层,所述外部信息为所述目标单语句中对所对应自然语言处理任务起作用的预设信息,所述目标单语句为所述自然语言处理任务为获得语音语义结果而对所述目标单语句进行语音语义处理的目标对象,所述预设信息包括分词依存关系及词性标注信息。
S102、根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量,所述目标向量中包含所述目标单语句通过所述预设外部信息编码输入层而得到的对应外部信息编码,其中,所述外部信息编码为分词依存关系编码或者词性标注信息编码。
具体地,在使用自然语言处理以进行语音语义任务的场景中,一般会存在前端供用户提供语音语义输入的语音输入设备,比如麦克风设备或者智能手机上的麦克风组件等,从而用户可以通过语音输入设备发送语音,麦克风设备或者智能手机接收用户输入的目标单语句语音,并将目标单语句语音发送至进行自然语言处理的后端,比如后台服务器等,以对目标单语句语音进行自然语言处理,以了解用户发送的语音的意图,并采取对应的预设应答。在对接收的语音进行自然语言处理时,一般会对接收的原始语音进行预处理,比如使用Bert模型对自然语言进行预处理以得到预处理结果,然后将预处理结果再输入该自然语言处理任务所对应的预设自然语言处理任务模型进行自然语言任务处理。在使用Bert模型对语句进行预训练时,由于针对目标单语句的下游任务,例如下游任务为语句文本错误识别或者语句情感分类等语音语义处理任务时,同一输入语句没有不同语句间的区分,因此,原始Bert模型中所包含的语句分割嵌入输入层成为了冗余的输入层。而同时,针对具体的下游任务,例如语句文本错误识别或者语句情感分类等语音语义处理任务,目标单语句中对下游任务处理有用的信息无法仅通过原始Bert模型中所包含的词嵌入和位置嵌入传输给下游的自然语言处理任务模型,语句分割嵌入输入层也无法起到输入额外信息的作用。例如在语句文本错误识别任务中,分词间的依存关系信息对识别任务存在帮助作用,但无法通过BERT中既定的输入方式将依存关系信息传输给下游任务处理模型,降低了下游自然语言任务处理的准确性。
因此,在本申请实施例中,通过将初始Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建预设的目标Bert模型,其中,所述外部信息编码输入层为实现提取目标单语句中所包含的预设外部信息而预设的输入层,所述外部信息为所述目标单语句中所包含的对所述目标单语句所对应的自然语言处理任务起作用的预设信息,所述目标单语句为所述自然语言处理任务为获得语音语义结果而对所述目标单语句进行语音语义处理的 目标对象。例如在语句文本错误识别任务中,预设信息可以为分词间的依存关系,其中,所述预设信息包括分词依存关系及词性标注信息,从而实现改造原始Bert模型中的输入层。构建预设的目标Bert模型,从而在保留原始Bert模型中所包含的词嵌入及位置嵌入两个输入层的同时,将语句分割嵌入输入层替换为预设外部信息编码输入层,例如替换为分词依存关系编码层,或者替换为词性标注信息编码层,以得到预设的目标Bert模型。同时将该外部信息编码输入层中[CLS]、[SEP]、[PAD]的编码id都设为0,利用Bert预训练模型的参数,以及下游的目标自然语言处理任务的语音语义数据对Bert模型进行微调,得到适用于进行自然语言处理所对应的语音语义目标任务所对应的目标Bert模型,从而实现能够将有效的预设外部信息通过替换后的预设外部信息编码输入层传输给下游的任务处理模型。再例如,在错句识别等下游任务中,分词、词性信息和句法结构有着重要作用,通过改进后的目标Bert模型可以通过替换后的预设外部信息编码输入层获得语句的分词依存关系及词性标注信息,在下游任务训练数据量较少的场景下,可以有效增强下游自然语言处理模型抓取目标单语句信息的能力,从而增强下游自然语言处理模型的处理效果,提升自然语言处理的准确性和处理质量。
需要说明的是,在本申请实施例中,由于先通过预设目标Bert模型对所述目标单语句进行预处理,以得到预处理结果所对应的目标向量,再将目标向量输入至预设自然语言处理模型以进行语音语义处理,从而得到语音语义处理结果,因此,相对于预设目标Bert模型,自然语言处理模型位于预设目标Bert模型的下游,为下游的自然语言处理模型。
使用针对原始Bert模型进行改进所得到的预设的目标Bert模型,获取目标单语句,并将所述目标单语句输入所述预设的目标Bert模型进行预处理,从而得到所述目标单语句所对应的目标向量,由于将初始Bert模型中的语句分割嵌入输入层替换为预设外部信息编码输入层,同时,基于Bert模型本身具有的有多少个输入就有多少个对应的输出的特性,预设目标Bert输出的所述目标向量中包含通过所述预设外部信息编码输入层而得到的所述目标单语句所包含的外部信息编码,其中,所述外部信息编码为分词依存关系编码或者词性标注信息编码。
S103、将所述目标向量输入至预设自然语言处理模型。
S104、根据所述预设自然语言处理模型对所述目标向量进行语音语义处理,得到所述单语句所对应的语音语义处理结果。
具体地,利用改进后的预设目标Bert模型对目标单语句进行预处理后,由于将原始Bert模型中的语句分割嵌入输入层替换为预设外部信息编码输入层以得到预设的目标Bert模型,所述目标Bert模型输出的目标向量中就包含了单语句所包含的外部信息编码,例如所述外部信息编码为分词依存关系编码或者词性标注信息编码,并将所述目标向量输入至预设自然语言处理模型,预设的目标Bert模型的下游自然语言处理任务模型再对所述目标向量进行自然语言处理,下游的自然语言处理任务模型进行自然语言处理时,就可以充分的结合预设外部信息编码以有效增强自然语言处理任务模型抓取目标单语句信息的能力,以得到所述目标单语句所对应的语音语义处理结果,能够提高自然语言处理任务模型处理语音语义的效果,提高了自然语言模型处理自然语言处理的效率。
进一步地,由于本申请实施例涉及单语句自然语言处理,而在智慧城市的建设中,很多应用场景涉及与人进行问答等交互过程,而交互过程中涉及较多的单语句自然语言处理,因此,本申请实施例可应用于智慧政务、智慧城管、智慧社区、智慧安防、智慧物流、智慧医疗、智慧教育、智慧环保及智慧交通场景中,从而推动智慧城市的建设。
本申请实施例通过将目标单语句输入预设的目标Bert模型,所述目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量,所述目标向量中包含所述目标单语句通过所述预设外部信息编码输入层而得到的对应外部信息编码,将所述目标向量输入至预设自然语言处理模型,根据所述预设自然语言处理模型对所述目标向量进行语音语义处理,得到所述单语句所对应的语音语义处理结果。由于所述预设的目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编 码输入层而构建,其中,所述外部信息编码输入层为实现提取目标单语句中所包含的预设外部信息而预设的输入层,所述外部信息为所述目标单语句中所包含的对所述目标单语句所对应的语音语义处理任务起作用的预设信息,针对语音语义任务所对应的自然语言处理,尤其针对目标单语句下游的自然语言处理模型所进行的语音语义处理任务,通过将目标单语句中有效的外部信息通过替换后的预设外部信息编码输入层传输至下游自然语言处理模型,可以有效增强下游自然语言处理模型抓取目标单语句信息的能力,能够提升语音语义处理的准确性和处理质量,从而增强下游自然语言处理模型的语音语义处理效果。
请参阅图2,图2为本申请实施例提供的基于Bert的单语句自然语言处理方法中一个子流程的示意图。在该实施例中,所述根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量的步骤包括:S201、采用第一预设语言工具对所述目标单语句进行分词,以得到所述目标单语句所包含的若干个短语;S202、采用第二预设语言工具对每个所述短语进行词性标注,以得到所述短语所对应的词性标注信息,所述词性标注信息包括所述短语及所述短语所对应的词性;S203、基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码。
其中,第一预设语言工具及第二预设语言工具可以为Stanford CoreNLP或者HanLP等支持对应功能的语言工具。
预设编码方式包括分词依存关系编码及词性标注信息编码。
具体地,由于语言工具(例如Stanford CoreNLP或者HanLP)支持包括标记化、浅层分析(句字分块)、分词、分句、分块、词性标注、命名实体识别及语法解析等NLP任务,可以通过预设语言工具对输入的目标单语句进行分词得到短语划分,再对短语进行词性标注,即采用第一预设语言工具对所述目标单语句进行分词,以得到所述目标单语句所包含的若干个短语,进而采用第二预设语言工具对每个所述短语进行词性标注,以得到所述短语所对应的词性标注信息,所述词性标注信息包括所述短语及所述短语所对应的词性,最后根据所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码。
本申请实施例中所采用的目标Bert模型,可以实现基于外部信息编码替换语句分割嵌入输入层的Bert预训练语言模型以得到目标Bert模型,与原始Bert模型相比,本申请实施例的目标Ber对于单语句任务时的冗余的语句分割嵌入输入层,从而将目标单语句中的有效外部信息(例如分词依存关系或者词性标注信息)通过替换后的预设外部信息编码输入层传输至下游自然语言处理模型,能够提高下游自然语言处理模型进行语音语义处理的效果。
请参阅图3,图3为本申请实施例提供的基于Bert的单语句自然语言处理方法的另一个子流程示意图,在该实施例中,所述外部信息编码为分词依存关系编码,所述基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码的步骤包括:
S301、采用第三预设语言工具对所述短语及所述词性标注信息进行依存关系分析,以得到依存关系树;S302、采用预设依存编码方式对所述依存关系树进行编码,以得到所述目标单语句所包含的外部信息编码。
其中,分词依存关系为利用句子中词与词之间的依存关系来表示词语的句法结构信息(如主谓、动宾、定中等结构关系)并用树状结构来表示整句的结构(如主谓宾、定状补等)。依存语法(Dependency Parsing,DP)通过分析语言单位内成分之间的依存关系揭示其句法结构。即分析识别句子中的“主谓宾”、“定状补”这些语法成分,并分析各成分之间的关系。
第三预设语言工具可以为Stanford CoreNLP或者HanLP等支持对应功能的语言工具,可以与第一预设语言工具及第二预设语言工具相同,也可以与第一预设语言工具及第二预设语言工具不相同,在此不做限定。
具体地,在经过采用第一预设语言工具对所述目标单语句进行分词,以得到所述目标单 语句所包含的若干个短语,及采用第二预设语言工具对每个所述短语进行词性标注后,将得到的若干分词及所述分词对应的词性标注结果输入第三预设语言工具,以通过第三预设语言工具执行依存关系分析,得到输入的目标单语句的依存关系,以形成输入语句的依存关系树信息,所述依存关系即对于语句中的每个短语,都有且仅有一个依赖的中心短语,两者构成依存关系,其中,对于依存关系树的根节点,设其依赖的中心短语为root,对应位置为0。
进一步地,所述预设依存编码方式为预设相对依存位置编码方式或者为预设绝对依存位置编码方式。
具体地,对输入的目标单语句的依存关系树进行编码,包括以下两种编码方式:1)相对依存位置编码:以依存关系中被依赖的中心短语的短语位置对语句进行编码,如第i个短语依赖第j个短语,则将第i个短语中所有字的输入编码id记为j。2)绝对依存位置编码:以依存关系中被依赖的中心短语的第一个字在整个语句中的位置对语句进行编码,如第i个短语依赖第j个短语,则将第i个短语中所有字的输入编码id记为第j个短语中第一个字在整个语句中的位置。
在一实施例中,所述外部信息编码为词性标注信息编码,所述基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码的步骤包括:采用预设BIES标注方式对每个所述短语所对应的词性标注信息进行编码以得到每个所述词性标注信息所对应的4个编码;对K个词性标注信息进行编码以得到4K个编码,从而得到所述目标单语句所包含的外部信息编码,其中,K为自然数。
具体地，对词性标注信息进行编码采用BIES标注，其中B为短语开头词，I为短语中间词，E为短语结尾词，S为单字作为短语的词。对每个字标注编码id，采用预设BIES标注方式对每个所述短语所对应的词性标注信息进行编码以得到每个所述词性标注信息所对应的4个编码，对K个词性标注信息所对应的词性标签可以得到4K个编码id，从而得到所述目标单语句所包含的外部信息编码，其中，id可以从1开始计算，K为自然数。
需要说明的是,上述各个实施例所述的基于Bert的单语句自然语言处理方法,可以根据需要将不同实施例中包含的技术特征重新进行组合,以获取组合后的实施方案,但都在本申请要求的保护范围之内。
请参阅图4,图4为本申请实施例提供的基于Bert的单语句自然语言处理装置的一个示意性框图。对应于上述所述基于Bert的单语句自然语言处理方法,本申请实施例还提供一种基于Bert的单语句自然语言处理装置。如图4所示,该基于Bert的单语句自然语言处理装置包括用于执行上述所述基于Bert的单语句自然语言处理方法的单元,该基于Bert的单语句自然语言处理装置可以被配置于计算机设备中。具体地,请参阅图4,基于Bert的单语句自然语言处理装置400中采用了预设的目标Bert模型,所述目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,其中,所述外部信息编码输入层为实现提取目标单语句中所包含的预设的外部信息而预设的输入层,所述外部信息为所述目标单语句中所包含的对所述目标单语句所对应的自然语言处理任务起作用的预设信息,所述目标单语句为所述自然语言处理任务为获得语音语义结果而对所述目标单语句进行语音语义处理的目标对象,所述预设信息包括分词依存关系及词性标注信息,该基于Bert的单语句自然语言处理装置400包括第一输入单元401、预处理单元402、第二输入单元403及处理单元404。
其中,第一输入单元401,用于将所述目标单语句输入所述预设的目标Bert模型;预处理单元402,用于根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量,所述目标向量中包含所述目标单语句通过所述预设外部信息编码输入层而得到的对应外部信息编码,其中,所述外部信息编码为分词依存关系编码或者词性标注信息编码;第二输入单元403,用于将所述目标向量输入至预设自然语言处理模型;处理单元404,用于根据所述预设自然语言处理模型对所述目标向量进行语音语义处理,得到所述单语句所对应的语音语义处理结果。
在一实施例中,所述预处理单元402包括:分词子单元,用于采用第一预设语言工具对所述目标单语句进行分词,以得到所述目标单语句所包含的若干个短语;标注子单元,用于采用第二预设语言工具对每个所述短语进行词性标注,以得到所述短语所对应的词性标注信息,所述词性标注信息包括所述短语及所述短语所对应的词性;编码子单元,用于基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码。
在一实施例中,所述外部信息编码为分词依存关系编码,所述编码子单元包括:分析子单元,用于采用第三预设语言工具对所述短语及所述词性标注信息进行依存关系分析,以得到依存关系树;依存关系编码子单元,用于采用预设依存编码方式对所述依存关系树进行编码,以得到所述目标单语句所包含的外部信息编码。
在一实施例中,所述预设依存编码方式为预设相对依存位置编码方式或者为预设绝对依存位置编码方式。
在一实施例中,所述外部信息编码为词性标注信息编码,所述编码子单元包括:
标注次子单元,用于采用预设BIES标注方式对每个所述短语所对应的词性标注信息进行编码以得到每个所述词性标注信息所对应的4个编码;
编码次子单元,用于对K个词性标注信息进行编码以得到4K个编码,从而得到所述目标单语句所包含的外部信息编码,其中,K为自然数。
需要说明的是,所属领域的技术人员可以清楚地了解到,上述基于Bert的单语句自然语言处理装置和各单元的具体实现过程,可以参考前述方法实施例中的相应描述,为了描述的方便和简洁,在此不再赘述。
同时,上述基于Bert的单语句自然语言处理装置中各个单元的划分和连接方式仅用于举例说明,在其他实施例中,可将基于Bert的单语句自然语言处理装置按照需要划分为不同的单元,也可将基于Bert的单语句自然语言处理装置中各单元采取不同的连接顺序和方式,以完成上述基于Bert的单语句自然语言处理装置的全部或部分功能。
上述基于Bert的单语句自然语言处理装置可以实现为一种计算机程序的形式,该计算机程序可以在如图5所示的计算机设备上运行。
请参阅图5,图5是本申请实施例提供的一种计算机设备的示意性框图。该计算机设备500可以是台式机电脑或者服务器等计算机设备,也可以是其他设备中的组件或者部件。
参阅图5,该计算机设备500包括通过系统总线501连接的处理器502、存储器和网络接口505,其中,存储器可以包括非易失性存储介质503和内存储器504。
该非易失性存储介质503可存储操作系统5031和计算机程序5032。该计算机程序5032被执行时,可使得处理器502执行一种上述基于Bert的单语句自然语言处理方法。
该处理器502用于提供计算和控制能力,以支撑整个计算机设备500的运行。
该内存储器504为非易失性存储介质503中的计算机程序5032的运行提供环境,该计算机程序5032被处理器502执行时,可使得处理器502执行一种上述基于Bert的单语句自然语言处理方法。
该网络接口505用于与其它设备进行网络通信。本领域技术人员可以理解,图5中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备500的限定,具体的计算机设备500可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。例如,在一些实施例中,计算机设备可以仅包括存储器及处理器,在这样的实施例中,存储器及处理器的结构及功能与图5所示实施例一致,在此不再赘述。
其中,所述处理器502用于运行存储在存储器中的计算机程序5032,以实现本申请实施例所描述的基于Bert的单语句自然语言处理方法。
应当理解,在本申请实施例中,处理器502可以是中央处理单元(Central Processing Unit,CPU),该处理器502还可以是其他通用处理器、数字信号处理器(Digital Signal Processor, DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。其中,通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
本领域普通技术人员可以理解的是实现上述实施例的方法中的全部或部分流程,是可以通过计算机程序来完成,该计算机程序可存储于一计算机可读存储介质。该计算机程序被该计算机系统中的至少一个处理器执行,以实现上述方法的实施例的步骤。
因此,本申请实施例还提供一种计算机可读存储介质。该计算机可读存储介质可以为非易失性的计算机可读存储介质,也可以为易失性的计算机可读存储介质,该计算机可读存储介质存储有计算机程序,该计算机程序被处理器执行时使处理器执行以上各实施例中所描述的所述基于Bert的单语句自然语言处理方法的步骤。
所述存储介质为实体的、非瞬时性的存储介质,例如可以是U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、磁碟或者光盘等各种可以存储计算机程序的实体存储介质。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到各种等效的修改或替换，这些修改或替换都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以权利要求的保护范围为准。

Claims (20)

  1. 一种基于Bert的单语句自然语言处理方法,包括:
    将目标单语句输入预设的目标Bert模型,所述目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,其中,所述外部信息编码输入层为实现提取所述目标单语句中所包含的预设的外部信息而预设的输入层,所述外部信息为所述目标单语句中对所对应自然语言处理任务起作用的预设信息,所述目标单语句为所述自然语言处理任务为获得语音语义结果而对所述目标单语句进行语音语义处理的目标对象,所述预设信息包括分词依存关系及词性标注信息;
    根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量,所述目标向量中包含所述目标单语句通过所述预设外部信息编码输入层而得到的对应外部信息编码,其中,所述外部信息编码为分词依存关系编码或者词性标注信息编码;
    将所述目标向量输入至预设自然语言处理模型;
    根据所述预设自然语言处理模型对所述目标向量进行语音语义处理,得到所述单语句所对应的语音语义处理结果。
  2. 根据权利要求1所述基于Bert的单语句自然语言处理方法,其中,所述根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量的步骤包括:
    采用第一预设语言工具对所述目标单语句进行分词,以得到所述目标单语句所包含的若干个短语;
    采用第二预设语言工具对每个所述短语进行词性标注,以得到所述短语所对应的词性标注信息,所述词性标注信息包括所述短语及所述短语所对应的词性;
    基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码。
  3. 根据权利要求2所述基于Bert的单语句自然语言处理方法,其中,所述外部信息编码为分词依存关系编码,所述基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码的步骤包括:
    采用第三预设语言工具对所述短语及所述词性标注信息进行依存关系分析,以得到依存关系树;
    采用预设依存编码方式对所述依存关系树进行编码,以得到所述目标单语句所包含的外部信息编码。
  4. 根据权利要求3所述基于Bert的单语句自然语言处理方法,其中,所述预设依存编码方式为预设相对依存位置编码方式或者为预设绝对依存位置编码方式。
  5. 根据权利要求2所述基于Bert的单语句自然语言处理方法,其中,所述外部信息编码为词性标注信息编码,所述基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码的步骤包括:
    采用预设BIES标注方式对每个所述短语所对应的词性标注信息进行编码以得到每个所述词性标注信息所对应的4个编码;
    对K个词性标注信息进行编码以得到4K个编码,从而得到所述目标单语句所包含的外部信息编码,其中,K为自然数。
  6. 一种基于Bert的单语句自然语言处理装置,所述装置中采用了预设的目标Bert模型,所述目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,其中,所述外部信息编码输入层为实现提取目标单语句中所包含的预设的外部信息而预设的输入层,所述外部信息为所述目标单语句中所包含的对所述目标单语句所对应的自然语言处理任务起作用的预设信息,所述目标单语句为所述自然语言处理任务为获得语音语义结果而对所述目标单语句进行语音语义处理的目标对象,所述预设信息包括分词 依存关系及词性标注信息,包括:
    第一输入单元,用于将所述目标单语句输入所述预设的目标Bert模型;
    预处理单元,用于根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量,所述目标向量中包含所述目标单语句通过所述预设外部信息编码输入层而得到的对应外部信息编码,其中,所述外部信息编码为分词依存关系编码或者词性标注信息编码;
    第二输入单元,用于将所述目标向量输入至预设自然语言处理模型;
    处理单元,用于根据所述预设自然语言处理模型对所述目标向量进行语音语义处理,得到所述单语句所对应的语音语义处理结果。
  7. 根据权利要求6所述基于Bert的单语句自然语言处理装置,其中,所述预处理单元包括:
    分词子单元,用于采用第一预设语言工具对所述目标单语句进行分词,以得到所述目标单语句所包含的若干个短语;
    标注子单元,用于采用第二预设语言工具对每个所述短语进行词性标注,以得到所述短语所对应的词性标注信息,所述词性标注信息包括所述短语及所述短语所对应的词性;
    编码子单元,用于基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码。
  8. 根据权利要求7所述基于Bert的单语句自然语言处理装置,其中,所述外部信息编码为分词依存关系编码,所述编码子单元包括:
    分析子单元,用于采用第三预设语言工具对所述短语及所述词性标注信息进行依存关系分析,以得到依存关系树;
    依存关系编码子单元,用于采用预设依存编码方式对所述依存关系树进行编码,以得到所述目标单语句所包含的外部信息编码。
  9. 根据权利要求8所述基于Bert的单语句自然语言处理装置,其中,所述预设依存编码方式为预设相对依存位置编码方式或者为预设绝对依存位置编码方式。
  10. 根据权利要求7所述基于Bert的单语句自然语言处理装置,其中,所述外部信息编码为词性标注信息编码,所述编码子单元包括:
    标注次子单元,用于采用预设BIES标注方式对每个所述短语所对应的词性标注信息进行编码以得到每个所述词性标注信息所对应的4个编码;
    编码次子单元,用于对K个词性标注信息进行编码以得到4K个编码,从而得到所述目标单语句所包含的外部信息编码,其中,K为自然数。
  11. 一种计算机设备,所述计算机设备包括存储器以及与所述存储器相连的处理器;所述存储器用于存储计算机程序;所述处理器用于运行所述计算机程序,以执行如下步骤:
    将目标单语句输入预设的目标Bert模型,所述目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,其中,所述外部信息编码输入层为实现提取所述目标单语句中所包含的预设的外部信息而预设的输入层,所述外部信息为所述目标单语句中对所对应自然语言处理任务起作用的预设信息,所述目标单语句为所述自然语言处理任务为获得语音语义结果而对所述目标单语句进行语音语义处理的目标对象,所述预设信息包括分词依存关系及词性标注信息;
    根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量,所述目标向量中包含所述目标单语句通过所述预设外部信息编码输入层而得到的对应外部信息编码,其中,所述外部信息编码为分词依存关系编码或者词性标注信息编码;
    将所述目标向量输入至预设自然语言处理模型;
    根据所述预设自然语言处理模型对所述目标向量进行语音语义处理,得到所述单语句所对应的语音语义处理结果。
  12. 根据权利要求11所述计算机设备,其中,所述根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量的步骤包括:
    采用第一预设语言工具对所述目标单语句进行分词,以得到所述目标单语句所包含的若干个短语;
    采用第二预设语言工具对每个所述短语进行词性标注,以得到所述短语所对应的词性标注信息,所述词性标注信息包括所述短语及所述短语所对应的词性;
    基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码。
  13. 根据权利要求12所述计算机设备,其中,所述外部信息编码为分词依存关系编码,所述基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码的步骤包括:
    采用第三预设语言工具对所述短语及所述词性标注信息进行依存关系分析,以得到依存关系树;
    采用预设依存编码方式对所述依存关系树进行编码,以得到所述目标单语句所包含的外部信息编码。
  14. 根据权利要求13所述计算机设备,其中,所述预设依存编码方式为预设相对依存位置编码方式或者为预设绝对依存位置编码方式。
  15. 根据权利要求12所述计算机设备,其中,所述外部信息编码为词性标注信息编码,所述基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码的步骤包括:
    采用预设BIES标注方式对每个所述短语所对应的词性标注信息进行编码以得到每个所述词性标注信息所对应的4个编码;
    对K个词性标注信息进行编码以得到4K个编码,从而得到所述目标单语句所包含的外部信息编码,其中,K为自然数。
  16. 一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序被处理器执行时以实现如下步骤:
    将目标单语句输入预设的目标Bert模型,所述目标Bert模型为通过将Bert模型中所包含的语句分割嵌入输入层替换为预设外部信息编码输入层而构建,其中,所述外部信息编码输入层为实现提取所述目标单语句中所包含的预设的外部信息而预设的输入层,所述外部信息为所述目标单语句中对所对应自然语言处理任务起作用的预设信息,所述目标单语句为所述自然语言处理任务为获得语音语义结果而对所述目标单语句进行语音语义处理的目标对象,所述预设信息包括分词依存关系及词性标注信息;
    根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量,所述目标向量中包含所述目标单语句通过所述预设外部信息编码输入层而得到的对应外部信息编码,其中,所述外部信息编码为分词依存关系编码或者词性标注信息编码;
    将所述目标向量输入至预设自然语言处理模型;
    根据所述预设自然语言处理模型对所述目标向量进行语音语义处理,得到所述单语句所对应的语音语义处理结果。
  17. 根据权利要求16所述计算机可读存储介质,其中,所述根据所述预设的目标Bert模型对所述目标单语句进行预处理,得到所述目标单语句所对应的目标向量的步骤包括:
    采用第一预设语言工具对所述目标单语句进行分词,以得到所述目标单语句所包含的若干个短语;
    采用第二预设语言工具对每个所述短语进行词性标注,以得到所述短语所对应的词性标注信息,所述词性标注信息包括所述短语及所述短语所对应的词性;
    基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码, 以得到所述目标单语句所包含的外部信息编码。
  18. 根据权利要求17所述计算机可读存储介质,其中,所述外部信息编码为分词依存关系编码,所述基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码的步骤包括:
    采用第三预设语言工具对所述短语及所述词性标注信息进行依存关系分析,以得到依存关系树;
    采用预设依存编码方式对所述依存关系树进行编码,以得到所述目标单语句所包含的外部信息编码。
  19. 根据权利要求18所述计算机可读存储介质,其中,所述预设依存编码方式为预设相对依存位置编码方式或者为预设绝对依存位置编码方式。
  20. 根据权利要求17所述计算机可读存储介质,其中,所述外部信息编码为词性标注信息编码,所述基于所有所述短语及所述短语所对应的所述词性标注信息,通过预设编码方式进行编码,以得到所述目标单语句所包含的外部信息编码的步骤包括:
    采用预设BIES标注方式对每个所述短语所对应的词性标注信息进行编码以得到每个所述词性标注信息所对应的4个编码;
    对K个词性标注信息进行编码以得到4K个编码,从而得到所述目标单语句所包含的外部信息编码,其中,K为自然数。
PCT/CN2020/118735 2020-07-16 2020-09-29 单语句自然语言处理方法、装置、计算机设备及可读存储介质 WO2021143206A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010688324.4A CN111832318B (zh) 2020-07-16 2020-07-16 单语句自然语言处理方法、装置、计算机设备及可读存储介质
CN202010688324.4 2020-07-16

Publications (1)

Publication Number Publication Date
WO2021143206A1 (zh)

Family

ID=72924333

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118735 WO2021143206A1 (zh) 2020-07-16 2020-09-29 单语句自然语言处理方法、装置、计算机设备及可读存储介质

Country Status (2)

Country Link
CN (1) CN111832318B (zh)
WO (1) WO2021143206A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609275A (zh) * 2021-08-24 2021-11-05 腾讯科技(深圳)有限公司 信息处理方法、装置、设备及存储介质

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348604B (zh) * 2020-11-26 2023-11-17 税友软件集团股份有限公司 发票商品编码赋值方法、系统、装置及可读存储介质
CN114997140B (zh) * 2021-09-17 2023-04-28 荣耀终端有限公司 校验语义的方法和装置
CN114639489B (zh) * 2022-03-21 2023-03-24 广东莲藕健康科技有限公司 基于相互学习的问诊快捷回复推荐方法、装置及电子设备

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160232142A1 (en) * 2014-08-29 2016-08-11 Yandex Europe Ag Method for text processing
CN110489750A (zh) * 2019-08-12 2019-11-22 昆明理工大学 基于双向lstm-crf的缅甸语分词及词性标注方法及装置
CN111062217A (zh) * 2019-12-19 2020-04-24 江苏满运软件科技有限公司 语言信息的处理方法、装置、存储介质及电子设备

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11544461B2 (en) * 2019-05-14 2023-01-03 Intel Corporation Early exit for natural language processing models
CN111291166B (zh) * 2020-05-09 2020-11-03 支付宝(杭州)信息技术有限公司 基于Bert的语言模型的训练方法及装置

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160232142A1 (en) * 2014-08-29 2016-08-11 Yandex Europe Ag Method for text processing
CN110489750A (zh) * 2019-08-12 2019-11-22 昆明理工大学 基于双向lstm-crf的缅甸语分词及词性标注方法及装置
CN111062217A (zh) * 2019-12-19 2020-04-24 江苏满运软件科技有限公司 语言信息的处理方法、装置、存储介质及电子设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN LEI; ZHENG WEIYAN; YU HUIHUA; FU JING; LIU HONGWEI; XIA JUNQIANG: "Research on Language Model for Speech Recognition of Power Grid Dispatching Based on BERT", Power System Technology (Shuili-Dianlibu Dianli Kexue Yanjiuyuan, Beijing, CN), 11 July 2020 (2020-07-11), XP055829726, ISSN: 1000-3673 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609275A (zh) * 2021-08-24 2021-11-05 腾讯科技(深圳)有限公司 信息处理方法、装置、设备及存储介质
CN113609275B (zh) * 2021-08-24 2024-03-26 腾讯科技(深圳)有限公司 信息处理方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN111832318B (zh) 2023-03-21
CN111832318A (zh) 2020-10-27

Similar Documents

Publication Publication Date Title
WO2021143206A1 (zh) 单语句自然语言处理方法、装置、计算机设备及可读存储介质
WO2021072852A1 (zh) 序列标注方法、系统和计算机设备
CN111931517B (zh) 文本翻译方法、装置、电子设备以及存储介质
WO2022121251A1 (zh) 文本处理模型训练方法、装置、计算机设备和存储介质
US11636272B2 (en) Hybrid natural language understanding
CN108228574B (zh) 文本翻译处理方法及装置
WO2021208460A1 (zh) 语句补全方法、设备及可读存储介质
CN113051374B (zh) 一种文本匹配优化方法及装置
CN110188926A (zh) 一种订单信息预测系统和方法
CN114912450B (zh) 信息生成方法与装置、训练方法、电子设备和存储介质
CN115640520A (zh) 跨语言跨模态模型的预训练方法、设备和存储介质
CN113626608B (zh) 增强语义的关系抽取方法、装置、计算机设备及存储介质
CN113743101A (zh) 文本纠错方法、装置、电子设备和计算机存储介质
CN111783424B (zh) 一种文本分句方法和装置
CN112052329A (zh) 文本摘要生成方法、装置、计算机设备及可读存储介质
US20230153550A1 (en) Machine Translation Method and Apparatus, Device and Storage Medium
US20220254351A1 (en) Method and system for correcting speaker diarization using speaker change detection based on text
CN116483314A (zh) 一种自动化智能活动图生成方法
CN115620726A (zh) 语音文本生成方法、语音文本生成模型的训练方法、装置
WO2022267460A1 (zh) 基于事件的情感分析方法、装置、计算机设备及存储介质
CN114298032A (zh) 文本标点检测方法、计算机设备及存储介质
CN113283218A (zh) 一种语义文本压缩方法及计算机设备
Ghosh et al. Span classification with structured information for disfluency detection in spoken utterances
US11709989B1 (en) Method and system for generating conversation summary
CN115577680B (zh) 古籍文本断句方法与装置、古籍文本断句模型训练方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20914221

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20914221

Country of ref document: EP

Kind code of ref document: A1