CN110838287B - Corpus processing method and device of chat robot in vehicle-mounted environment and storage medium

Info

Publication number
CN110838287B
Authority
CN
China
Prior art keywords
question
file
response
sentence
standardized
Prior art date
Legal status
Active
Application number
CN201910984527.5A
Other languages
Chinese (zh)
Other versions
CN110838287A (en
Inventor
裴丽珊
Current Assignee
FAW Group Corp
Original Assignee
FAW Group Corp
Priority date
Filing date
Publication date
Application filed by FAW Group Corp
Priority to CN201910984527.5A
Publication of CN110838287A
Application granted
Publication of CN110838287B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00 Manipulators not otherwise provided for
    • B25J11/0005 Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mechanical Engineering (AREA)
  • Robotics (AREA)
  • Artificial Intelligence (AREA)
  • Navigation (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a corpus processing method, a device and a storage medium for a chat robot in a vehicle-mounted environment. The method comprises: determining a dialog description text having question-answer relationships in the vehicle-mounted environment based on the voice function classification in the vehicle-mounted environment; determining a question file and a response file according to the question-answer relationships in the dialog description text; and inputting the question file and the response file into a Seq2Seq model and training the Seq2Seq model to form a question-answer model. The more cumbersome corpus processing steps are thereby omitted when corpus processing is carried out.

Description

Corpus processing method and device of chat robot in vehicle-mounted environment and storage medium
Technical Field
The embodiment of the invention relates to the field of vehicle networking, in particular to a corpus processing method and device of a chat robot in a vehicle-mounted environment and a storage medium.
Background
With the development of the automobile industry, the field of car networking is receiving more and more attention. Artificial intelligence technology plays a vital role in this field, and the chat robot in the vehicle-mounted environment has become one of the key areas of focus for the major vehicle manufacturers. Corpus processing for chat robots in vehicle-mounted environments is accordingly becoming increasingly important.
At present, corpus processing for the chat robot in the vehicle-mounted environment is performed as follows: first, the original data is cleaned, and content that is of no interest and is regarded as noise is deleted; second, the data is segmented, splitting all text at the minimum unit granularity of text processing, namely words or phrases; third, part-of-speech tagging is performed on the segmented text, attaching a part-of-speech tag to each word or phrase; finally, words that do not contribute to the text features are removed.
However, the above processing steps are complicated, which results in low corpus processing efficiency.
Disclosure of Invention
The invention provides a corpus processing method, a corpus processing device and a storage medium of a chat robot in a vehicle-mounted environment, and aims to solve the technical problem that the processing efficiency is low due to the fact that the corpus processing steps are complicated at present.
In a first aspect, an embodiment of the present invention provides a corpus processing method for a chat robot in a vehicle-mounted environment, including:
determining a dialogue description text with question-answer relationship in the vehicle-mounted environment based on the voice function classification in the vehicle-mounted environment;
determining a question file and a response file according to the question-answer relationship in the dialog description text;
inputting the question file and the response file into a Seq2Seq model, and training the Seq2Seq model to form a question-answer model.
In the method shown above, the determining a dialog description text having a question-answer relationship in the vehicle-mounted environment based on the voice function classification in the vehicle-mounted environment includes:
performing sub-function division on each voice function;
determining, for each of the plurality of sub-functions included in each voice function, a typical dialog of the sub-function and a dialog description of the text-to-speech (TTS) broadcast content;
and determining the dialog description text according to the dialog descriptions corresponding to all the voice functions.
In the method as shown above, the speech function classification includes: system control, music, radio, telephone, navigation, video, charging pile, weather, stock and hotel.
In the method shown above, the determining a dialog description text having a question-answer relationship in an in-vehicle environment includes:
storing the question sentences in odd lines and the response sentences in even lines to form the dialogue description text;
correspondingly, the determining a question file and a response file according to the question-answer relationship in the dialog description text comprises:
extracting statements of odd lines in the dialog description text to form the question file;
and extracting sentences of even lines in the dialog description text to form the response file.
In the method as described above, the inputting the question file and the response file into a Seq2Seq model, and training the Seq2Seq model to form a question-answer model includes:
carrying out length standardization on each question sentence in the question file and each response sentence in the response file to form a standardized question file and a standardized response file;
respectively carrying out statement vector conversion on the standardized question file and the standardized response file to form a vectorized question file and a vectorized response file;
and inputting the vectorization question file and the vectorization response file into the Seq2Seq model, and training the Seq2Seq model to form a question-answer model.
In the method as described above, the carrying out length standardization on each question sentence in the question file and each response sentence in the response file to form a standardized question file and a standardized response file includes:
taking the number of characters of the longest question sentence among all question sentences as the standard question sentence length, and taking the number of characters of the longest response sentence among all response sentences as the standard response sentence length;
padding each question sentence that is shorter than the standard question sentence length with filler characters up to that length to form a padded question sentence;
adding start and stop characters to the question sentences whose length equals the standard question sentence length and to the padded question sentences to form the standardized question file;
padding each response sentence that is shorter than the standard response sentence length with filler characters up to that length to form a padded response sentence;
and adding start and stop characters to the response sentences whose length equals the standard response sentence length and to the padded response sentences to form the standardized response file.
In the method as shown above, the performing statement vector conversion on the standardized question file and the standardized response file respectively to form a vectorized question file and a vectorized response file includes:
counting the occurrence times of the characters in the standardized question file and the standardized response file, and arranging the characters according to the sequence of the occurrence times from small to large to generate a character dictionary;
determining a vector corresponding to each character according to the corresponding relation between the character dictionary and the statement vector;
and forming the vectorization question file and the vectorization response file according to the vector corresponding to each character.
In a second aspect, an embodiment of the present invention provides a corpus processing apparatus of a chat robot in a vehicle-mounted environment, including:
a first determining module, configured to determine a dialog description text having a question-answer relationship in the vehicle-mounted environment based on the voice function classification in the vehicle-mounted environment;
a second determining module, configured to determine a question file and a response file according to the question-answer relationship in the dialog description text;
and a training module, configured to input the question file and the response file into a sequence-to-sequence (Seq2Seq) model and train the Seq2Seq model to form a question-answer model.
In a third aspect, an embodiment of the present invention further provides a server, where the server includes:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the corpus processing method of the chat robot in the vehicle-mounted environment according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the corpus processing method of the chat robot in the vehicle-mounted environment according to the first aspect.
The embodiment provides a corpus processing method, a device and a storage medium for a chat robot in a vehicle-mounted environment. The method comprises: determining a dialog description text having question-answer relationships in the vehicle-mounted environment based on the voice function classification in the vehicle-mounted environment; determining a question file and a response file according to the question-answer relationships in the dialog description text; and inputting the question file and the response file into a Seq2Seq model and training the Seq2Seq model to form a question-answer model. The more cumbersome corpus processing steps are thereby omitted when corpus processing is carried out.
Drawings
FIG. 1 is a schematic flow chart illustrating an embodiment of a corpus processing method of a chat robot in a vehicle-mounted environment according to the present invention;
FIG. 2A is a flow chart illustrating an implementation of step 101 in the embodiment shown in FIG. 1;
FIG. 2B is a flow chart illustrating one implementation of step 103 in the embodiment shown in FIG. 1;
FIG. 3 is a schematic structural diagram of a corpus processing apparatus of a chat robot in a vehicle-mounted environment according to an embodiment of the present invention;
FIG. 4A is a schematic diagram of a possible structure of the first determining module in the embodiment shown in FIG. 3;
FIG. 4B is a schematic diagram of a possible structure of the training module in the embodiment shown in FIG. 3;
FIG. 5 is a schematic structural diagram of a server provided in the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
FIG. 1 is a schematic flow chart of an embodiment of a corpus processing method of a chat robot in a vehicle-mounted environment according to the present invention. This embodiment is applicable to scenarios in which the corpus of a chat robot in a vehicle-mounted environment is processed. The corpus processing device of the chat robot in the vehicle-mounted environment can be implemented in software and/or hardware and can be integrated in a server. As shown in FIG. 1, the corpus processing method of the chat robot in the vehicle-mounted environment provided by this embodiment includes the following steps:
Step 101: Determine a dialog description text having a question-answer relationship in the vehicle-mounted environment based on the voice function classification in the vehicle-mounted environment.
Specifically, this embodiment first determines a dialog description text in the vehicle-mounted environment. The dialog description text has a question-answer relationship. It describes the dialogs between the user and the chat robot that may occur in the vehicle-mounted environment.
In this embodiment, according to the voice function requirements of the vehicle model, the voice functions in the vehicle-mounted environment can be divided into any one or more of the following: system control, music, radio, telephone, navigation, video, charging pile, weather, stock and hotel.
In order to ensure the comprehensiveness of the dialog description text, the following method may be adopted in the present embodiment to determine the dialog description text. Fig. 2A is a schematic flow chart of an implementation process of step 101 in the embodiment shown in fig. 1. As shown in fig. 2A, step 101 may include the following specific steps.
Step 1011: Divide each voice function into sub-functions.
To ensure that the dialog description text covers the voice requirements in the vehicle-mounted environment as completely as possible, each voice function is divided into sub-functions in step 1011. For example, the "navigation" function may be refined into a "path navigation" function, a "road condition navigation" function, and a "function search" function, and the "system control" function may be refined into an "air conditioner control" function, a "window control" function, a "door lock control" function, and the like.
Step 1012: Determine, for each of the plurality of sub-functions included in each voice function, a typical dialog and a dialog description of the text-to-speech broadcast content.
After each voice function is divided into sub-functions, a dialog description covering the typical dialogs and the Text To Speech (TTS) broadcast content can be determined for each sub-function. For example, for the "air conditioner control" function, the typical dialogs and TTS broadcast content may include: turning on the air conditioner, turning up the temperature.
It should be noted that the typical dialogs of each sub-function and the dialog description of the TTS broadcast content may be determined empirically by the developer. The typical dialogs in this embodiment refer to statements that are used more frequently.
After the typical dialogs and the dialog description of the TTS broadcast content are determined for each sub-function, the dialog description of each voice function can be determined.
Step 1013: Determine the dialog description text according to the dialog descriptions corresponding to all the voice functions.
After the dialog description corresponding to each voice function is determined, the dialog descriptions corresponding to all the voice functions may be merged in step 1013 to form the dialog description text.
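Steps 1011 to 1013 can be illustrated with a minimal sketch. The catalogue below, its nesting (voice function, then sub-function, then question/response pairs), the function name build_dialog_description_text, and the wording of the TTS responses are all hypothetical and are not taken from the embodiment; only the function and sub-function names come from the examples above.

```python
# Hypothetical catalogue: voice function -> sub-function -> (question, TTS response) pairs.
# The response wording is invented purely for illustration.
VOICE_FUNCTIONS = {
    "system control": {
        "air conditioner control": [
            ("Turn on the air conditioner", "The air conditioner is on"),
            ("Turn up the temperature", "The temperature has been raised"),
        ],
        "window control": [
            ("Open the window", "The window is open"),
        ],
    },
    "navigation": {
        "path navigation": [
            ("Navigate home", "Starting navigation to home"),
        ],
    },
}


def build_dialog_description_text(functions):
    """Merge the dialog descriptions of all voice functions into one text.

    Each question is written on an odd line (1, 3, ...) and its response
    on the following even line.
    """
    lines = []
    for sub_functions in functions.values():    # step 1011: sub-function division
        for dialogs in sub_functions.values():  # step 1012: typical dialogs and TTS content
            for question, response in dialogs:
                lines.append(question)
                lines.append(response)
    return "\n".join(lines)                     # step 1013: merged dialog description text


dialog_text = build_dialog_description_text(VOICE_FUNCTIONS)
```

Writing each pair on two consecutive lines is one way of encoding the question-answer relationship in the text itself, so that later steps need no additional labels.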
Step 102: Determine a question file and a response file according to the question-answer relationship in the dialog description text.
Specifically, in step 102, a question file and a response file are determined based on the question-answer relationship of the dialog description text determined in step 101.
One possible implementation is: in step 101, the question sentences are stored in odd lines, and the response sentences are stored in even lines to form a dialogue description text, then in step 102, the sentences in the odd lines in the dialogue description text are extracted to form a question file, and the sentences in the even lines in the dialogue description text are extracted to form a response file.
Of course, if the question sentences are stored in even lines and the response sentences are stored in odd lines to form the dialogue description text in step 101, the even-numbered lines of the dialogue description text are extracted to form the question file and the odd-numbered lines of the dialogue description text are extracted to form the response file in step 102.
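The odd/even line extraction described above can be sketched as follows. The function name split_dialog_text and the questions_on_odd_lines flag are hypothetical, and skipping blank lines is an assumption of this sketch rather than something the embodiment specifies.

```python
def split_dialog_text(dialog_text, questions_on_odd_lines=True):
    """Split a dialog description text into question and response lists.

    Line numbering is 1-based, so "odd lines" are lines 1, 3, 5, ...
    Blank lines are ignored (an assumption of this sketch).
    """
    lines = [line.strip() for line in dialog_text.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected question/response pairs, got an odd number of lines")
    odd_lines, even_lines = lines[0::2], lines[1::2]
    if questions_on_odd_lines:
        return odd_lines, even_lines
    return even_lines, odd_lines


sample_text = (
    "Turn on the air conditioner\n"
    "The air conditioner is on\n"
    "Open the window\n"
    "The window is open\n"
)
questions, responses = split_dialog_text(sample_text)
```

Passing questions_on_odd_lines=False covers the alternative layout in which questions are stored in even lines and responses in odd lines.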
Another possible implementation is: in step 102, a question file and a response file are determined based on the semantics of each sentence in the dialog description text.
After the question file and the response file are determined, a Sequence to Sequence (Seq2Seq) model may be trained based on the question file and the response file.
The Seq2Seq model consists of two main components: an encoder Recurrent Neural Network (RNN) and a decoder RNN. At a high level, the encoder generates a fixed-length representation of the input text. The decoder receives this representation and generates variable-length text in response.
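The encoder/decoder division can be illustrated with a toy sketch in plain Python. Everything here is an assumption made for illustration: the vocabulary, the hidden size, the vanilla tanh RNN cell, and the random untrained weights. It is not the model used in the embodiment, and because training is omitted the generated tokens carry no meaning; the sketch only shows that inputs of different lengths are encoded into a state of the same size and that the decoder emits tokens until it produces a stop token or reaches a length limit.

```python
import math
import random

rng = random.Random(0)                   # fixed seed so the untrained weights are reproducible
HIDDEN = 4                               # size of the fixed-length representation (assumed)
VOCAB = ["<s>", "</s>", "a", "b", "c"]   # toy vocabulary (assumed)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_rnn():
    """Return (input weights, recurrent weights) for one vanilla RNN."""
    return rand_matrix(HIDDEN, len(VOCAB)), rand_matrix(HIDDEN, HIDDEN)


ENCODER, DECODER = make_rnn(), make_rnn()  # separate weights for encoder and decoder
W_OUT = rand_matrix(len(VOCAB), HIDDEN)    # maps the decoder state to token scores


def rnn_step(weights, token_id, hidden):
    """One step: h' = tanh(W_in[:, token] + W_hh @ h)."""
    w_in, w_hh = weights
    return [
        math.tanh(w_in[i][token_id] + sum(w_hh[i][j] * hidden[j] for j in range(HIDDEN)))
        for i in range(HIDDEN)
    ]


def encode(token_ids):
    """Fold an input of any length into a state of size HIDDEN."""
    hidden = [0.0] * HIDDEN
    for token_id in token_ids:
        hidden = rnn_step(ENCODER, token_id, hidden)
    return hidden


def decode(hidden, max_len=10):
    """Greedily emit tokens until the stop token or max_len is reached."""
    token_id = VOCAB.index("<s>")
    output = []
    for _ in range(max_len):
        hidden = rnn_step(DECODER, token_id, hidden)
        scores = [sum(W_OUT[k][j] * hidden[j] for j in range(HIDDEN)) for k in range(len(VOCAB))]
        token_id = scores.index(max(scores))
        if VOCAB[token_id] == "</s>":
            break
        output.append(VOCAB[token_id])
    return output


short_state = encode([2, 3])
long_state = encode([2, 3, 4, 2, 3, 4])
reply = decode(short_state)
```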
Step 103: Input the question file and the response file into the Seq2Seq model, and train the Seq2Seq model to form a question-answer model.
Specifically, a chat robot in a vehicle-mounted environment can be formed based on the question-answer model. Chat robots are classified into retrieval-type chat robots and generative chat robots.
In a retrieval-type chat robot, a dialog library is stored in advance; after receiving a sentence input by the user, the chat system extracts response content from the dialog library by searching and matching. This approach places high demands on the dialog library, which must be large enough to match as many user questions as possible; otherwise an appropriate response often cannot be found, because in a real scenario the user may say anything. Its advantage is high response quality, because the content of the dialog library is real dialog data and the expression is natural.
A generative chat robot takes a different technical approach: after receiving a sentence input by the user, it automatically generates a sentence as the response. Its advantage is that it can cover user questions on any topic; its drawback is that the quality of the generated response may be poor, with low-level errors such as disfluent sentences or incorrect syntax.
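For contrast with the generative approach adopted here, the search-and-match behavior of a retrieval-type chat robot can be sketched with Python's standard difflib module. The dialog library contents, the function name retrieve_response, and the similarity cutoff are hypothetical; a real system would likely use a far larger library and a more robust matching method.

```python
import difflib

# Hypothetical pre-stored dialog library: question -> response.
DIALOG_LIBRARY = {
    "turn on the air conditioner": "The air conditioner is on",
    "open the window": "The window is open",
}


def retrieve_response(user_sentence, library, cutoff=0.6):
    """Return the response of the closest stored question, or None if nothing matches."""
    matches = difflib.get_close_matches(
        user_sentence.lower(), list(library), n=1, cutoff=cutoff
    )
    return library[matches[0]] if matches else None


hit = retrieve_response("Turn on the air conditioner", DIALOG_LIBRARY)
miss = retrieve_response("xyzzy", DIALOG_LIBRARY)
```

The None result for an unmatched sentence corresponds to the limitation noted above: when the library does not contain a sufficiently similar question, no appropriate response can be found.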
Since the Seq2Seq model has a requirement for the format of the input file, it is necessary to first perform format conversion processing on the question file and the response file in step 103.
FIG. 2B is a flowchart illustrating an implementation of step 103 in the embodiment shown in FIG. 1. As shown in fig. 2B, step 103 includes the following steps.
Step 1031: Perform length standardization on each question sentence in the question file and each response sentence in the response file to form a standardized question file and a standardized response file.
One possible implementation is: take the number of characters of the longest question sentence among all question sentences as the standard question sentence length, and the number of characters of the longest response sentence among all response sentences as the standard response sentence length; pad each question sentence that is shorter than the standard question sentence length up to that length to form a padded question sentence; add start and stop characters to the question sentences whose length equals the standard question sentence length and to the padded question sentences to form the standardized question file; pad each response sentence that is shorter than the standard response sentence length up to that length to form a padded response sentence; and add start and stop characters to the response sentences whose length equals the standard response sentence length and to the padded response sentences to form the standardized response file.
In this implementation, question sentences shorter than the standard question sentence length and response sentences shorter than the standard response sentence length may be padded with predefined meaningless characters.
Alternatively, the start and stop characters may be added to the question sentences and the response sentences first, and the character padding performed afterwards. This embodiment is not limited in this respect.
All question sentences in the resulting standardized question file are of equal length, and each carries start and stop characters that distinguish it from the other question sentences. Likewise, all response sentences in the resulting standardized response file are of equal length and carry start and stop characters.
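The length standardization of step 1031 can be sketched as below, padding first and then adding the start and stop characters. The specific characters chosen for PAD, START and STOP and the function name standardize are assumptions; the embodiment only requires predefined meaningless padding characters and start-stop characters, and the sketch assumes none of these characters occur in the corpus itself.

```python
PAD = "_"    # assumed padding character (must not occur in the corpus)
START = "<"  # assumed start character
STOP = ">"   # assumed stop character


def standardize(sentences, pad=PAD, start=START, stop=STOP):
    """Pad every sentence to the length of the longest one, then wrap it
    in start and stop characters."""
    standard_length = max(len(sentence) for sentence in sentences)
    return [start + sentence.ljust(standard_length, pad) + stop for sentence in sentences]


standardized_questions = standardize(["hi", "hello"])
standardized_responses = standardize(["ok", "fine", "sure thing"])
```

The question file and the response file are standardized independently, so the two files may end up with different standard lengths.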
Step 1032: Perform statement vector conversion on the standardized question file and the standardized response file respectively to form a vectorized question file and a vectorized response file.
One possible implementation is: count the number of occurrences of each character in the standardized question file and the standardized response file, and arrange the characters in ascending order of occurrence count to generate a character dictionary; determine the vector corresponding to each character according to the correspondence between the character dictionary and the statement vectors; and form the vectorized question file and the vectorized response file from the vector corresponding to each character.
In this implementation, a longer vector may be assigned to characters near the front of the character dictionary and a shorter vector to characters near the back, so that characters that occur less often are converted into longer vectors and characters that occur more often into shorter ones. This reduces the sizes of the vectorized question file and the vectorized response file, making training of the Seq2Seq model more efficient.
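The construction of the character dictionary can be sketched as follows. The embodiment does not specify how a dictionary entry corresponds to a vector, so this sketch simply maps each character to its position in the dictionary (an integer ID) as a stand-in; it does not implement the variable-length vector assignment described above. The function names and the tie-breaking rule for characters with equal counts are also assumptions.

```python
from collections import Counter


def build_char_dictionary(question_lines, response_lines):
    """Count characters over both files and order them by ascending count.

    Ties are broken by the character itself so the result is deterministic
    (an assumption of this sketch).
    """
    counts = Counter("".join(question_lines) + "".join(response_lines))
    ordered = sorted(counts, key=lambda ch: (counts[ch], ch))
    return {ch: index for index, ch in enumerate(ordered)}


def vectorize(lines, char_to_id):
    """Replace every character by its dictionary position (stand-in for a vector)."""
    return [[char_to_id[ch] for ch in line] for line in lines]


question_lines = ["<ab>"]
response_lines = ["<bb>"]
char_to_id = build_char_dictionary(question_lines, response_lines)
vectorized_questions = vectorize(question_lines, char_to_id)
vectorized_responses = vectorize(response_lines, char_to_id)
```

Building one dictionary from both files keeps the encoding of a character identical in the question file and the response file.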
Step 1033: Input the vectorized question file and the vectorized response file into the Seq2Seq model, and train the Seq2Seq model to form a question-answer model.
During training, the vectorized question file can be divided into a training vectorized question file and a test vectorized question file, and the vectorized response file into a training vectorized response file and a test vectorized response file, so that the Seq2Seq model can be both trained and tested.
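A possible way to divide the vectorized files into training and test portions is sketched below. The 80/20 ratio, the shuffling, the fixed seed, and the function name split_train_test are assumptions; the embodiment does not state how the division is made. The one property the sketch takes care to preserve is that each question stays paired with its own response.

```python
import random


def split_train_test(question_vectors, response_vectors, test_ratio=0.2, seed=0):
    """Split aligned question/response lists into (train, test) pairs of lists."""
    if len(question_vectors) != len(response_vectors):
        raise ValueError("question and response files must have the same number of entries")
    indices = list(range(len(question_vectors)))
    random.Random(seed).shuffle(indices)           # same permutation for both files
    n_test = max(1, int(len(indices) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    train = (pick(question_vectors, train_idx), pick(response_vectors, train_idx))
    test = (pick(question_vectors, test_idx), pick(response_vectors, test_idx))
    return train, test


toy_questions = [[i] for i in range(10)]
toy_responses = [[i * 10] for i in range(10)]
(train_q, train_r), (test_q, test_r) = split_train_test(toy_questions, toy_responses)
```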
Once formed, the question-answer model can predict a corresponding response to an input sentence.
According to the corpus processing method of the chat robot in the vehicle-mounted environment provided by this embodiment, a dialog description text having a question-answer relationship in the vehicle-mounted environment is determined based on the voice function classification in the vehicle-mounted environment; a question file and a response file are determined according to the question-answer relationship in the dialog description text; and the question file and the response file are input into the Seq2Seq model, which is trained to form a question-answer model. Cumbersome corpus processing steps are thereby omitted, and the question-answer separation approach improves corpus processing efficiency.
Fig. 3 is a schematic structural diagram of a corpus processing device of a chat robot in a vehicle-mounted environment according to an embodiment of the present invention. The corpus processing device of the chat robot in the vehicle-mounted environment can be integrated in a server. As shown in fig. 3, the corpus processing apparatus of the chat robot in the vehicle-mounted environment according to the present embodiment includes: a first determination module 31, a second determination module 32 and a training module 33.
The first determining module 31 is configured to determine a dialog description text having a question-answer relationship in the vehicle-mounted environment based on the speech function classification in the vehicle-mounted environment.
Optionally, the voice function classification includes: system control, music, radio, telephone, navigation, video, charging pile, weather, stock and hotel.
FIG. 4A is a schematic diagram of a possible structure of the first determining module in the embodiment shown in FIG. 3. As shown in FIG. 4A, optionally, the first determining module 31 may specifically include: a division sub-module 311, a first determining sub-module 312, and a second determining sub-module 313.
The division sub-module 311 is configured to perform sub-function division on each voice function.
The first determining sub-module 312 is configured to determine, for each of the plurality of sub-functions included in each voice function, a typical dialog and a dialog description of the TTS broadcast content.
The second determining sub-module 313 is configured to determine the dialog description text according to the dialog descriptions corresponding to all the voice functions.
The second determining module 32 is configured to determine the question file and the response file according to the question-answer relationship in the dialog description text.
In one implementation, the first determining module 31 is specifically configured to: question sentences are stored in odd-numbered lines, and response sentences are stored in even-numbered lines, forming a dialog description text. Accordingly, the second determining module 32 is specifically configured to: extracting statements of odd lines in the dialogue description text to form a question file; and extracting sentences of even lines in the dialog description text to form a response file.
The training module 33 is configured to input the question file and the response file into the Seq2Seq model, and train the Seq2Seq model to form a question-answer model.
FIG. 4B is a schematic diagram of a possible structure of the training module in the embodiment shown in FIG. 3. As shown in fig. 4B, optionally, the training module 33 may specifically include: a processing sub-module 331, a conversion sub-module 332, and a training sub-module 333.
The processing sub-module 331 is configured to perform length normalization on each question statement in the question file and each response statement in the response file to form a normalized question file and a normalized response file.
Optionally, the processing sub-module 331 is specifically configured to: take the number of characters of the longest question sentence among all question sentences as the standard question sentence length, and the number of characters of the longest response sentence among all response sentences as the standard response sentence length; pad each question sentence that is shorter than the standard question sentence length up to that length to form a padded question sentence; add start and stop characters to the question sentences whose length equals the standard question sentence length and to the padded question sentences to form the standardized question file; pad each response sentence that is shorter than the standard response sentence length up to that length to form a padded response sentence; and add start and stop characters to the response sentences whose length equals the standard response sentence length and to the padded response sentences to form the standardized response file.
The conversion submodule 332 is configured to perform statement vector conversion on the standardized question file and the standardized response file respectively to form a vectorized question file and a vectorized response file.
Optionally, the conversion submodule 332 is specifically configured to: count the number of occurrences of each character in the standardized question file and the standardized response file, and arrange the characters in ascending order of occurrence count to generate a character dictionary; determine the vector corresponding to each character according to the correspondence between the character dictionary and the statement vectors; and form the vectorized question file and the vectorized response file from the vector corresponding to each character.
The training submodule 333 is configured to input the vectorized question file and the vectorized response file into the Seq2Seq model, and train the Seq2Seq model to form a question-and-answer model.
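The patent does not describe how the two vectorized files are presented to the Seq2Seq model. One common convention, shown here purely as an assumption, is teacher forcing: the question is the encoder input, the response without its final element is the decoder input, and the response shifted by one position is the decoder target. The function name `make_training_pairs` is hypothetical.

```python
def make_training_pairs(question_ids, response_ids):
    """Arrange vectorized sentences as (encoder input, decoder input, decoder target)."""
    if len(question_ids) != len(response_ids):
        raise ValueError("each question needs exactly one response")
    pairs = []
    for question, response in zip(question_ids, response_ids):
        decoder_input = response[:-1]   # everything except the final (stop) id
        decoder_target = response[1:]   # the same sequence shifted left by one step
        pairs.append((question, decoder_input, decoder_target))
    return pairs


# Toy ids: 1 stands for the start character and 2 for the stop character.
pairs = make_training_pairs([[5, 6, 7]], [[1, 8, 9, 2]])
```

Each resulting triple could then be batched and fed to whichever Seq2Seq implementation is used; the choice of framework is outside the scope of this sketch.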
The corpus processing device of the chat robot in the vehicle-mounted environment provided by the embodiment of the invention can execute the corpus processing method of the chat robot in the vehicle-mounted environment provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Fig. 5 is a schematic structural diagram of a server provided in the present invention. As shown in fig. 5, the server includes a processor 70 and a memory 71. The number of the processors 70 in the server may be one or more, and one processor 70 is taken as an example in fig. 5; the processor 70 and the memory 71 of the server may be connected by a bus or other means, as exemplified by the bus connection in fig. 5.
The memory 71 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions and modules corresponding to the corpus processing method of the chat robot in the vehicle-mounted environment in the embodiment of the present invention (for example, the first determining module 31, the second determining module 32, and the training module 33 in the corpus processing apparatus of the chat robot in the vehicle-mounted environment). The processor 70 executes various functional applications and data processing of the server by running software programs, instructions and modules stored in the memory 71, so as to implement the corpus processing method of the chat robot in the vehicle-mounted environment.
The memory 71 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the server, and the like. Further, the memory 71 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 71 may further include memory remotely located from the processor 70, and these remote memories may be connected to the server over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The present invention also provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a corpus processing method of a chat robot in a vehicle-mounted environment, the method comprising:
determining a dialogue description text with question-answer relationship in the vehicle-mounted environment based on the voice function classification in the vehicle-mounted environment;
determining a question file and a response file according to the question-answer relationship in the dialog description text;
inputting the question file and the response file into a Seq2Seq model, and training the Seq2Seq model to form a question-answer model.
Of course, the storage medium containing the computer-executable instructions provided in the embodiments of the present invention is not limited to the above-described method operations, and may also perform related operations in the corpus processing method of the chat robot in the vehicle-mounted environment provided in any embodiment of the present invention.
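For illustration, the second of the method steps listed above (deriving the question file and the response file from the dialog description text) can be sketched as follows, assuming the layout recited in the claims in which question sentences occupy odd lines and response sentences occupy even lines. The function name `split_dialog_text` is hypothetical, and skipping blank lines is an added assumption.

```python
def split_dialog_text(text):
    """Split a dialog description text into question and response sentences."""
    # Blank lines are ignored (an assumption; the patent does not mention them).
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of non-empty lines")
    questions = lines[0::2]   # lines 1, 3, 5, ... (odd line numbers)
    responses = lines[1::2]   # lines 2, 4, 6, ... (even line numbers)
    return questions, responses


dialog = (
    "Turn on the air conditioner\n"
    "The air conditioner is on\n"
    "Play some music\n"
    "Playing music now\n"
)
questions, responses = split_dialog_text(dialog)
```

Keeping the two lists index-aligned preserves the question-answer relationship, so the i-th question is always paired with the i-th response in later steps.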
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention can be implemented by software together with the necessary general-purpose hardware, and can of course also be implemented by hardware alone, although the former is the better implementation in many cases. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk or an optical disk of a computer, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the corpus processing apparatus of the chat robot in the vehicle-mounted environment, the included units and modules are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (8)

1. A corpus processing method of a chat robot in a vehicle-mounted environment is characterized by comprising the following steps:
determining a dialogue description text with question-answer relationship in the vehicle-mounted environment based on the voice function classification in the vehicle-mounted environment;
determining a question file and a response file according to the question-answer relationship in the dialog description text;
inputting the question file and the response file into a sequence-to-sequence (Seq2Seq) model, and training the Seq2Seq model to form a question-answer model;
wherein the inputting of the question file and the response file into the Seq2Seq model and the training of the Seq2Seq model to form the question-answer model comprise:
carrying out length standardization on each question sentence in the question file and each response sentence in the response file to form a standardized question file and a standardized response file;
performing sentence vector conversion on the standardized question file and the standardized response file respectively to form a vectorized question file and a vectorized response file;
inputting the vectorized question file and the vectorized response file into the Seq2Seq model, and training the Seq2Seq model to form a question-answer model;
wherein the performing of sentence vector conversion on the standardized question file and the standardized response file respectively to form the vectorized question file and the vectorized response file comprises:
counting the number of occurrences of each character in the standardized question file and the standardized response file, and arranging the characters in ascending order of occurrence count to generate a character dictionary;
determining a vector corresponding to each character according to the correspondence between the character dictionary and the sentence vectors;
forming the vectorized question file and the vectorized response file according to the vector corresponding to each character;
wherein the determining of the vector corresponding to each character according to the correspondence between the character dictionary and the sentence vectors comprises: assigning a longer vector to a character arranged nearer the front of the character dictionary, and a shorter vector to a character arranged nearer the back.
2. The method of claim 1, wherein determining the dialog description text with question-answer relationship in the vehicle-mounted environment based on the speech function classification in the vehicle-mounted environment comprises:
performing sub-function division on each voice function;
determining, for the plurality of sub-functions included in each voice function, a typical dialog of each sub-function and a dialog description of the text-to-speech (TTS) broadcast content;
and determining the dialog description text according to the dialog descriptions corresponding to all the voice functions.
3. The method of claim 2, wherein the speech function classification comprises: system control, music, radio, telephone, navigation, video, charging pile, weather, stock and hotel.
4. The method according to any one of claims 1-3, wherein the determining of the dialog description text having question-answer relations in the vehicle-mounted environment comprises:
storing the question sentences in odd lines and the response sentences in even lines to form the dialogue description text;
correspondingly, the determining of a question file and a response file according to the question-answer relationship in the dialog description text comprises:
extracting the sentences of odd lines in the dialog description text to form the question file;
and extracting sentences of even lines in the dialog description text to form the response file.
5. The method according to claim 1, wherein the length standardization of each question sentence in the question file and each response sentence in the response file to form a standardized question file and a standardized response file comprises:
taking the character number of the longest question sentence in all question sentences as the character number of a standard question sentence, and taking the character number of the longest response sentence in all response sentences as the character number of the standard response sentence;
filling each question sentence that is shorter than the standard question sentence with padding characters up to the number of characters of the standard question sentence to form a filled question sentence;
adding start and stop characters to the question sentences whose length equals the number of characters of the standard question sentence and to the filled question sentences to form the standardized question file;
filling each response sentence that is shorter than the standard response sentence with padding characters up to the number of characters of the standard response sentence to form a filled response sentence;
and adding start and stop characters to the response sentences whose length equals the number of characters of the standard response sentence and to the filled response sentences to form the standardized response file.
6. A corpus processing apparatus of a chat robot in a vehicle-mounted environment, comprising:
a first determining module, configured to determine a dialog description text with question-answer relationship in the vehicle-mounted environment based on the voice function classification in the vehicle-mounted environment;
a second determining module, configured to determine a question file and a response file according to the question-answer relationship in the dialog description text;
a training module, configured to input the question file and the response file into a sequence-to-sequence (Seq2Seq) model, and to train the Seq2Seq model to form a question-answer model;
wherein the training module comprises a processing submodule, a conversion submodule and a training submodule;
the processing submodule is used for carrying out length standardization processing on each question sentence in the question file and each response sentence in the response file to form a standardized question file and a standardized response file;
the conversion submodule is used for performing sentence vector conversion on the standardized question file and the standardized response file respectively to form a vectorized question file and a vectorized response file;
the training submodule is used for inputting the vectorization question file and the vectorization response file into the Seq2Seq model and training the Seq2Seq model to form a question-answer model;
wherein the performing of sentence vector conversion on the standardized question file and the standardized response file respectively to form the vectorized question file and the vectorized response file comprises:
counting the number of occurrences of each character in the standardized question file and the standardized response file, and arranging the characters in ascending order of occurrence count to generate a character dictionary;
determining a vector corresponding to each character according to the correspondence between the character dictionary and the sentence vectors;
forming the vectorized question file and the vectorized response file according to the vector corresponding to each character;
wherein the determining of the vector corresponding to each character according to the correspondence between the character dictionary and the sentence vectors comprises: assigning a longer vector to a character arranged nearer the front of the character dictionary, and a shorter vector to a character arranged nearer the back.
7. A server, characterized in that the server comprises:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the corpus processing method of the chat robot in the vehicle-mounted environment according to any one of claims 1 to 5.
8. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the corpus processing method of a chat robot in a vehicle-mounted environment according to any one of claims 1 to 5.
CN201910984527.5A 2019-10-16 2019-10-16 Corpus processing method and device of chat robot in vehicle-mounted environment and storage medium Active CN110838287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910984527.5A CN110838287B (en) 2019-10-16 2019-10-16 Corpus processing method and device of chat robot in vehicle-mounted environment and storage medium

Publications (2)

Publication Number Publication Date
CN110838287A CN110838287A (en) 2020-02-25
CN110838287B CN110838287B (en) 2022-04-19

Family

ID=69575515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910984527.5A Active CN110838287B (en) 2019-10-16 2019-10-16 Corpus processing method and device of chat robot in vehicle-mounted environment and storage medium

Country Status (1)

Country Link
CN (1) CN110838287B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916259A (en) * 2010-07-06 2010-12-15 中国科学院计算技术研究所 Space compression method of state transition table of deterministic automaton
CN105024993A (en) * 2015-05-25 2015-11-04 上海南邮实业有限公司 Protocol comparison method based on vector operation

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3573688B2 (en) * 2000-06-28 2004-10-06 松下電器産業株式会社 Similar document search device and related keyword extraction device
JP5824430B2 (en) * 2012-08-10 2015-11-25 日本電信電話株式会社 Spam feature calculation apparatus, spam feature calculation method, and program
GB2517212B (en) * 2013-08-16 2018-04-25 Toshiba Res Europe Limited A Computer Generated Emulation of a subject
CN108319599B (en) * 2017-01-17 2021-02-26 华为技术有限公司 Man-machine conversation method and device
US11334608B2 (en) * 2017-11-23 2022-05-17 Infosys Limited Method and system for key phrase extraction and generation from text
CN107798140B (en) * 2017-11-23 2020-07-03 中科鼎富(北京)科技发展有限公司 Dialog system construction method, semantic controlled response method and device
US11003774B2 (en) * 2018-01-26 2021-05-11 Sophos Limited Methods and apparatus for detection of malicious documents using machine learning
CN108920560B (en) * 2018-06-20 2022-10-04 腾讯科技(深圳)有限公司 Generation method, training method, device, computer readable medium and electronic equipment
CN109063164A (en) * 2018-08-15 2018-12-21 百卓网络科技有限公司 A kind of intelligent answer method based on deep learning
CN109885832A (en) * 2019-02-14 2019-06-14 平安科技(深圳)有限公司 Model training, sentence processing method, device, computer equipment and storage medium
CN109829166B (en) * 2019-02-15 2022-12-27 重庆师范大学 People and host customer opinion mining method based on character-level convolutional neural network
CN109977407A (en) * 2019-03-27 2019-07-05 北京信息科技大学 A kind of multi-level difference analysis method of Written Texts of word-based insertion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A feature fusion based optical character recognition of Bangla characters using support vector machine;Mst. Tasnim Pervin;《2017 3rd International Conference on Electrical Information and Communication Technology (EICT)》;20171231;1-6 *
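Homestay customer opinion mining based on character-level convolutional neural network;Zhang Zhen;《China Master's Theses Full-text Database, Information Science and Technology》;20190831;I138-1492 *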

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant