CN117112760A

CN117112760A - Intelligent education big model based on knowledge base

Info

Publication number: CN117112760A
Application number: CN202311082062.7A
Authority: CN
Inventors: 许辉; 张浩楠; 向瑞睿; 向生建; 刘才广; 张勇雄; 刘成; 李云雪; 彭彬; 莫智文
Original assignee: Sichuan Xinlong Xinchuang Technology Co ltd
Current assignee: Sichuan Xinlong Xinchuang Technology Co ltd
Priority date: 2023-08-26
Filing date: 2023-08-26
Publication date: 2023-11-24

Abstract

The invention belongs to the technical field of intelligent education and in particular relates to an intelligent education large model based on a knowledge base. The model comprises a large model training and fine-tuning module, a voice recognition module, a text preprocessing module and an intelligent dialogue module. The voice recognition module and the text preprocessing module cooperatively process user input. The intelligent dialogue module takes the large model as its base, cooperates with the voice recognition and text preprocessing modules, and responds to user requests by performing intelligent reasoning and replying based on prior information provided by a self-developed knowledge base. Specifically, the voice recognition module processes the user's voice input in a complex noise environment; the text preprocessing module then converts the voice input into text information, which is passed to the subsequent intelligent dialogue module. The invention enhances the safety and controllability of the model: safety shielding is applied to the training corpus and to the model output, which better protects the safety and privacy of educational users and prevents the model from misleading students or generating harmful information.

Description

Intelligent education big model based on knowledge base

Technical Field

The invention belongs to the technical field of intelligent education, and particularly relates to an intelligent education big model based on a knowledge base.

Background

In the development of artificial intelligence, the creation and application of large-scale pre-trained models have achieved significant results, especially in data extraction and decision support. Despite these advances, significant problems and limitations remain with respect to safety and controllability and to application in vertical industries (e.g., education).

Firstly, from the viewpoint of security and controllability, large-scale models were first developed abroad, and security risks may exist when such models are applied commercially in the domestic market, particularly in scenarios involving confidential information. Moreover, the training corpora of large models often lack continuous adaptation to Chinese and to the domestic semantic environment, so large models based on Chinese are relatively few. In vertical application scenarios such as the intelligent education industry, they are at present mainly applied to dialogue question answering, and research on specific application scenarios is still lacking.

Secondly, the training data of existing pre-trained models are mostly derived from the Internet, and such general knowledge is not equivalent to the expertise of the education field. Existing large models may therefore have difficulty transferring to specialized educational subjects such as physics, chemistry and English, which may lead to users in the education industry obtaining misleading information. No effective solution to this problem currently exists.

Accordingly, there is a need for a knowledge base-based intelligent educational large model that addresses the aforementioned problems.

Disclosure of Invention

In view of the above, and in order to overcome the defects of the prior art, the invention provides an intelligent education large model based on a knowledge base, which effectively solves the problems described in the background.

In order to achieve the above purpose, the present invention provides the following technical solution: an intelligent education large model based on a knowledge base, comprising a large model training and fine-tuning module, a voice recognition module, a text preprocessing module and an intelligent dialogue module. The voice recognition module and the text preprocessing module cooperatively process user input. The intelligent dialogue module takes the large model as its base and, in cooperation with the voice recognition and text preprocessing modules, responds to user requests by performing intelligent reasoning and replying. Specifically, the voice recognition module processes the user's voice input in a complex noise environment; the text preprocessing module then converts the voice input into text information and passes it to the subsequent intelligent dialogue module, which submits it to the large model for reasoning, prediction and reply. At the same time, the safety and controllability of the model are enhanced: safety shielding is applied to the training corpus and to the post-processing of the model output, which better protects the safety and privacy of educational users and prevents the model from misleading students or generating harmful information.

Voice recognition module: this module mainly processes the user's voice input in a complex noisy environment. It is responsible for a series of audio preprocessing tasks such as voice denoising, and sends the semi-transcribed voice signal to the subsequent text preprocessing module for the remaining speech-to-text conversion. The design and workflow of the voice recognition module are as follows:

S1, fine-tuning the voice recognition module with an audio data set containing various background sounds and noises, thereby improving the robustness of the voice recognition module;

S2, collecting the user's pronunciation of a specified text, determining the user's pronunciation accuracy and pronunciation preferences, and further fine-tuning the voice recognition module to improve recognition accuracy;

S3, while the system is in use, applying preprocessing steps such as noise reduction and correction to the audio input by the user, and outputting the processed audio to the subsequent module;
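Step S1 relies on audio that contains background noise. The patent does not say how such data is built; one common approach, shown here only as a minimal sketch, is to mix a noise recording into clean speech at a chosen signal-to-noise ratio (SNR). The function names, the synthetic sine-wave "speech" and the 10 dB target are all illustrative assumptions.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(clean)  # silent noise: nothing to mix in
    # SNR_dB = 20 * log10(rms_clean / rms_noise), solved for the noise level.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR of `noisy` relative to the known clean signal."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 20 * math.log10(rms(clean) / rms(residual))


# Synthetic stand-ins for real recordings (assumption: 16 kHz, one second).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(clean, noise, snr_db=10.0)
```

Sweeping `snr_db` over a range of values would yield training examples of graded difficulty, which is one way to pursue the robustness goal stated in S1.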

Text preprocessing module: this module receives the data from the voice recognition module and completes tasks such as voice transcription, word segmentation and text vectorization. The design and workflow of the text preprocessing module are as follows:

S1, transcribing the output of the voice recognition module from voice into text, and segmenting the converted text into a sequence of words or subwords;

S2, analyzing whether the words in a sentence include content unrelated to learning or illegal content; if so, sending a corresponding prompt to remind the user, and either asking the user to re-enter the input or refusing to respond;

S3, labeling the part of speech of each word;

S4, identifying and marking named entities in the text;

S5, dividing the text into independent sentences that serve as input to the intelligent dialogue module;
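The following minimal sketch illustrates steps S2 and S5 of the text preprocessing workflow: screening the transcribed text against a block list and splitting it into sentences. The block list, the prompt wording and the regular-expression tokenizer are assumptions made for illustration; part-of-speech tagging (S3) and named-entity recognition (S4) would need a dedicated NLP toolkit and are omitted, as is proper Chinese word segmentation.

```python
import re

# Hypothetical block list; a real deployment would load a maintained lexicon.
BLOCKED_TERMS = {"gambling", "violence"}


def tokenize(text):
    """Lowercase word tokens (simplification: Latin-script words only)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def check_input(text):
    """Step S2: return (ok, message); message prompts the user on a hit."""
    hits = sorted(set(tokenize(text)) & BLOCKED_TERMS)
    if hits:
        return False, "Please rephrase your input; it contains: " + ", ".join(hits)
    return True, ""


def split_sentences(text):
    """Step S5: split on Western and Chinese sentence-ending punctuation."""
    pieces = re.findall(r"[^.!?。！？]+[.!?。！？]*", text)
    return [p.strip() for p in pieces if p.strip()]


def preprocess(text):
    """Run the screening step, then hand accepted text on as sentences."""
    ok, message = check_input(text)
    if not ok:
        return {"accepted": False, "message": message, "sentences": []}
    return {"accepted": True, "message": "", "sentences": split_sentences(text)}


result = preprocess("What is a noun? Give an example.")
```

The sentence splitter is deliberately naive: it would also split a decimal such as "3.14", so production use would call for a more careful rule set.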

Intelligent dialogue module cooperating with the self-developed knowledge base: this module is obtained by further fine-tuning a domestically developed large model as its base. In cooperation with the voice recognition and text preprocessing modules, it responds to user requests by performing intelligent reasoning and replying based on prior information provided by the self-developed knowledge base. The workflow of the intelligent dialogue module is as follows:

S1, fine-tuning a domestic Chinese base model on high-quality educational data based on machine learning and deep learning algorithms, so that the system acquires capabilities such as grammatical error recognition and correction of easily confused words, is better suited to educational scenarios, and yields a dedicated large model for education;

S2, the user sets a preferred scene for voice practice, such as business dialogue, English speech or daily dialogue, so that the intelligent dialogue module is fine-tuned to the usage scene;

S3, updating and maintaining the self-developed knowledge base, so that the knowledge reserve of the model is updated and expanded in a timely manner;

S4, generating the response text required by the user based on the background knowledge of the fine-tuned large language model, the scene-adaptive prompt words and the highly timely knowledge text;

S5, generating the voice corresponding to the response text using the voice conversion sub-module in the intelligent dialogue module, and feeding the voice back to the user.
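Step S4 combines three inputs: the fine-tuned model, a scene-adaptive prompt and up-to-date knowledge text. As a hedged illustration of how the latter two might be assembled into a single model input, the sketch below builds a prompt string. The scene prompts, the section labels and the function name `build_prompt` are hypothetical; the patent does not disclose a prompt format.

```python
# Hypothetical scene prompts; the patent does not give their wording.
SCENE_PROMPTS = {
    "business dialogue": "Act as a business partner and keep the tone formal.",
    "english speech": "Act as a speech coach and comment on structure and delivery.",
    "daily dialogue": "Act as a friendly conversation partner.",
}
DEFAULT_SCENE = "daily dialogue"


def build_prompt(scene, knowledge_texts, user_text):
    """Combine scene prompt, retrieved knowledge and user input into one prompt."""
    scene_prompt = SCENE_PROMPTS.get(scene.lower(), SCENE_PROMPTS[DEFAULT_SCENE])
    if knowledge_texts:
        knowledge = [f"- {t}" for t in knowledge_texts]
    else:
        knowledge = ["- (none)"]
    sections = [
        scene_prompt,
        "Reference knowledge:",
        *knowledge,
        "Student said:",
        user_text,
    ]
    return "\n".join(sections)


prompt = build_prompt(
    "Business dialogue",
    ["'Affect' is usually a verb; 'effect' is usually a noun."],
    "The new policy will effect our sales.",
)
```

The resulting string would then be passed to the fine-tuned model, and the model's reply to the voice conversion sub-module of step S5.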

Further, step S1, in which the domestic Chinese base model is fine-tuned on high-quality educational data based on machine learning and deep learning algorithms, has the following workflow:

S1-1, constructing a fine-tuning data set on the scale of tens of millions from Internet data using crawlers and similar techniques, and performing supervised training on the base model to enhance its response capability and reasoning performance on education-related knowledge;

S1-2, for the education industry, collecting strongly relevant high-quality data in a targeted manner for further parameter fine-tuning, thereby enhancing the model's understanding of knowledge in specialized fields;

S1-3, based on feedback from professional education practitioners on the output of the model, having the model learn the language style of professionals and their process for analyzing specific problems, thereby enhancing the professionalism of the model output.
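Before the supervised training of S1-1 can run, crawled material has to be turned into training records. The sketch below shows one plausible cleaning step, assuming the raw data arrives as question/answer pairs: whitespace is normalized, empty pairs are dropped and exact duplicates are removed by hash. The record keys `instruction` and `output` follow a common convention and are not taken from the patent.

```python
import hashlib
import json


def to_sft_records(raw_pairs):
    """Normalize and deduplicate (question, answer) pairs for supervised fine-tuning."""
    seen = set()
    records = []
    for question, answer in raw_pairs:
        q = " ".join(question.split())
        a = " ".join(answer.split())
        if not q or not a:
            continue  # skip incomplete pairs
        key = hashlib.sha256(f"{q.lower()}\n{a.lower()}".encode("utf-8")).hexdigest()
        if key in seen:
            continue  # skip duplicates that differ only in case or spacing
        seen.add(key)
        records.append({"instruction": q, "output": a})
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


raw = [
    ("What is  the past tense of 'go'?", "went"),
    ("what is the past tense of 'go'?", "Went"),
    ("Define velocity.", ""),
    ("牛顿第二定律是什么？", "F = ma"),
]
records = to_sft_records(raw)
jsonl = to_jsonl(records)
```

At the scale the patent mentions, near-duplicate detection and quality filtering would matter more than this exact-match pass, but the record shape would stay the same.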

Further, step S3, in which the self-developed knowledge base is updated and maintained and the knowledge reserve of the model is updated and expanded in a timely manner, has the following workflow:

S3-1, based on a data collection sub-module, capturing document knowledge related to each subject from the Internet in real time, such as teaching materials, teaching plans and text information in video media, and updating the database accordingly, so as to form a massive and up-to-date database;

S3-2, based on a data partitioning sub-module, classifying and partitioning the data according to knowledge features, and constructing a vector database for the model to analyze and use;

S3-3, based on a user query vectorization module, projecting the structured text information into an embedding space, extracting the feature vector of the text, and retrieving and returning the most relevant feature vectors from the database using vector retrieval technology, thereby realizing rapid knowledge queries.
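Steps S3-2 and S3-3 describe a vector database queried by similarity. The sketch below illustrates the retrieval idea with cosine similarity over simple word-count vectors. This is a stand-in: the patent implies a learned embedding model and a dedicated vector database, neither of which is named, so `embed`, `VectorStore` and the sample documents are assumptions.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a real system would use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """In-memory store of (subject, text, vector) entries, partitioned by subject label."""

    def __init__(self):
        self._items = []

    def add(self, subject, text):
        self._items.append((subject, text, embed(text)))

    def search(self, query, top_k=1):
        """Return the top_k (subject, text) entries most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[2]), reverse=True)
        return [(subject, text) for subject, text, _ in ranked[:top_k]]


store = VectorStore()
store.add("physics", "Newton's second law relates force, mass and acceleration")
store.add("chemistry", "An acid donates protons and a base accepts protons")
store.add("english", "The past tense of go is went")
best = store.search("What is the past tense of go?", top_k=1)
```

The linear scan in `search` is fine for a demonstration; a knowledge base of the size described in S3-1 would need an approximate nearest-neighbor index instead.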

Compared with the prior art, the invention has the beneficial effects that:

1. Through a self-developed large model dedicated to education, trained on real industry data at a scale of tens of millions, the invention achieves efficient deployment of intelligent technology and its practical application in the education industry.

2. The invention can be extended to further applications: by augmenting the model's proprietary training data, the technology can be migrated from English to other educational subjects such as Chinese, mathematics, physics and chemistry, thereby realizing wider social benefits.

3. The invention enhances the safety and controllability of the model: safety shielding is applied to the training corpus and to the model output, which better protects the safety and privacy of educational users and prevents the model from misleading students or generating harmful information.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification. They illustrate the invention and, together with the embodiments of the invention, serve to explain it. In the drawings:

FIG. 1 is a schematic block diagram of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.

Embodiment 1, as shown in FIG. 1, discloses an intelligent education large model based on a knowledge base, comprising a large model training and fine-tuning module, a voice recognition module, a text preprocessing module and an intelligent dialogue module. The voice recognition module and the text preprocessing module cooperatively process user input. The intelligent dialogue module takes the large model as its base and, in cooperation with the voice recognition and text preprocessing modules, responds to user requests by performing intelligent reasoning and replying. Specifically, the voice recognition module processes the user's voice input in a complex noise environment; the text preprocessing module then converts the voice input into text information and passes it to the subsequent intelligent dialogue module, which submits it to the large model for reasoning, prediction and reply. At the same time, the safety and controllability of the model are enhanced: safety shielding is applied to the training corpus and to the post-processing of the model output, which better protects the safety and privacy of educational users and prevents the model from misleading students or generating harmful information.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. An intelligent education large model based on a knowledge base, comprising a voice recognition module, a text preprocessing module and an intelligent dialogue module, characterized in that: the intelligent dialogue module takes a large model as its base and cooperates with the voice recognition module and the text preprocessing module to perform intelligent reasoning and replying based on prior information provided by a self-developed knowledge base;

voice recognition module: this module mainly processes the user's voice input in a complex noisy environment, is responsible for a series of audio preprocessing tasks such as voice denoising, and sends the semi-transcribed voice signal to the subsequent text preprocessing module for the remaining speech-to-text conversion; the design and workflow of the voice recognition module are as follows:

text preprocessing module: this module receives the data from the voice recognition module and completes tasks such as voice transcription, word segmentation and text vectorization; the design and workflow of the text preprocessing module are as follows:

S3, labeling the part of speech of each word;

S4, identifying and marking named entities in the text;

intelligent dialogue module cooperating with the self-developed knowledge base: this module is obtained by further fine-tuning a domestically developed large model as its base and, in cooperation with the voice recognition and text preprocessing modules, responds to user requests by performing intelligent reasoning and replying based on prior information provided by the self-developed knowledge base; the workflow of the intelligent dialogue module is as follows:

2. The intelligent education large model based on a knowledge base according to claim 1, wherein step S1, fine-tuning the domestic Chinese base model on high-quality educational data, has the following workflow:

3. The intelligent education large model based on a knowledge base according to claim 1, wherein step S3, updating and maintaining the self-developed knowledge base and updating and expanding the knowledge reserve of the model in a timely manner, has the following workflow: