CN109726392B - Intelligent language cognition information processing system and method based on big data - Google Patents
- Publication number
- CN109726392B (application number CN201811521939.7A)
- Authority
- CN
- China
- Prior art keywords
- module
- verification
- data
- language
- big data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention belongs to the field of big data and discloses an intelligent language cognitive information processing system and method based on big data. Language is input in voice or text form. Words are extracted from the language by the best uniform approximation method, drawing on words, idioms, sayings and sentence patterns, and the extracted words are then converted. The converted content is checked against the sentence patterns existing in the system using the Chauvenet algorithm, thereby verifying the converted content. If verification passes, the result is input to a microprocessor; if verification fails, extraction and conversion are repeated and the result is input to the microprocessor once verification passes. Finally, the information is denoised in the wavelet domain using PURE-LET, stored, and output through a loudspeaker. The invention can greatly reduce the error rate of the intelligent language cognitive system, can perform multi-language conversion, and can improve conversion efficiency through its memory function.
Description
Technical Field
The invention belongs to the field of big data, and particularly relates to an intelligent language cognitive information processing system and method based on big data.
Background
Broadly speaking, language is a set of communication instructions expressed according to common processing rules and conveyed visually, audibly or by touch. Strictly speaking, language refers to the natural language used for human communication. People acquire language ability through learning, and the purpose of language is to communicate ideas, opinions and the like. Linguistics grew out of the study of how languages are classified and what rules they follow. Language is the means by which people communicate, and people cannot do without it when dealing with one another. Although ideas can also be conveyed by pictures, actions, expressions and so on, language is the most important and most convenient medium. When humans found that certain animals can communicate in some way, the concept of animal language arose. With the birth of the computer, humans needed to give it instructions, and this one-way communication became computer language. However, computers still cannot reliably understand language as spoken by humans: the error rate of current intelligent language cognition is high and many words cannot be recognized, so computers can only perform simple, single-purpose recognition.
In summary, the problems of the prior art are:
At present, the error rate of computer-based intelligent language cognition is high and many words cannot be recognized, so only simple, single-purpose recognition is possible.
In the prior art, words cannot be extracted accurately. Errors and redundant information in the converted content cannot be removed effectively, which lengthens verification time, lowers verification efficiency and prevents efficient verification of the converted content. The information is also easily disturbed by external factors, which degrades its quality, causes errors and hinders accurate output by the loudspeaker.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides an intelligent language cognitive information processing system and method based on big data.
The invention is implemented as follows. The intelligent language cognition information processing method based on big data comprises the following steps:
first, inputting a language in voice or text form;
second, extracting words from the language by the best uniform approximation method, using words, idioms, sayings and sentence patterns, and converting the extracted words;
third, checking the converted content against the sentence patterns existing in the system using the Chauvenet algorithm, thereby verifying the converted content;
fourth, if verification passes, inputting the verification result to the microprocessor; if verification fails, extracting and converting again, and inputting the result to the microprocessor once verification passes; finally, denoising the information in the wavelet domain using PURE-LET, storing it, and outputting it through a loudspeaker.
Further, in the second step, words are extracted from the language by the best uniform approximation method, using words, idioms, sayings and sentence patterns. The specific algorithm is as follows: let $f(x) \in C[a, b]$ and let $P_n$ be the set of all polynomials of degree not exceeding $n$; if $p^*(x) \in P_n$ satisfies
$$\max_{a \le x \le b} |f(x) - p^*(x)| = \min_{p \in P_n} \max_{a \le x \le b} |f(x) - p(x)|,$$
then $p^*(x)$ is the best uniform approximation polynomial of $f(x)$ on $[a, b]$, also called the minimax polynomial;
the optimal polynomial is solved using the Remez algorithm; according to Chebyshev's theorem, one solves:
$$\sum_{k=0}^{n} a_k x_i^k + (-1)^i \rho = f(x_i), \quad i = 0, 1, \ldots, n+1,$$
wherein: $a_k$ ($k = 0, 1, \ldots, n$) are the polynomial coefficients to be solved; $\rho$ is the best approximation error; the points $x_i$ are obtained by an iterative correction method.
Further, in the third step, the converted content is checked using the Chauvenet algorithm, so that efficient verification of the converted content is realized; the algorithm comprises the following steps:
take a set of data samples $S_0 = \{x_0, x_1, \ldots, x_n\}$ in which the n sample data contain m erroneous sample points, and let $f_0(x)$ be a function reflecting the basic characteristics of the set of data samples, as follows:
wherein: n is the number of individuals in the set of data;
$D_i = |x_i - f(x_i)|$;
$D_i$ measures the degree to which the sample point $x_i$ deviates from the functional relationship; the larger $D_i$ is, the more likely the sample point is erroneous data; over the n data, $D_i$ attains a maximum value;
the Chauvenet algorithm rejects the sample point $x_j$ with the largest $D_i$ and establishes a new sample set $S_1 = \{S_0 - x_j\}$; the operation is repeated on the remaining data, and when the data satisfy the termination condition, the m removed sample points are the erroneous data.
Another object of the present invention is to provide a big data-based intelligent language cognitive information processing system implementing the above method, the system comprising: a language receiving module, a text input module, a word extraction module, a conversion module, a verification module, a microprocessor, a storage module, a speaker module and big data;
the big data provides knowledge support for the word extraction module and the verification module; input from the language receiving module and the text input module is passed to the word extraction module for word extraction, the extracted words are converted by the conversion module, and the conversion module inputs the converted content to the verification module;
the verification module inputs the verification result to the microprocessor when verification passes, and returns to the word extraction module for reconversion when verification fails;
the microprocessor stores the converted information in the storage module; the microprocessor outputs the information through the speaker module.
A further object of the invention is to provide a language cognition platform applying the intelligent language cognition information processing method based on big data.
The advantages and positive effects of the invention are as follows. A verification module is provided, which checks the information output by the conversion module against the information in the big data; if the conversion is found to be wrong, extraction and conversion are performed again, so that the system arrives at a correct cognition and errors are avoided. A storage module is provided, which records the converted language so that the conversion system builds up a memory and the next conversion is faster. Big data is provided, so that the vocabulary source of the system is broader, multiple languages can be recognized, colloquial expressions, idioms and the like can be looked up, and the error rate is low. The error rate of the intelligent language cognition system can thus be greatly reduced, multiple languages can be converted, and conversion efficiency can be improved through the memory function.
The invention extracts words from the language by the best uniform approximation method, using words, idioms, sayings, sentence patterns and the like, which improves the accuracy of word extraction. The invention checks the converted content using the Chauvenet algorithm, which effectively removes erroneous or redundant information, improves checking efficiency and realizes efficient verification of the converted content. The invention stores the information after PURE-LET wavelet-domain denoising, which effectively avoids interference from external factors, ensures information quality and facilitates accurate output by the loudspeaker.
Drawings
Fig. 1 is a flowchart of an intelligent language cognition information processing method based on big data provided by an embodiment of the invention.
FIG. 2 is a schematic diagram of an intelligent language cognitive information processing system based on big data according to an embodiment of the present invention;
in the figure: 1. a language receiving module; 2. a text input module; 3. a word extraction module; 4. a conversion module; 5. a verification module; 6. a microprocessor; 7. a storage module; 8. a speaker module; 9. big data.
Detailed Description
For a further understanding of the content, features and effects of the invention, the following embodiments are described in detail with reference to the accompanying drawings.
The structure of the present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1, the intelligent language cognitive information processing method based on big data provided by the embodiment of the invention specifically includes the following steps:
S101: inputting the language in voice or text form;
S102: extracting words from the language by the best uniform approximation method, using words, idioms, sayings, sentence patterns and the like, and converting the extracted words;
S103: checking the converted content against the sentence patterns existing in the system using the Chauvenet algorithm, thereby verifying the converted content;
S104: if verification passes, inputting the verification result to the microprocessor; if verification fails, extracting and converting again, and inputting the result to the microprocessor once verification passes; finally, denoising the information in the wavelet domain using PURE-LET, storing it, and outputting it through a loudspeaker.
In step S102, the embodiment of the invention extracts words from the language by the best uniform approximation method, using words, idioms, sayings, sentence patterns and the like, which improves the accuracy of word extraction; the specific algorithm is as follows:
let $f(x) \in C[a, b]$ and let $P_n$ be the set of all polynomials of degree not exceeding $n$; if $p^*(x) \in P_n$ satisfies
$$\max_{a \le x \le b} |f(x) - p^*(x)| = \min_{p \in P_n} \max_{a \le x \le b} |f(x) - p(x)|,$$
then $p^*(x)$ is the best uniform approximation polynomial of $f(x)$ on $[a, b]$, also called the minimax polynomial;
the optimal polynomial is solved using the Remez algorithm; according to Chebyshev's theorem, one solves
$$\sum_{k=0}^{n} a_k x_i^k + (-1)^i \rho = f(x_i), \quad i = 0, 1, \ldots, n+1,$$
wherein: $a_k$ ($k = 0, 1, \ldots, n$) are the polynomial coefficients to be solved; $\rho$ is the best approximation error; the points $x_i$ are obtained by an iterative correction method.
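The iterative correction of the reference points can be illustrated with a small single-point-exchange sketch of the Remez algorithm. This is only a minimal illustration under assumptions not stated in the patent: the function name `remez`, the dense search grid, the Chebyshev-extrema starting points and the tolerance are all hypothetical choices, and the patent does not say how a language signal is mapped to $f(x)$.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def remez(f, a, b, n, max_iter=50, grid=4001, tol=1e-10):
    """Return (coeffs, rho) of a near-minimax degree-n polynomial for f on [a, b]."""
    idx = np.arange(n + 2)
    signs = (-1.0) ** idx
    # Initial reference: n + 2 Chebyshev extrema mapped onto [a, b], ascending.
    x = 0.5 * (a + b) - 0.5 * (b - a) * np.cos(np.pi * idx / (n + 1))
    xs = np.linspace(a, b, grid)  # dense grid used to search for the error peak
    for _ in range(max_iter):
        # Chebyshev system: sum_k a_k x_i^k + (-1)^i rho = f(x_i).
        A = np.column_stack([np.vander(x, n + 1, increasing=True), signs])
        sol = np.linalg.solve(A, f(x))
        coef, rho = sol[:-1], sol[-1]
        err = f(xs) - P.polyval(xs, coef)
        j = int(np.argmax(np.abs(err)))
        if abs(err[j]) - abs(rho) <= tol:
            break  # the error equioscillates: reference is (near) optimal
        x_new, s_new = xs[j], np.sign(err[j])
        ref_sign = np.sign(signs * rho)  # sign of the error at each reference point
        pos = int(np.searchsorted(x, x_new))
        # Exchange one reference point so that the error signs keep alternating.
        if pos == 0:
            if ref_sign[0] == s_new:
                x[0] = x_new
            else:
                x = np.concatenate(([x_new], x[:-1]))
        elif pos == n + 2:
            if ref_sign[-1] == s_new:
                x[-1] = x_new
            else:
                x = np.concatenate((x[1:], [x_new]))
        elif ref_sign[pos - 1] == s_new:
            x[pos - 1] = x_new
        else:
            x[pos] = x_new
    return coef, abs(rho)


# Example: best linear approximation of exp(x) on [0, 1].
coef, rho = remez(np.exp, 0.0, 1.0, 1)
print("coefficients:", coef, "max error:", rho)
```

Each pass solves the linear system above for the coefficients and the levelled error, then moves one reference point to the location of the largest error, which is one common way to realize the iterative correction.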
In step S103, the embodiment of the invention checks the converted content using the Chauvenet algorithm, which effectively removes erroneous or redundant information, improves checking efficiency and realizes efficient verification of the converted content; the algorithm comprises the following steps:
take a set of data samples $S_0 = \{x_0, x_1, \ldots, x_n\}$ in which the n sample data contain m erroneous sample points, and let $f_0(x)$ be a function reflecting the basic characteristics of the set of data samples, as follows:
wherein: n is the number of individuals in the set of data;
$D_i = |x_i - f(x_i)|$
$D_i$ measures the degree to which the sample point $x_i$ deviates from the functional relationship; the larger $D_i$ is, the more likely the sample point is erroneous data; over the n data, $D_i$ attains a maximum value;
the Chauvenet algorithm rejects the sample point $x_j$ with the largest $D_i$ and establishes a new sample set $S_1 = \{S_0 - x_j\}$; the operation is repeated on the remaining data, and when the data satisfy the termination condition, the m removed sample points are the erroneous data.
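The reject-and-repeat loop described above can be sketched as follows. The text does not reproduce the formula for the reference function or the termination condition, so this sketch assumes two things that are not taken from the patent: the reference value is the sample mean, and iteration stops when the most deviant point no longer fails the classical Chauvenet criterion (expected count of equally extreme points at least 0.5 under a normal model). The function name `chauvenet_reject` is hypothetical.

```python
import math
import statistics


def chauvenet_reject(samples):
    """Iteratively remove the most deviant sample while it fails Chauvenet's criterion."""
    kept = list(samples)
    removed = []
    while len(kept) > 2:
        mean = statistics.fmean(kept)   # assumed reference value f_0
        std = statistics.stdev(kept)
        if std == 0:
            break
        # D_i = |x_i - mean|; pick the sample with the largest deviation.
        j = max(range(len(kept)), key=lambda i: abs(kept[i] - mean))
        d_max = abs(kept[j] - mean)
        # Two-sided normal tail probability of a deviation at least this large.
        tail = math.erfc(d_max / (std * math.sqrt(2)))
        if len(kept) * tail >= 0.5:
            break  # assumed termination condition: no remaining point is rejected
        removed.append(kept.pop(j))
    return kept, removed


data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 25.0]
kept, removed = chauvenet_reject(data)
print("kept:", kept)
print("removed:", removed)
```

Recomputing the mean and standard deviation after every removal matters here, because a single gross error inflates both and can mask smaller errors until it is gone.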
In step S104, the information is stored after PURE-LET wavelet-domain denoising, which effectively avoids interference from external factors, ensures information quality and facilitates accurate output by the loudspeaker; the specific algorithm is as follows:
at each scale, the estimates of the wavelet coefficients are all written as a linear combination of a set of basic threshold functions:
and the coefficient vector $a = [a_1, \ldots, a_M]^T$ is determined by minimizing PURE;
let $\theta(d, s) = \theta_j(d_i, s_j)$ be the estimation function of the noiseless wavelet coefficients $\delta = \delta_j$; the functions $\theta^+(d, s)$ and $\theta^-(d, s)$ are as follows:
wherein $e_k$ is the standard basis vector, all of whose elements are 0 except $e_k(k) = 1$; the random variable $\mathrm{PURE}_j$ is an unbiased estimate of the MSE on subband $j$, i.e., $E\{\mathrm{PURE}_j\} = E\{\mathrm{MSE}_j\}$;
the linear combination parameters of the wavelet estimate in formula (2) are calculated by minimizing PURE; substituting formula (2) into formula (3) and omitting the arguments $(d, s)$ gives
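Because the estimate is linear in the coefficient vector $a$, minimizing a quadratic risk over $a$ reduces to a small least-squares problem. The sketch below illustrates only that "linear expansion of thresholds" step, and it is not PURE-LET itself: it minimizes the true MSE against a known clean signal (an oracle), whereas PURE replaces that unknown MSE with an unbiased estimate computed from the noisy data alone. The two basis functions, the `scale` parameter, the Gaussian test noise and the names `let_basis` and `fit_let` are all assumptions made for illustration.

```python
import numpy as np


def let_basis(d, scale):
    """Two illustrative threshold functions evaluated on the coefficients d."""
    # theta_1 keeps the coefficient as is; theta_2 is a smooth term that only
    # acts on small coefficients, so a_1*theta_1 + a_2*theta_2 can shrink them.
    return np.column_stack([d, d * np.exp(-(d / scale) ** 2)])


def fit_let(noisy, clean, scale):
    """Find a minimizing ||B a - clean||^2, the oracle stand-in for minimizing PURE."""
    B = let_basis(noisy, scale)
    a, *_ = np.linalg.lstsq(B, clean, rcond=None)
    return a, B @ a


# Synthetic sparse "wavelet coefficients" of one subband plus noise.
rng = np.random.default_rng(0)
clean = np.where(rng.random(2000) < 0.1, rng.normal(0.0, 5.0, 2000), 0.0)
noisy = clean + rng.normal(0.0, 1.0, 2000)

a, estimate = fit_let(noisy, clean, scale=3.0)
mse_noisy = float(np.mean((noisy - clean) ** 2))
mse_let = float(np.mean((estimate - clean) ** 2))
print("a =", a, "MSE before:", mse_noisy, "MSE after:", mse_let)
```

Since the identity function is one of the basis functions, the fitted combination can never have a larger oracle MSE than the unprocessed noisy coefficients.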
As shown in fig. 2, the intelligent language cognitive information processing system based on big data provided in the embodiment of the present invention specifically includes:
a language receiving module 1, a text input module 2, a word extraction module 3, a conversion module 4, a verification module 5, a microprocessor 6, a storage module 7, a speaker module 8 and big data 9.
The big data 9 provides knowledge support for the word extraction module 3 and the verification module 5; input from the language receiving module 1 and the text input module 2 is passed to the word extraction module 3 for word extraction, the extracted words are converted by the conversion module 4, and the conversion module 4 inputs the converted content to the verification module 5.
The verification module 5 provided by the embodiment of the invention inputs the verification result to the microprocessor 6 when verification passes, and returns to the word extraction module 3 for reconversion when verification fails.
The microprocessor 6 provided by the embodiment of the invention stores the converted information in the storage module 7.
The microprocessor 6 provided by the embodiment of the invention outputs the information through the speaker module 8.
The working principle of the invention is as follows: language is input through the language receiving module 1 and the text input module 2; the word extraction module 3 extracts words using the words, idioms, sayings, sentence patterns and the like in the big data 9; the extracted words are converted by the conversion module 4, and the converted content is input to the verification module 5; the verification module 5 checks the converted content against the sentence patterns existing in the big data 9; when verification passes, the verification module 5 inputs the result to the microprocessor 6; when verification fails, control returns to the word extraction module 3 and the words are extracted and converted again; the microprocessor 6 stores the information in the storage module 7 and outputs it through the speaker module 8.
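The control flow among the modules (extract, convert, verify, retry on failure, then store and output) can be sketched as below. The class name `CognitionPipeline`, the callable signatures, the retry limit and the dictionary used as the storage module are hypothetical; the patent does not specify interfaces or how many times extraction is retried.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CognitionPipeline:
    extract: Callable[[str, int], List[str]]   # word extraction module (input, attempt)
    convert: Callable[[List[str]], str]        # conversion module
    verify: Callable[[str], bool]              # verification module
    max_attempts: int = 3                      # assumed retry limit
    storage: Dict[str, str] = field(default_factory=dict)  # storage module

    def process(self, text: str) -> str:
        # Memory function: reuse a stored conversion of the same input.
        if text in self.storage:
            return self.storage[text]
        for attempt in range(self.max_attempts):
            words = self.extract(text, attempt)
            converted = self.convert(words)
            if self.verify(converted):
                self.storage[text] = converted  # the microprocessor stores the result
                return converted                # and hands it to the output stage
        raise ValueError("verification failed after max_attempts")


# Toy stand-ins for the modules: verification only accepts lower-case output,
# and extraction normalizes case from the second attempt onward.
pipeline = CognitionPipeline(
    extract=lambda text, attempt: text.split() if attempt == 0 else text.lower().split(),
    convert=lambda words: " ".join(words),
    verify=lambda s: s == s.lower(),
)
result = pipeline.process("Hello World")
print(result)
```

In this toy run the first attempt fails verification and the second passes, which mirrors the return path from the verification module to the word extraction module.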
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention in any way; any simple modification, equivalent variation or adaptation of the above embodiments made in accordance with the technical principles of the present invention falls within the scope of the technical solutions of the present invention.
Claims (3)
1. The intelligent language cognition information processing method based on big data is characterized by comprising the following steps of:
first, inputting a language in voice or text form;
second, extracting words from the language by the best uniform approximation method, using words, idioms, sayings and sentence patterns, and converting the extracted words;
third, checking the converted content against the sentence patterns existing in the system using the Chauvenet algorithm, thereby verifying the converted content;
fourth, if verification passes, inputting the verification result to the microprocessor; if verification fails, extracting and converting again, and inputting the qualified result to the microprocessor; finally, denoising the information in the wavelet domain using PURE-LET, storing it, and outputting it through a loudspeaker;
in the second step, words are extracted from the language by the best uniform approximation method, using words, idioms, sayings and sentence patterns, and the specific algorithm is as follows: let $f(x) \in C[a, b]$ and let $P_n$ be the set of all polynomials of degree not exceeding $n$; if $p^*(x) \in P_n$ satisfies
$$\max_{a \le x \le b} |f(x) - p^*(x)| = \min_{p \in P_n} \max_{a \le x \le b} |f(x) - p(x)|,$$
then $p^*(x)$ is called the best uniform approximation polynomial of $f(x)$ on $[a, b]$, also called the minimax polynomial;
the optimal polynomial is solved using the Remez algorithm; according to Chebyshev's theorem, one solves:
$$\sum_{k=0}^{n} a_k x_i^k + (-1)^i \rho = f(x_i), \quad i = 0, 1, \ldots, n+1,$$
wherein: $a_k$ ($k = 0, 1, \ldots, n$) are the polynomial coefficients to be solved; $\rho$ is the best approximation error; the points $x_i$ are obtained by an iterative correction method;
in the third step, the converted content is checked using the Chauvenet algorithm, so that efficient verification of the converted content is realized; the algorithm comprises the following steps:
take a set of data samples $S_0 = \{x_0, x_1, \ldots, x_n\}$ in which the n sample data contain m erroneous sample points, and let $f_0(x)$ be a function reflecting the basic characteristics of the set of data samples, as follows:
wherein: n is the number of individuals in the set of data;
$D_i = |x_i - f(x_i)|$;
$D_i$ measures the degree to which the sample point $x_i$ deviates from the functional relationship; the larger $D_i$ is, the more likely the sample point is erroneous data; over the n data, $D_i$ attains a maximum value;
the Chauvenet algorithm rejects the sample point $x_j$ with the largest $D_i$ and establishes a new sample set $S_1 = \{S_0 - x_j\}$; the operation is repeated on the remaining data, and when the data satisfy the termination condition, the m removed sample points are the erroneous data.
2. A big data-based intelligent language cognitive information processing system that implements the big data-based intelligent language cognitive information processing method of claim 1, characterized in that the big data-based intelligent language cognitive information processing system comprises: the system comprises a language receiving module, a character input module, a word extraction module, a conversion module, a verification module, a microprocessor, a storage module, a loudspeaker module and big data;
the big data provides knowledge support for the word extraction module and the verification module; input from the language receiving module and the text input module is passed to the word extraction module for word extraction, the extracted words are converted by the conversion module, and the conversion module inputs the converted content to the verification module;
the verification module inputs the verification result to the microprocessor when verification passes, and returns to the word extraction module for reconversion when verification fails;
the microprocessor stores the conversion information into the storage module; the microprocessor outputs the information through the speaker module.
3. A language cognition platform applying the intelligent language cognition information processing method based on big data according to claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811521939.7A CN109726392B (en) | 2018-12-13 | 2018-12-13 | Intelligent language cognition information processing system and method based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811521939.7A CN109726392B (en) | 2018-12-13 | 2018-12-13 | Intelligent language cognition information processing system and method based on big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109726392A (en) | 2019-05-07 |
CN109726392B (en) | 2023-10-10 |
Family
ID=66294925
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811521939.7A Active CN109726392B (en) | 2018-12-13 | 2018-12-13 | Intelligent language cognition information processing system and method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109726392B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101221704A (en) * | 2007-01-12 | 2008-07-16 | 戴献东 | Electric language learning policy |
CN101604204A (en) * | 2009-07-09 | 2009-12-16 | 北京科技大学 | Distributed cognitive technology for intelligent emotional robot |
CN104778254A (en) * | 2015-04-20 | 2015-07-15 | 北京蓝色光标品牌管理顾问股份有限公司 | Distributing type system for non-parameter topic automatic identifying and identifying method |
CN105494230A (en) * | 2015-09-30 | 2016-04-20 | 常州大学怀德学院 | Intelligent orientating oxygenation method and apparatus for aquatic culture |
CN107123068A (en) * | 2017-04-26 | 2017-09-01 | 北京航空航天大学 | A kind of programming-oriented language course individualized learning effect analysis system and method |
CN107273361A (en) * | 2017-06-21 | 2017-10-20 | 河南工业大学 | The word computational methods and its device closed based on the general type-2 fuzzy sets of broad sense |
CN107741295A (en) * | 2017-09-15 | 2018-02-27 | 江苏大学 | A kind of MENS capacitive baroceptors test calibration device and method |
CN207541938U (en) * | 2017-11-08 | 2018-06-26 | 延边大学 | A kind of natural language intelligent interaction machine |
CN108537332A (en) * | 2018-04-12 | 2018-09-14 | 合肥工业大学 | A kind of Sigmoid function hardware-efficient rate implementation methods based on Remez algorithms |
CN111597790A (en) * | 2020-05-25 | 2020-08-28 | 郑州轻工业大学 | Natural language processing system based on artificial intelligence |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9002702B2 (en) * | 2012-05-03 | 2015-04-07 | International Business Machines Corporation | Confidence level assignment to information from audio transcriptions |
Non-Patent Citations (2)
Title |
---|
Austin F. Frank等.Speaking Rationally:Uniform Information Density as an Optimal Strategy for Language Production.《Proceedings of the Annual Meeting of the Cognitive Science Society》.2008,939-944. * |
Wu Jing et al. Cognition of autonomous foreign-language learners in a computer-assisted mode. Modern Educational Technology (《现代教育技术》). 2008, Vol. 18, 37-41. * |
Also Published As
Publication number | Publication date |
---|---|
CN109726392A (en) | 2019-05-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111128394B (en) | Medical text semantic recognition method and device, electronic equipment and readable storage medium | |
WO2021000408A1 (en) | Interview scoring method and apparatus, and device and storage medium | |
CN111401084B (en) | Method and device for machine translation and computer readable storage medium | |
US11526663B2 (en) | Methods, apparatuses, devices, and computer-readable storage media for determining category of entity | |
JP6832501B2 (en) | Meaning generation method, meaning generation device and program | |
CN111143530B (en) | Intelligent reply method and device | |
CN116629275B (en) | Intelligent decision support system and method based on big data | |
CN106997342B (en) | Intention identification method and device based on multi-round interaction | |
CN110765785A (en) | Neural network-based Chinese-English translation method and related equipment thereof | |
CN110717021A (en) | Input text and related device for obtaining artificial intelligence interview | |
CN111126084A (en) | Data processing method and device, electronic equipment and storage medium | |
CN111723583B (en) | Statement processing method, device, equipment and storage medium based on intention role | |
CN113705207A (en) | Grammar error recognition method and device | |
CN116364072B (en) | Education information supervision method based on artificial intelligence | |
CN109726392B (en) | Intelligent language cognition information processing system and method based on big data | |
CN110929532B (en) | Data processing method, device, equipment and storage medium | |
WO2023116572A1 (en) | Word or sentence generation method and related device | |
CN116704066A (en) | Training method, training device, training terminal and training storage medium for image generation model | |
CN115730590A (en) | Intention recognition method and related equipment | |
CN114239559B (en) | Text error correction and text error correction model generation method, device, equipment and medium | |
Hladek et al. | Unsupervised spelling correction for Slovak | |
CN115292492A (en) | Method, device and equipment for training intention classification model and storage medium | |
CN115858776A (en) | Variant text classification recognition method, system, storage medium and electronic equipment | |
CN111538814B (en) | Method for supporting custom standardization by protocol in semantic understanding | |
CN114186020A (en) | Semantic association method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||