CN109726392B - Intelligent language cognition information processing system and method based on big data - Google Patents

Info

Publication number
CN109726392B
Authority
CN
China
Prior art keywords
module
verification
data
language
big data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811521939.7A
Other languages
Chinese (zh)
Other versions
CN109726392A (en)
Inventor
尹观海
方燕红
王文烨
李小东
陈佳
张明宝
廖玲萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinggangshan University
Original Assignee
Jinggangshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinggangshan University
Priority to CN201811521939.7A
Publication of CN109726392A
Application granted
Publication of CN109726392B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the field of big data and discloses an intelligent language cognitive information processing system and method based on big data. Language is input as speech or text. Words and phrases are extracted from the language by the best uniform approximation method, using words, idioms, sayings, and sentence patterns, and the extracted words and phrases are then converted. The converted content is checked against the sentences already in the system by the Showy-Fresnel algorithm, and is thereby verified. If verification passes, the result is input to a microprocessor; if it fails, extraction and conversion are repeated until the result passes and is input to the microprocessor. Finally, the information is stored after PURE-LET wavelet-domain denoising and is output through a loudspeaker. The invention can greatly reduce the error rate of an intelligent language cognition system, can convert between multiple languages, and can improve conversion efficiency through its memory function.

Description

Intelligent language cognition information processing system and method based on big data
Technical Field
The invention belongs to the field of big data, and particularly relates to an intelligent language cognitive information processing system and method based on big data.
Background
Broadly speaking, language is a set of communication instructions expressed according to common processing rules and conveyed visually, audibly, or by touch. Strictly speaking, language refers to natural language, the instructions used for human communication. All people acquire language ability through learning, and the purpose of language is to communicate ideas, thoughts, and the like. Linguistics developed from the human study of the classification and rules of language. Language is a means of communication between people, and people cannot do without it in their dealings with one another. Although ideas can also be conveyed by pictures, actions, expressions, and so on, language is the most important and most convenient medium. When humans found that certain animals can communicate in some way, the concept of animal language arose. With the birth of the computer, humans needed to give instructions to the computer, and this one-way communication became computer language. However, a computer does not recognize human speech well when it tries to understand it directly. At present, computers have a high error rate in intelligent language cognition, and many words cannot be recognized, so only simple, single-purpose recognition is possible.
In summary, the problems of the prior art are:
At present, computers have a high error rate in intelligent language cognition, and many words cannot be recognized, so only simple, single-purpose recognition is possible.
In the prior art, words and phrases cannot be extracted accurately; errors and redundant information cannot be effectively removed from the converted content, which prolongs verification time, lowers verification efficiency, and prevents efficient verification of the converted content; and the information is easily disturbed by external factors, which degrades information quality, causes errors, and hinders accurate output by the loudspeaker.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides an intelligent language cognitive information processing system and method based on big data.
The invention is realized as follows. An intelligent language cognition information processing method based on big data comprises the following steps:
first, inputting language as speech or text;
second, extracting words and phrases from the language by the best uniform approximation method, using words, idioms, sayings, and sentence patterns, and converting the extracted words and phrases;
third, checking the converted content against the sentence meanings already in the system by the Showy-Fresnel algorithm, thereby verifying the converted content;
fourth, inputting the verification result to the microprocessor when verification passes; re-extracting and reconverting when verification fails, and inputting the result to the microprocessor once it passes; and finally, storing the information after PURE-LET wavelet-domain denoising and outputting it through a loudspeaker.
Further, in the second step, words and phrases are extracted from the language by the best uniform approximation method, using words, idioms, sayings, and sentence patterns. The specific algorithm is as follows: let $f(x) \in C[a, b]$, and let $P_n$ be the set of all polynomials of degree not exceeding $n$; if $p^*(x) \in P_n$ satisfies
$$\max_{a \le x \le b} |f(x) - p^*(x)| = \min_{p \in P_n} \max_{a \le x \le b} |f(x) - p(x)|,$$
then $p^*(x)$ is the best uniform approximation polynomial of $f(x)$ on $[a, b]$, also called the minimax polynomial;
the optimal polynomial is solved with the Remez algorithm, on the basis of Chebyshev's theorem,
where $a_k$ ($k = 0, 1, \ldots, n$) are the polynomial coefficients to be solved, $\rho$ is the best approximation value, and the points $x_i$ are obtained by iterative correction.
Further, in the third step, the converted content is checked by the Showy-Fresnel algorithm, achieving efficient verification of the converted content; the algorithm comprises the following steps:
a set of data samples $S_0 = \{x_0, x_1, \ldots, x_n\}$ is used, in which the $n$ sample data contain $m$ erroneous sample points, and $f_0(x)$ is a function reflecting the basic characteristics of the set of data samples,
where $n$ is the number of individuals in the set of data;
$D_i = |x_i - f(x_i)|$
measures the degree to which sample point $x_i$ deviates from the functional relationship; the larger $D_i$ is, the more likely the sample point is erroneous data; among the $n$ data, one $D_i$ is the maximum;
the Showy-Fresnel algorithm rejects the sample point $x_j$ with the largest $D_i$ and establishes a new sample set $S_1 = S_0 \setminus \{x_j\}$; the operation is repeated on the remaining data, and when the data meet the termination condition, the $m$ removed sample points are the erroneous data.
Another object of the invention is to provide an intelligent language cognition information processing system based on big data that implements the above method, the system comprising: a language receiving module, a text input module, a word extraction module, a conversion module, a verification module, a microprocessor, a storage module, a speaker module, and big data;
the big data provides knowledge support for the word extraction module and the verification module; input from the language receiving module and the text input module is extracted by the word extraction module and then converted by the conversion module, which passes the converted content to the verification module;
the verification module inputs the verification result to the microprocessor when verification passes, and returns to the word extraction module for reconversion when verification fails;
the microprocessor stores the conversion information in the storage module; the microprocessor outputs the information through the speaker module.
A further object of the invention is to provide a language cognition platform that applies the intelligent language cognition information processing method based on big data.
The advantages and positive effects of the invention are as follows. The verification module checks the information output by the conversion module against the information in the big data; if the conversion is found to be wrong, extraction and conversion are performed again, so that the system arrives at a correct cognition and errors are avoided. The storage module records the converted language, giving the conversion system a memory so that the next conversion is faster. The big data widens the system's vocabulary sources, so that multiple languages can be recognized, colloquial expressions, idioms, and the like can be looked up, and the error rate is low. The error rate of an intelligent language cognition system can thus be greatly reduced, multiple languages can be converted, and conversion efficiency can be improved through the memory function.
The invention extracts words and phrases from the language by the best uniform approximation method, using words, idioms, sayings, sentence patterns, and the like, which improves the accuracy of extraction. It checks the converted content with the Showy-Fresnel algorithm, which effectively removes erroneous or redundant information, improves checking efficiency, and achieves efficient verification of the converted content. It stores the information after PURE-LET wavelet-domain denoising, which effectively avoids interference from external factors, preserves information quality, and facilitates accurate output by the loudspeaker.
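The memory function attributed to the storage module can be pictured as a lookup table placed in front of the conversion step. The sketch below is only an illustration under that assumption: the class name `ConversionMemory`, the `misses` counter, and the use of a plain dictionary are hypothetical and are not specified in the patent.

```python
class ConversionMemory:
    """Hypothetical stand-in for the storage module: remembers past conversions."""

    def __init__(self, convert):
        self._convert = convert   # the (possibly slow) conversion function
        self._store = {}          # phrase -> previously converted result
        self.misses = 0           # how many times a real conversion was needed

    def __call__(self, phrase):
        # Convert only when the phrase has not been seen before.
        if phrase not in self._store:
            self.misses += 1
            self._store[phrase] = self._convert(phrase)
        return self._store[phrase]
```

A repeated phrase is then served from the stored record instead of being converted again, which is one plausible reading of "the next conversion is faster".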
Drawings
FIG. 1 is a flowchart of the intelligent language cognition information processing method based on big data provided by an embodiment of the invention.
FIG. 2 is a schematic diagram of the intelligent language cognitive information processing system based on big data provided by an embodiment of the invention.
In the figures: 1, language receiving module; 2, text input module; 3, word extraction module; 4, conversion module; 5, verification module; 6, microprocessor; 7, storage module; 8, speaker module; 9, big data.
Detailed Description
For a further understanding of the invention, its features, and its effects, the following embodiments are described in detail with reference to the accompanying drawings.
The structure of the present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1, the intelligent language cognitive information processing method based on big data provided by the embodiment of the invention specifically includes the following steps:
S101: inputting language as speech or text;
S102: extracting words and phrases from the language by the best uniform approximation method, using words, idioms, sayings, sentence patterns, and the like, and converting the extracted words and phrases;
S103: checking the converted content against the sentences already in the system by the Showy-Fresnel algorithm, thereby verifying the converted content;
S104: inputting the verification result to the microprocessor when verification passes; re-extracting and reconverting when verification fails, and inputting the result to the microprocessor once it passes; and finally, storing the information after PURE-LET wavelet-domain denoising and outputting it through a loudspeaker.
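The control flow of steps S101 to S104 can be sketched as a retry loop around extraction, conversion, and verification. This is a minimal sketch, not the patented implementation: the function `process`, its callable parameters, and the `max_attempts` limit are hypothetical (the text only says that extraction and conversion are repeated after a failed verification, without stating a limit).

```python
def process(text, extract, convert, verify, store, speak, max_attempts=3):
    """Run the S101-S104 flow for one input and return the verified conversion.

    All callables are placeholders supplied by the caller; max_attempts is an
    assumed safeguard that the patent text does not mention.
    """
    for attempt in range(1, max_attempts + 1):
        phrases = extract(text, attempt)   # S102: extract words and phrases
        converted = convert(phrases)       # S102: convert the extraction
        if verify(converted):              # S103: check the converted content
            store(text, converted)         # S104: record the result
            speak(converted)               # S104: loudspeaker output
            return converted
        # Verification failed: loop back and extract again.
    raise RuntimeError(f"verification failed after {max_attempts} attempts")
```

Passing the attempt number to `extract` is one simple way to let a second extraction differ from the first; any other re-extraction strategy would fit the same loop.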
In step S102, the embodiment of the invention extracts words and phrases from the language by the best uniform approximation method, using words, idioms, sayings, sentence patterns, and the like, which improves the accuracy of extraction. The specific algorithm is as follows:
let $f(x) \in C[a, b]$, and let $P_n$ be the set of all polynomials of degree not exceeding $n$; if $p^*(x) \in P_n$ satisfies
$$\max_{a \le x \le b} |f(x) - p^*(x)| = \min_{p \in P_n} \max_{a \le x \le b} |f(x) - p(x)|,$$
then $p^*(x)$ is the best uniform approximation polynomial of $f(x)$ on $[a, b]$, also called the minimax polynomial;
the optimal polynomial is solved with the Remez algorithm, on the basis of Chebyshev's theorem,
where $a_k$ ($k = 0, 1, \ldots, n$) are the polynomial coefficients to be solved, $\rho$ is the best approximation value, and the points $x_i$ are obtained by iterative correction.
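The best uniform (minimax) approximation described here is classically computed with the Remez exchange algorithm, which is presumably what the text intends. The sketch below is a minimal pure-Python version under stated assumptions: it solves the standard linear system $\sum_{k=0}^{n} a_k x_i^k + (-1)^i \rho = f(x_i)$, $i = 0, \ldots, n+1$, on the current reference points and then corrects one reference point per iteration. The function names, the dense search grid, and the single-point exchange rule are illustrative choices that do not come from the patent, and the patent does not say how $f$ is built from language data.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def exchange(ref, x_new, err):
    """Swap x_new into the sorted reference so the error signs keep alternating."""
    def same_sign(u, v):
        return (err(u) >= 0) == (err(v) >= 0)

    ref = ref[:]
    if x_new < ref[0]:
        ref = [x_new] + (ref[1:] if same_sign(x_new, ref[0]) else ref[:-1])
    elif x_new > ref[-1]:
        ref = (ref[:-1] if same_sign(x_new, ref[-1]) else ref[1:]) + [x_new]
    else:
        i = max(k for k in range(len(ref)) if ref[k] <= x_new)
        ref[i if same_sign(x_new, ref[i]) else i + 1] = x_new
    return ref


def remez(f, a, b, n, grid_size=2001, max_iter=50, tol=1e-10):
    """Return (coefficients a_0..a_n, |rho|) of a degree-n minimax fit on [a, b]."""
    grid = [a + (b - a) * i / (grid_size - 1) for i in range(grid_size)]
    # Initial reference: n + 2 Chebyshev extrema mapped to [a, b], ascending.
    ref = sorted((a + b) / 2 + (b - a) / 2 * math.cos(math.pi * i / (n + 1))
                 for i in range(n + 2))
    for _ in range(max_iter):
        A = [[x ** k for k in range(n + 1)] + [(-1) ** i] for i, x in enumerate(ref)]
        sol = solve_linear(A, [f(x) for x in ref])
        coeffs, rho = sol[:-1], sol[-1]

        def err(x, coeffs=coeffs):
            return f(x) - sum(c * x ** k for k, c in enumerate(coeffs))

        x_new = max(grid, key=lambda x: abs(err(x)))
        if abs(err(x_new)) - abs(rho) < tol:   # error is levelled: stop
            break
        ref = exchange(ref, x_new, err)        # iterative correction of x_i
    return coeffs, abs(rho)
```

For example, `remez(math.exp, 0.0, 1.0, 1)` fits a straight line to $e^x$ on $[0, 1]$; the search grid limits the accuracy of the located extrema, so a finer grid trades time for precision.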
In step S103, the embodiment of the invention checks the converted content with the Showy-Fresnel algorithm, which effectively removes erroneous or redundant information, improves checking efficiency, and achieves efficient verification of the converted content; the algorithm comprises the following steps:
a set of data samples $S_0 = \{x_0, x_1, \ldots, x_n\}$ is used, in which the $n$ sample data contain $m$ erroneous sample points, and $f_0(x)$ is a function reflecting the basic characteristics of the set of data samples,
where $n$ is the number of individuals in the set of data;
$D_i = |x_i - f(x_i)|$
measures the degree to which sample point $x_i$ deviates from the functional relationship; the larger $D_i$ is, the more likely the sample point is erroneous data; among the $n$ data, one $D_i$ is the maximum;
the Showy-Fresnel algorithm rejects the sample point $x_j$ with the largest $D_i$ and establishes a new sample set $S_1 = S_0 \setminus \{x_j\}$; the operation is repeated on the remaining data, and when the data meet the termination condition, the $m$ removed sample points are the erroneous data.
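The rejection loop described above (remove the sample with the largest deviation, rebuild the sample set, repeat until a termination condition holds) can be sketched as follows. Two points are assumptions made only for illustration, because the text does not reproduce them: the characteristic function is taken to be the sample mean, and the termination condition is taken to be "largest deviation no greater than `max_dev`". The name `reject_errors` is likewise hypothetical.

```python
from statistics import mean


def reject_errors(samples, fit=mean, max_dev=1.0):
    """Iteratively drop the sample that deviates most from fit(kept).

    fit and max_dev are assumed stand-ins for the characteristic function and
    the termination condition, neither of which is given in the source text.
    Returns (kept samples, removed samples).
    """
    kept = list(samples)
    removed = []
    while len(kept) > 1:
        centre = fit(kept)                          # characteristic value of S_k
        devs = [abs(x - centre) for x in kept]      # D_i for every sample
        j = max(range(len(kept)), key=devs.__getitem__)
        if devs[j] <= max_dev:                      # assumed termination test
            break
        removed.append(kept.pop(j))                 # S_{k+1} = S_k without x_j
    return kept, removed
```

Recomputing the fit after every removal matters: a single large outlier can shift the mean enough to hide a second one, which is why the samples are rejected one at a time.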
In step S104, the information is stored after PURE-LET wavelet-domain denoising, which effectively avoids interference from external factors, preserves information quality, and facilitates accurate output by the loudspeaker. The specific algorithm is as follows:
at each scale, the estimate of the wavelet coefficients is written as a linear combination of a set of $M$ basic threshold functions $\theta_m$:
$$\theta(d, s) = \sum_{m=1}^{M} a_m \, \theta_m(d, s) \qquad (2)$$
and the coefficient vector $a = [a_1, \ldots, a_M]^T$ is determined by minimizing the PURE.
Let $\theta(d, s) = \theta^j(d^j, s^j)$ be an estimate of the noiseless wavelet coefficients $\delta = \delta^j$, and let the functions $\theta^+(d, s)$ and $\theta^-(d, s)$ be defined from it,
where $(e_k)$ is the standard basis, that is, $e_k(k) = 1$ and all other elements are 0; the random variable $\mathrm{PURE}_j$ is an unbiased estimate of the MSE in subband $j$, i.e., $E\{\mathrm{PURE}_j\} = E\{\mathrm{MSE}_j\}$.
The linear-combination parameters of the wavelet estimate in formula (2) are calculated by minimizing the PURE; substituting formula (2) into formula (3) and omitting the arguments $(d, s)$ gives the expression to be minimized.
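To make the notation more concrete, the sketch below shows only the structure around the estimator, not PURE-LET itself. It assumes, as is usual in the PURE-LET literature, that $d$ and $s$ are the detail and scaling coefficients of an unnormalized Haar transform, and it builds the estimate as a weighted sum of elementary threshold functions. The weights are supplied by hand here; choosing them by minimizing the PURE is not implemented, and all function names are hypothetical.

```python
import math


def haar_analysis(x):
    """One level of the unnormalized Haar transform (even-length input assumed)."""
    pairs = list(zip(x[0::2], x[1::2]))
    s = [u + v for u, v in pairs]   # scaling coefficients (pairwise sums)
    d = [u - v for u, v in pairs]   # wavelet coefficients (pairwise differences)
    return s, d


def haar_synthesis(s, d):
    """Invert haar_analysis."""
    out = []
    for si, di in zip(s, d):
        out += [(si + di) / 2, (si - di) / 2]
    return out


def soft_threshold(t):
    """Return an elementary threshold function with threshold t."""
    return lambda v: math.copysign(max(abs(v) - t, 0.0), v)


def let_estimate(d, weights, thetas):
    """Linear expansion of thresholds: sum_m a_m * theta_m(d), per coefficient."""
    return [sum(a * theta(v) for a, theta in zip(weights, thetas)) for v in d]


def denoise(x, weights, thetas):
    """Analyse, shrink the wavelet coefficients, and reconstruct."""
    s, d = haar_analysis(x)
    return haar_synthesis(s, let_estimate(d, weights, thetas))
```

With a single identity function and weight 1 the signal is reconstructed unchanged; mixing the identity with a soft threshold shrinks the differences between neighbouring samples, which is the smoothing effect the denoising step relies on.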
As shown in fig. 2, the intelligent language cognitive information processing system based on big data provided in the embodiment of the present invention specifically includes:
The system comprises a language receiving module 1, a text input module 2, a word extraction module 3, a conversion module 4, a verification module 5, a microprocessor 6, a storage module 7, a speaker module 8, and big data 9.
The big data 9 provides knowledge support for the word extraction module 3 and the verification module 5. Input from the language receiving module 1 and the text input module 2 is extracted by the word extraction module 3 and then converted by the conversion module 4, which passes the converted content to the verification module 5.
The verification module 5 provided by the embodiment of the invention inputs the verification result to the microprocessor 6 when verification passes, and returns to the word extraction module 3 for re-extraction and reconversion when verification fails.
The microprocessor 6 provided in the embodiment of the invention stores the conversion information in the storage module 7.
The microprocessor 6 provided by the embodiment of the invention outputs information through the speaker module 8.
The working principle of the invention is as follows: language is input through the language receiving module 1 and the text input module 2; the word extraction module 3 extracts words and phrases with reference to the words, idioms, sayings, sentence patterns, and the like in the big data 9; the extracted words and phrases are converted by the conversion module 4, and the converted content is input to the verification module 5; the verification module 5 verifies it against the sentence patterns in the big data 9; when verification passes, the verification module 5 inputs the result to the microprocessor 6; when verification fails, the flow returns to the word extraction module 3 for re-extraction and reconversion; the microprocessor 6 stores the information in the storage module 7 and outputs it through the speaker module 8.
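The connections between the numbered modules can be written down as a small directed graph, which makes the feedback path from verification back to extraction explicit. The sketch below is an illustrative encoding of the connections stated in the description; the dictionary `FLOW` and the helper `reachable` are hypothetical names, not part of the patent.

```python
from collections import deque

# Directed connections between the modules of FIG. 2, as described in the text.
FLOW = {
    "language receiving module (1)": ["word extraction module (3)"],
    "text input module (2)": ["word extraction module (3)"],
    "big data (9)": ["word extraction module (3)", "verification module (5)"],
    "word extraction module (3)": ["conversion module (4)"],
    "conversion module (4)": ["verification module (5)"],
    # Pass goes to the microprocessor; failure goes back to extraction.
    "verification module (5)": ["microprocessor (6)", "word extraction module (3)"],
    "microprocessor (6)": ["storage module (7)", "speaker module (8)"],
}


def reachable(start, flow=FLOW):
    """Return every module that data entering at `start` can reach."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in flow.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

In this encoding the big data is a pure source: it feeds extraction and verification, but nothing in the described flow writes back to it.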
The foregoing describes only preferred embodiments of the invention and does not limit the invention in any form; any simple modification, equivalent change, or variation made to the above embodiments according to the technical essence of the invention falls within the scope of the technical solution of the invention.

Claims (3)

1. An intelligent language cognition information processing method based on big data, characterized by comprising the following steps:
first, inputting language as speech or text;
second, extracting words and phrases from the language by the best uniform approximation method, using words, idioms, sayings, and sentence patterns, and converting the extracted words and phrases;
third, checking the converted content against the sentence meanings already in the system by the Showy-Fresnel algorithm, thereby verifying the converted content;
fourth, inputting the verification result to the microprocessor; re-extracting and reconverting when verification fails, and inputting the qualified result to the microprocessor; and finally, storing the information after PURE-LET wavelet-domain denoising and outputting it through a loudspeaker;
in the second step, words and phrases are extracted from the language by the best uniform approximation method, using words, idioms, and sentence patterns, and the specific algorithm is as follows: let $f(x) \in C[a, b]$, and let $P_n$ be the set of all polynomials of degree not exceeding $n$; if $p^*(x) \in P_n$ satisfies
$$\max_{a \le x \le b} |f(x) - p^*(x)| = \min_{p \in P_n} \max_{a \le x \le b} |f(x) - p(x)|,$$
then $p^*(x)$ is called the best uniform approximation polynomial of $f(x)$ on $[a, b]$, also called the minimax polynomial;
the optimal polynomial is solved with the Remez algorithm, on the basis of Chebyshev's theorem,
where $a_k$ ($k = 0, 1, \ldots, n$) are the polynomial coefficients to be solved, $\rho$ is the best approximation value, and the points $x_i$ are obtained by iterative correction;
in the third step, the converted content is checked by the Showy-Fresnel algorithm, achieving efficient verification of the converted content; the algorithm comprises the following steps:
a set of data samples $S_0 = \{x_0, x_1, \ldots, x_n\}$ is used, in which the $n$ sample data contain $m$ erroneous sample points, and $f_0(x)$ is a function reflecting the basic characteristics of the set of data samples,
where $n$ is the number of individuals in the set of data;
$D_i = |x_i - f(x_i)|$
measures the degree to which sample point $x_i$ deviates from the functional relationship; the larger $D_i$ is, the more likely the sample point is erroneous data; among the $n$ data, one $D_i$ is the maximum;
the Showy-Fresnel algorithm rejects the sample point $x_j$ with the largest $D_i$ and establishes a new sample set $S_1 = S_0 \setminus \{x_j\}$; the operation is repeated on the remaining data, and when the data meet the termination condition, the $m$ removed sample points are the erroneous data.
2. An intelligent language cognition information processing system based on big data that implements the intelligent language cognition information processing method based on big data of claim 1, characterized in that the system comprises: a language receiving module, a text input module, a word extraction module, a conversion module, a verification module, a microprocessor, a storage module, a speaker module, and big data;
the big data provides knowledge support for the word extraction module and the verification module; input from the language receiving module and the text input module is extracted by the word extraction module and then converted by the conversion module, which passes the converted content to the verification module;
the verification module inputs the verification result to the microprocessor when verification passes, and returns to the word extraction module for reconversion when verification fails;
the microprocessor stores the conversion information in the storage module; the microprocessor outputs the information through the speaker module.
3. A language cognition platform applying the intelligent language cognition information processing method based on big data according to claim 1.
CN201811521939.7A 2018-12-13 2018-12-13 Intelligent language cognition information processing system and method based on big data Active CN109726392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811521939.7A CN109726392B (en) 2018-12-13 2018-12-13 Intelligent language cognition information processing system and method based on big data

Publications (2)

Publication Number Publication Date
CN109726392A (en) 2019-05-07
CN109726392B (en) 2023-10-10

Family

ID=66294925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811521939.7A Active CN109726392B (en) 2018-12-13 2018-12-13 Intelligent language cognition information processing system and method based on big data

Country Status (1)

Country Link
CN (1) CN109726392B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101221704A (en) * 2007-01-12 2008-07-16 戴献东 Electric language learning policy
CN101604204A (en) * 2009-07-09 2009-12-16 北京科技大学 Distributed cognitive technology for intelligent emotional robot
CN104778254A (en) * 2015-04-20 2015-07-15 北京蓝色光标品牌管理顾问股份有限公司 Distributing type system for non-parameter topic automatic identifying and identifying method
CN105494230A (en) * 2015-09-30 2016-04-20 常州大学怀德学院 Intelligent orientating oxygenation method and apparatus for aquatic culture
CN107123068A (en) * 2017-04-26 2017-09-01 北京航空航天大学 A kind of programming-oriented language course individualized learning effect analysis system and method
CN107273361A (en) * 2017-06-21 2017-10-20 河南工业大学 The word computational methods and its device closed based on the general type-2 fuzzy sets of broad sense
CN107741295A (en) * 2017-09-15 2018-02-27 江苏大学 A kind of MENS capacitive baroceptors test calibration device and method
CN207541938U (en) * 2017-11-08 2018-06-26 延边大学 A kind of natural language intelligent interaction machine
CN108537332A (en) * 2018-04-12 2018-09-14 合肥工业大学 A kind of Sigmoid function hardware-efficient rate implementation methods based on Remez algorithms
CN111597790A (en) * 2020-05-25 2020-08-28 郑州轻工业大学 Natural language processing system based on artificial intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9002702B2 (en) * 2012-05-03 2015-04-07 International Business Machines Corporation Confidence level assignment to information from audio transcriptions


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Austin F. Frank等.Speaking Rationally:Uniform Information Density as an Optimal Strategy for Language Production.《Proceedings of the Annual Meeting of the Cognitive Science Society》.2008,939-944. *
吴晶 et al. Cognition of autonomous foreign-language learners in a computer-assisted mode. 《现代教育技术》 (Modern Educational Technology). 2008, vol. 18, 37-41. *

Also Published As

Publication number Publication date
CN109726392A (en) 2019-05-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant