CN116720044B

CN116720044B - Intelligent cleaning method and system for conference record data

Info

Publication number: CN116720044B
Application number: CN202311004253.1A
Authority: CN
Inventors: 孙立彬
Original assignee: Nantong Huashidai Information Technology Co ltd
Current assignee: Nantong Huashidai Information Technology Co ltd
Priority date: 2023-08-10
Filing date: 2023-08-10
Publication date: 2023-11-17
Anticipated expiration: 2043-08-10
Also published as: CN116720044A

Abstract

The invention provides an intelligent cleaning method and system for conference record data, relating to the technical field of data processing. The method comprises: acquiring M conference topics of a currently completed conference and the completed conference record; obtaining a plurality of record data and a plurality of word sets; acquiring N abnormal word sets; obtaining N conference topics and analyzing to obtain N abnormal coefficients; judging whether each of the N abnormal coefficients is larger than a preset coefficient threshold and obtaining Q supplementary analysis result sets according to the judgment results; and performing supplementary processing with the most frequent supplementary analysis result in each of the Q supplementary analysis result sets to obtain Q cleaning record data. The invention thereby addresses the technical problems in the prior art of inefficient and untimely data supplementation and heavy reliance on manual work, improves the efficiency and accuracy of supplementing missing data, and in turn improves the efficiency of cleaning meeting records.

Description

Intelligent cleaning method and system for conference record data

Technical Field

The invention relates to the technical field of data processing, in particular to an intelligent cleaning method and system for conference record data.

Background

Because a meeting typically covers a large amount of content, the recorded data is often incomplete. For example, different recorders may each record the remarks of different participants, so the final meeting record may lack part of the content. In the prior art, the missing content is mostly supplemented manually, which is time-consuming and untimely. The prior art therefore suffers from the technical problems of inefficient and untimely data supplementation and heavy reliance on manual work.

Disclosure of Invention

The invention provides an intelligent cleaning method and system for conference record data, which address the technical problems in the prior art of inefficient and untimely data supplementation and heavy reliance on manual work.

According to a first aspect of the present invention, there is provided an intelligent cleaning method for conference record data, comprising: obtaining M conference topics of a currently completed conference and the completed conference record, wherein M is an integer greater than 1; dividing the record data in the conference record to obtain a plurality of record data, and performing word segmentation on the plurality of record data to obtain a plurality of word sets; performing part-of-speech analysis and missing analysis on the plurality of word sets, and acquiring N abnormal word sets when part-of-speech missing occurs in at least one word set, wherein N is an integer greater than 1; performing conference topic analysis on the words in the N abnormal word sets to obtain N corresponding conference topics, and analyzing to obtain N abnormal coefficients of the N abnormal word sets; judging whether each of the N abnormal coefficients is larger than a preset coefficient threshold, inputting the Q abnormal word sets for which the judgment is no, together with their Q conference topics, into a missing data supplementing model to obtain Q supplementary analysis result sets, wherein the number of supplementary analysis results in each supplementary analysis result set is positively correlated with the corresponding abnormal coefficient, and recording and raising an alarm for the N-Q abnormal word sets for which the judgment is yes, wherein Q is an integer greater than or equal to 0 and less than N; and performing supplementary processing on the Q abnormal word sets with the most frequent supplementary analysis result in each of the Q supplementary analysis result sets to obtain Q cleaning record data.

According to a second aspect of the present invention, there is provided an intelligent cleaning system for conference record data, comprising: a conference record acquisition module for acquiring M conference topics of a currently completed conference and the completed conference record, wherein M is an integer greater than 1; a data division module for dividing the record data in the conference record to obtain a plurality of record data, and performing word segmentation on the plurality of record data to obtain a plurality of word sets; a word set analysis module for performing part-of-speech analysis and missing analysis on the plurality of word sets, and acquiring N abnormal word sets when part-of-speech missing occurs in at least one word set, wherein N is an integer greater than 1; a conference topic analysis module for performing conference topic analysis on the words in the N abnormal word sets to obtain N corresponding conference topics, and analyzing to obtain N abnormal coefficients of the N abnormal word sets; an abnormal coefficient judgment module for judging whether each of the N abnormal coefficients is larger than a preset coefficient threshold, inputting the Q abnormal word sets for which the judgment is no, together with their Q conference topics, into a missing data supplementing model to obtain Q supplementary analysis result sets, wherein the number of supplementary analysis results in each supplementary analysis result set is positively correlated with the corresponding abnormal coefficient, and recording and raising an alarm for the N-Q abnormal word sets for which the judgment is yes, wherein Q is an integer greater than or equal to 0 and less than or equal to N; and a supplementary processing module for performing supplementary processing on the Q abnormal word sets with the most frequent supplementary analysis result in each of the Q supplementary analysis result sets to obtain Q cleaning record data.

According to the intelligent cleaning method for conference record data provided by the invention, the record data in the conference record are divided into a plurality of record data, and word segmentation is performed on them to obtain a plurality of word sets, which improves the accuracy of the data missing analysis and of the abnormality analysis. Part-of-speech analysis and missing analysis are then performed on the plurality of word sets to obtain N abnormal word sets, so that the missing data is identified. N abnormal coefficients of the N abnormal word sets are obtained through analysis and compared with a preset coefficient threshold to determine which of the N abnormal word sets are supplemented intelligently and which are supplemented manually, which ensures the accuracy of the supplementation and avoids errors in important meeting records. For intelligent supplementation, a plurality of missing data supplementing units in a missing data supplementing model are constructed based on the BP neural network, and the abnormal word sets are analyzed by missing data supplementing units that differ in construction data and performance, which improves the output accuracy and convergence efficiency of the model. The abnormal word sets are supplemented with the most frequent supplementary analysis result in each supplementary analysis result set, which improves the efficiency and accuracy of supplementing missing data and in turn the efficiency of cleaning meeting records.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.

Drawings

In order to illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely exemplary, and a person skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a schematic flow chart of an intelligent cleaning method for conference record data according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of obtaining N abnormal coefficients in an embodiment of the present invention;

Fig. 3 is a schematic flow chart of obtaining Q supplementary analysis result sets in an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an intelligent cleaning system for conference record data according to an embodiment of the present invention.

Reference numerals illustrate: the system comprises a conference record acquisition module 11, a data division module 12, a word set analysis module 13, a conference theme analysis module 14, an anomaly coefficient judgment module 15 and a supplement processing module 16.

Detailed Description

Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

In order to solve the technical problems of insufficient data supplementing efficiency, timeliness and high dependence on manpower in the prior art, the inventor obtains the intelligent cleaning method and system for the conference recorded data through creative labor.

Example 1

Fig. 1 is a diagram of an intelligent cleaning method for meeting record data according to an embodiment of the present invention, where the method includes:

Step S100: obtaining M conference topics of a currently completed conference and the completed conference record, wherein M is an integer greater than 1;

Specifically, conference topics include project planning, attendance issues, wage adjustment and the like. One conference can cover several topics at once, so M conference topics are obtained, for example from the conference preparation data, where M is an integer greater than 1. The completed conference record is then obtained. A conference record is the written material, recorded on site by the person responsible for recording while the conference is in progress, that contains the basic circumstances of the conference together with its reports, questions, speeches, resolutions and other content. With the development of electronic office work, recorders can now take the record directly on an electronic device (such as a notebook computer), producing a conference record file. This file is uploaded and stored as the completed conference record.

Step S200: dividing the recorded data in the conference record to obtain a plurality of recorded data, and dividing words of the plurality of recorded data to obtain a plurality of word sets;

Step S200 of the embodiment of the present invention includes:

Step S210: dividing the record data in the conference record into a plurality of record data according to a preset division rule;

Step S220: performing word segmentation on the plurality of record data to obtain the plurality of word sets.

Specifically, the record data in the conference record is divided according to a preset division rule to obtain a plurality of record data. The preset division rule can be set as needed; for example, the record can be divided at periods, spaces, line feeds and the like, so that the resulting sentences serve as the plurality of record data. Word segmentation is then performed on the plurality of record data; put simply, each sentence is split into phrases to obtain a number of segmented words. Existing word segmentation software (such as jieba, THULAC and the like) can be used for this. Before segmentation, stop words (such as 'O') need to be deleted so that they do not affect segmentation accuracy. Segmenting after stop-word deletion lays the foundation for the subsequent cleaning of the conference data.
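As an illustration of steps S210 and S220, the following is a minimal sketch rather than the patented implementation. The stop-word list, the function names `split_records` and `segment`, and the sample text are assumptions made for illustration. A whitespace tokenizer stands in for a real segmenter such as jieba or THULAC, and, for simplicity, stop words are filtered at token level instead of being deleted before segmentation.

```python
import re

# Hypothetical stop-word list; the source only states that stop words are removed.
STOP_WORDS = {"um", "uh", "oh"}

def split_records(conference_record: str) -> list[str]:
    """Divide a record into sentences at periods and line feeds (one possible preset rule)."""
    parts = re.split(r"[.。\n]+", conference_record)
    return [p.strip() for p in parts if p.strip()]

def segment(record: str) -> list[str]:
    """Whitespace/word tokenizer standing in for a real segmenter (assumption)."""
    tokens = re.findall(r"\w+", record.lower())
    return [t for t in tokens if t not in STOP_WORDS]

conference_record = "Um the team approved the budget.\nOh schedule slips"
records = split_records(conference_record)      # plurality of record data
word_sets = [segment(r) for r in records]       # one word set per record
```

For Chinese text, `segment` would be replaced by a call into the chosen segmentation library; the rest of the pipeline only depends on each record yielding a list of words.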

Step S300: performing part-of-speech analysis and missing analysis on the plurality of word sets, and acquiring N abnormal word sets when part-of-speech missing occurs in at least one word set, wherein N is an integer greater than 1;

Step S300 of the embodiment of the present invention includes:

Step S310: inputting the plurality of word sets into a preset part-of-speech database to obtain a plurality of part-of-speech information sets, wherein the preset part-of-speech database comprises mapping relations between a plurality of sample words and a plurality of sample parts of speech;

Step S320: judging whether at least one of a plurality of necessary parts of speech is absent from each part-of-speech information set; if not, judging the part-of-speech information set to be normal; if so, judging that part-of-speech missing occurs; and, when part-of-speech missing occurs in at least one part-of-speech information set, taking the corresponding word sets as the N abnormal word sets.

Specifically, the plurality of word sets are input into a preset part-of-speech database to obtain a plurality of part-of-speech information sets. The preset part-of-speech database contains mapping relations between a plurality of sample words and a plurality of sample parts of speech, both of which can be collected from the internet and big data. One sample word may correspond to one or more parts of speech; for example, a word may be both a noun and a verb. The plurality of word sets are matched against the preset part-of-speech database by traversal: the sample words identical to the words in the word sets are taken as the matching result, and the sample parts of speech corresponding to the matching result are obtained from the mapping relations. This yields the plurality of part-of-speech information sets, one for each word set.

Further, it is judged whether at least one of a plurality of necessary parts of speech is absent from each part-of-speech information set. For example, a complete sentence contains at least a subject, a predicate and an object; the subject and the object are nouns and the predicate is a verb, so a part-of-speech information set should contain at least one verb and two nouns. If a part-of-speech information set contains 1 or more verbs and 2 or more nouns, it is judged to be normal; if it contains fewer than 1 verb or fewer than 2 nouns, it is judged to have part-of-speech missing. When part-of-speech missing occurs in at least one part-of-speech information set, the word sets corresponding to the part-of-speech information sets with part-of-speech missing are taken as the N abnormal word sets, where N is an integer greater than 1. The part-of-speech missing analysis provides a preliminary identification and location of missing conference content, which facilitates the subsequent alarming and supplementation of the conference record.
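The part-of-speech lookup and missing check of steps S310 and S320 can be sketched as follows. This is a minimal sketch under stated assumptions: the contents of `POS_DATABASE`, the `REQUIRED` minimum counts (one verb and two nouns, following the subject-predicate-object example) and the function names are hypothetical. A word with several possible parts of speech is counted toward each of them, which is a simplification.

```python
# Hypothetical preset part-of-speech database: sample word -> possible parts of speech.
POS_DATABASE = {
    "team": {"noun"},
    "budget": {"noun", "verb"},
    "approved": {"verb"},
    "schedule": {"noun", "verb"},
    "quickly": {"adverb"},
}

# Assumed necessary parts of speech and their minimum counts.
REQUIRED = {"verb": 1, "noun": 2}

def pos_info(word_set):
    """Traverse the database; words without a matching sample word are skipped."""
    return [POS_DATABASE[w] for w in word_set if w in POS_DATABASE]

def has_pos_missing(word_set):
    """True if any necessary part of speech occurs fewer times than required."""
    info = pos_info(word_set)
    for pos, minimum in REQUIRED.items():
        count = sum(1 for tags in info if pos in tags)
        if count < minimum:
            return True
    return False

word_sets = [["team", "approved", "budget"], ["schedule", "quickly"]]
abnormal_word_sets = [ws for ws in word_sets if has_pos_missing(ws)]
```

Here the first word set has two nouns and at least one verb and is treated as normal, while the second has only one noun and is collected as abnormal.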

Step S400: performing conference theme analysis on the words in the N abnormal word sets to obtain N corresponding conference themes, and analyzing and obtaining N abnormal coefficients of the N abnormal word sets;

As shown in Fig. 2, step S400 of the embodiment of the present invention includes:

Step S410: constructing a conference keyword database from the record data of conferences held in a historical period, wherein the conference keyword database comprises a plurality of sub-databases corresponding to a plurality of sample conference topics;

Step S420: searching by traversal for the words of the N abnormal word sets in the plurality of sub-databases to obtain the number of occurrences of each word in each sub-database, giving a plurality of occurrence-count sets;

Step S430: outputting the conference topic of the sub-database with the largest number of occurrences in each occurrence-count set to obtain the N conference topics;

Step S440: obtaining the total number of occurrences of the words of each of the N abnormal word sets in the sub-database of its conference topic, giving N total counts;

Step S450: calculating the ratio of each total count to the average of the N total counts to obtain the N abnormal coefficients.

Specifically, the words in the record data of conferences held in a historical period (such as the past month or half year) are counted to construct a conference keyword database. The conference keyword database comprises a plurality of sub-databases corresponding to a plurality of sample conference topics, and each sub-database contains the keyword combinations of multiple conferences on the same topic. The words in the N abnormal word sets are searched by traversal in the plurality of sub-databases to obtain the number of occurrences of each word in each sub-database, giving a plurality of occurrence-count sets, each of which contains the numbers of occurrences of one word in the plurality of sub-databases. The more often a word occurs in a sub-database, the more strongly it is associated with that sub-database. Accordingly, using the correspondence between the sample conference topics and the sub-databases, the conference topic of the sub-database with the largest number of occurrences in each occurrence-count set is output, and the output results are the N conference topics.

Further, put simply, each total count is the total number of times the words of one abnormal word set occur in the sub-database of its conference topic; computing this for the N abnormal word sets and their N corresponding sub-databases gives the N total counts. The average of the N total counts is calculated, and the ratio of each total count to that average is taken as the abnormal coefficient of the corresponding abnormal word set, giving the N abnormal coefficients. The larger the abnormal coefficient, the more important the abnormal word set is in the conference record, for example because it relates to an important decision made in the conference. This realizes an importance analysis of the N abnormal word sets and provides data support for the subsequent data cleaning.
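Steps S410 to S450 can be sketched as below. This is an illustrative sketch only: `KEYWORD_DB`, its counts and the function names are hypothetical, and the sketch assumes that the topic of a word set is chosen by summing the occurrences of all its words per sub-database, which is one possible reading of steps S420 and S430. The abnormal coefficient is each total count divided by the mean of the N total counts.

```python
from collections import Counter

# Hypothetical conference keyword database: topic -> keyword occurrence counts.
KEYWORD_DB = {
    "project planning": Counter({"budget": 5, "schedule": 3}),
    "attendance": Counter({"leave": 4, "late": 2}),
}

def assign_topic(word_set):
    """Return (topic, total count) for the sub-database where the words occur most."""
    totals = {topic: sum(db[w] for w in word_set) for topic, db in KEYWORD_DB.items()}
    topic = max(totals, key=totals.get)
    return topic, totals[topic]

def abnormal_coefficients(abnormal_word_sets):
    """Return the N topics and N coefficients (total count / mean total count)."""
    assigned = [assign_topic(ws) for ws in abnormal_word_sets]
    totals = [total for _, total in assigned]
    mean_total = sum(totals) / len(totals)  # assumes a non-empty input (N > 1)
    if mean_total == 0:
        coefficients = [0.0] * len(totals)  # guard: no word matched any sub-database
    else:
        coefficients = [t / mean_total for t in totals]
    return [topic for topic, _ in assigned], coefficients

topics, coefficients = abnormal_coefficients([["budget", "schedule"], ["late"]])
```

In this toy data the total counts are 8 and 2, their mean is 5, and the coefficients are therefore 8/5 and 2/5.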

Step S500: judging whether each of the N abnormal coefficients is larger than a preset coefficient threshold, inputting the Q abnormal word sets for which the judgment is no, together with their Q conference topics, into a missing data supplementing model to obtain Q supplementary analysis result sets, wherein the number of supplementary analysis results in each supplementary analysis result set is positively correlated with the corresponding abnormal coefficient, and recording and raising an alarm for the N-Q abnormal word sets for which the judgment is yes, wherein Q is an integer greater than or equal to 0 and less than N;

Specifically, the preset coefficient threshold is the basis for deciding how each abnormal word set is handled and can be set according to actual conditions. Each of the N abnormal coefficients is compared with the preset coefficient threshold. The Q abnormal coefficients that are less than or equal to the threshold are extracted, and the corresponding Q abnormal word sets and Q conference topics are obtained, where Q is an integer greater than 0 and less than or equal to N. These Q abnormal word sets and Q conference topics are considered to be of lower importance and can be supplemented intelligently by the missing data supplementing model, which is a BP neural network model for intelligently supplementing conference records. The Q abnormal word sets and the Q conference topics are input into the missing data supplementing model to obtain Q supplementary analysis result sets. The number of supplementary analysis results in each set is positively correlated with the corresponding abnormal coefficient; that is, the larger the abnormal coefficient, the more supplementary analysis results are produced. The N-Q abnormal coefficients that are larger than the preset coefficient threshold are likewise extracted, and the corresponding N-Q abnormal word sets and N-Q conference topics are obtained. These N-Q abnormal word sets are considered to be of higher importance, for example because the conference topic involves an important decision, so they are recorded and an alarm is raised to remind staff to supplement them manually. In this way the importance of each abnormal word set is judged, intelligent and manual supplementation are combined, and the accuracy of data supplementation is ensured.
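The threshold judgment can be sketched as a simple partition. The function name `route_by_threshold`, the threshold value and the sample data are assumptions for illustration; the source leaves the threshold to be set according to actual conditions.

```python
def route_by_threshold(word_sets, topics, coefficients, threshold):
    """Split abnormal word sets into intelligent supplementation and manual alarm.

    Coefficient > threshold  -> record and alarm (manual supplementation).
    Coefficient <= threshold -> missing data supplementing model.
    """
    auto, alarm = [], []
    for word_set, topic, coefficient in zip(word_sets, topics, coefficients):
        target = alarm if coefficient > threshold else auto
        target.append((word_set, topic, coefficient))
    return auto, alarm

auto, alarm = route_by_threshold(
    word_sets=[["budget", "schedule"], ["late"]],
    topics=["project planning", "attendance"],
    coefficients=[1.6, 0.4],
    threshold=1.0,  # hypothetical preset coefficient threshold
)
```

With these sample values, Q = 1 word set goes to the model and N-Q = 1 word set is flagged for manual supplementation.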

As shown in Fig. 3, step S500 of the embodiment of the present invention further includes:

Step S510: acquiring a plurality of sample abnormal word sets and a plurality of sample supplementary analysis results from the record data of conferences held in the historical period;

Step S520: randomly selecting P groups of data from the plurality of sample abnormal word sets and the plurality of sample supplementary analysis results as a first construction data set, and constructing a first missing data supplementing unit in the missing data supplementing model, wherein P is an integer greater than 1 and less than the number of sample abnormal word sets;

Step S530: randomly selecting P groups of data from the plurality of sample abnormal word sets and the plurality of sample supplementary analysis results as a second construction data set, and constructing a second missing data supplementing unit in the missing data supplementing model;

Step S540: continuing in this way until J missing data supplementing units in the missing data supplementing model have been constructed, to obtain the missing data supplementing model, wherein J is an integer greater than 2;

Step S550: calculating Q analysis times from the Q abnormal coefficients;

Step S560: inputting each of the Q abnormal word sets, together with its conference topic, into as many missing data supplementing units of the missing data supplementing model as its analysis times, to obtain the Q supplementary analysis result sets.

Step S550 of the embodiment of the present invention includes:

Step S551: taking J/2 as the preset analysis times;

Step S552: calculating, within the range of J, the Q analysis times from the Q abnormal coefficients and the preset analysis times.

Step S520 of the embodiment of the present invention includes:

Step S521: labeling and dividing the first construction data set to obtain a training set, a verification set and a test set;

Step S522: taking the abnormal word set as input data and the supplementary analysis result as output data, and constructing the first missing data supplementing unit based on a BP neural network;

Step S523: performing supervised training on the first missing data supplementing unit with the training set, and adjusting and updating the network parameters according to the error between the predicted value and the true value until the convergence condition is reached;

Step S524: verifying and testing the first missing data supplementing unit with the verification set and the test set, and obtaining the first missing data supplementing unit when the accuracy requirement is met.

The Q supplementary analysis result sets are obtained by inputting the Q abnormal word sets and the Q conference topics into the missing data supplementing model, which is constructed as follows. A plurality of sample abnormal word sets and a plurality of sample supplementary analysis results are obtained from the record data of conferences held in the historical period. P groups of data are randomly selected from them as a first construction data set, and a first missing data supplementing unit in the missing data supplementing model is constructed, where P is an integer greater than 1 and less than the number of sample abnormal word sets, preferably 2/3 of that number. P groups of data are again randomly selected as a second construction data set, and a second missing data supplementing unit is constructed. Construction continues in this way until J missing data supplementing units have been constructed, which gives the missing data supplementing model, where J is an integer greater than 2. In other words, the missing data supplementing model consists of a plurality of different missing data supplementing units whose construction data are not identical, so the units differ in performance. Performing the supplementary analysis with several such units improves its accuracy, and because each unit is built from a relatively small amount of construction data, convergence efficiency is also improved.
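The random selection of construction data sets in steps S520 to S540 can be sketched as follows. The function name, the placeholder sample pairs and the seed are hypothetical. The sketch assumes that each construction data set is drawn without replacement from the full pool of sample pairs, independently for each of the J units, with P set to about 2/3 of the pool.

```python
import random

def build_construction_sets(sample_pairs, num_units, seed=0):
    """Draw one random P-sized construction data set per unit (J sets in total)."""
    if len(sample_pairs) < 3:
        raise ValueError("need at least 3 sample pairs so that 1 < P < len(sample_pairs)")
    if num_units <= 2:
        raise ValueError("J must be greater than 2")
    rng = random.Random(seed)
    p = max(2, 2 * len(sample_pairs) // 3)  # P is preferably 2/3 of the samples
    return [rng.sample(sample_pairs, p) for _ in range(num_units)]

# Placeholder (sample abnormal word set, sample supplementary analysis result) pairs.
pairs = [(f"abnormal_word_set_{i}", f"supplement_{i}") for i in range(9)]
construction_sets = build_construction_sets(pairs, num_units=5)
```

Because every unit sees a different random subset, the resulting units differ from one another, which is what the later frequency-based selection relies on.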

Further, Q analysis times are calculated from the Q abnormal coefficients, and each of the Q abnormal word sets, together with its conference topic, is input into as many missing data supplementing units of the missing data supplementing model as its analysis times. Put simply, the missing data supplementing model contains a plurality of missing data supplementing units, and for each abnormal word set a number of units equal to its analysis times is randomly selected from them to analyze the abnormal word set and its conference topic. For example, if the analysis times for an abnormal word set and its conference topic is 10, then 10 missing data supplementing units are randomly selected, the abnormal word set and the conference topic are input into each of them, and a supplementary analysis result set containing 10 supplementary analysis results is obtained. Proceeding in the same way for all Q analysis times gives the Q supplementary analysis result sets, each of which contains a plurality of supplementary analysis results.
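The random selection of units for one abnormal word set can be sketched as below. The units here are trivial stand-in callables that return a fixed answer, not trained networks; `make_unit`, `run_selected_units` and the sample answers are hypothetical names and data used only to show the selection and collection logic.

```python
import random

def make_unit(answer):
    """Stand-in for a trained missing data supplementing unit (assumption)."""
    return lambda word_set, topic: answer

def run_selected_units(units, word_set, topic, times, rng):
    """Randomly pick `times` distinct units and collect one result from each."""
    chosen = rng.sample(units, times)
    return [unit(word_set, topic) for unit in chosen]

units = [make_unit(a) for a in ["manager", "team", "manager", "manager", "client", "team"]]
result_set = run_selected_units(
    units, ["schedule", "slips"], "project planning", times=4, rng=random.Random(0)
)
```

The returned list is one supplementary analysis result set; its length equals the analysis times computed for that word set.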

The Q analysis times are calculated from the Q abnormal coefficients as follows. J/2 is taken as the preset analysis times, where J is the total number of missing data supplementing units in the missing data supplementing model; the preset analysis times can be adjusted according to the total number of units, for example to J/3, and is not limited here. The Q analysis times are then calculated, within the range of J, from the Q abnormal coefficients and the preset analysis times; for example, each abnormal coefficient is multiplied by the preset analysis times J/2 and the product is rounded to give the analysis times. This is only an example, and the calculation can be set according to actual conditions without limitation.
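The example calculation can be written out as a short sketch. The function name is hypothetical, and two choices are assumptions not stated in the source: the result is clamped to at most J (one reading of "within the range of J"), and at least one unit is always used.

```python
def analysis_times(coefficient, num_units):
    """Analysis times = round(coefficient * J/2), clamped to the range 1..J."""
    preset = num_units / 2              # preset analysis times J/2
    times = round(coefficient * preset)
    return max(1, min(num_units, times))

# With J = 20 units, the preset analysis times is 10.
times_low = analysis_times(0.5, 20)
times_high = analysis_times(1.5, 20)
```

Note that Python's built-in `round` rounds exact halves to the nearest even integer; a different rounding rule could be substituted if required.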

Specifically, the first missing data supplementing unit in the missing data supplementing model is constructed as follows. The first construction data set contains P groups of sample abnormal word sets and sample supplementary analysis results in one-to-one correspondence. The first construction data set is labeled and divided into a training set, a verification set and a test set, for example in the ratio 6:1:1. With the abnormal word set as input data and the supplementary analysis result as output data, the first missing data supplementing unit is constructed based on the BP neural network. The unit comprises a plurality of simple units that simulate human brain neurons; during supervised training it forms network parameters, such as the weights and thresholds of the connections between the simple units, and once trained it can perform complex nonlinear logic operations on the input data to obtain the output data.
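The 6:1:1 division of a construction data set can be sketched as follows. The function name, the shuffling step and the seed are assumptions; the source only gives the ratio as an example.

```python
import random

def split_dataset(pairs, ratios=(6, 1, 1), seed=0):
    """Shuffle and divide pairs into training, verification and test sets."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder goes to the test set
    return train, val, test

# Placeholder (sample abnormal word set, sample supplementary analysis result) pairs.
pairs = [(f"word_set_{i}", f"supplement_{i}") for i in range(16)]
train_set, verification_set, test_set = split_dataset(pairs)
```

With 16 pairs this yields 12, 2 and 2 items; when the size is not divisible by 8, the leftover items fall into the test set.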

The sample abnormal word sets in the training set are input into the first missing data supplementing unit, and the unit is trained under supervision using the corresponding sample supplementary analysis results. The predicted value is the output data of the first missing data supplementing unit, the true value is the corresponding sample supplementary analysis result, and the error is the difference between them. The network parameters are adjusted and updated according to this error until the convergence condition is reached, namely that the output data is substantially consistent with the corresponding sample supplementary analysis results and the error is minimal. The trained first missing data supplementing unit is then verified and tested with the verification set and the test set. The data in the verification set is used to measure the error rate of the unit, and the network parameters are readjusted according to that error rate, so the verification set can be regarded as taking part in training. The test set is used to test the performance of the unit and obtain its output accuracy. If the output accuracy meets the accuracy requirement (for example 95%), the first missing data supplementing unit is obtained; otherwise, training, verification and testing are repeated until the test accuracy meets the requirement, which guarantees the performance of the first missing data supplementing unit.
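The supervised training loop can be illustrated with a small pure-Python back-propagation network. This is a toy sketch, not the patented model: the source does not say how word sets and supplementary analysis results are encoded numerically, so the inputs here are hypothetical two-element feature vectors with a single 0/1 target, and the class name `BPUnit`, the layer sizes, learning rate and tolerance are all assumptions. The verification and test stages are omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPUnit:
    """One-hidden-layer network trained by error back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def loss(self, data):
        """Mean squared error between predicted and true values."""
        return sum((self.forward(x)[1] - y) ** 2 for x, y in data) / len(data)

    def train(self, data, lr=0.5, max_epochs=5000, tol=0.01):
        for _ in range(max_epochs):
            for x, y in data:
                hidden, out = self.forward(x)
                delta_out = (out - y) * out * (1 - out)        # output-layer error term
                for j, h in enumerate(hidden):
                    delta_h = delta_out * self.w2[j] * h * (1 - h)  # propagated back
                    self.w2[j] -= lr * delta_out * h
                    self.b1[j] -= lr * delta_h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * delta_h * xi
                self.b2 -= lr * delta_out
            if self.loss(data) < tol:                           # convergence condition
                break
        return self.loss(data)

# Toy training set (hypothetical encoding): target is 1 if any feature is 1.
training_set = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
unit = BPUnit(n_in=2, n_hidden=3)
initial_loss = unit.loss(training_set)
final_loss = unit.train(training_set)
```

In a fuller version, the verification set would drive parameter readjustment and the test set would be used to check the accuracy requirement before the unit is accepted.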

The J missing data supplementing units in the missing data supplementing model are constructed by the same method as the first missing data supplementing unit, thereby obtaining the missing data supplementing model.

Step S600: performing supplementary processing on the Q abnormal word sets using, respectively, the supplementary analysis result with the highest occurrence frequency in each of the Q supplementary analysis result sets, to obtain Q cleaning record data.

Specifically, each supplementary analysis result set includes a plurality of supplementary analysis results, some of which may be identical. The supplementary analysis results in each supplementary analysis result set are statistically analyzed, and the result with the highest occurrence frequency is extracted from each set, giving Q supplementary analysis results. The Q abnormal word sets are then supplemented with these Q results; that is, the supplementary analysis result with the highest occurrence frequency is simply added to the corresponding abnormal word set to obtain Q cleaning record data, which are the Q pieces of conference record data whose data supplementation has been completed.
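As a non-limiting illustration, selecting the most frequent result and adding it to the word set could be sketched as follows. The function names are hypothetical, and appending the result at the end of the word set is an assumption, since the position at which the result is added is not specified above.

```python
from collections import Counter

def most_frequent_result(result_set):
    """Return the supplementary analysis result that occurs most often.
    On a tie, Counter keeps the result that was seen first."""
    return Counter(result_set).most_common(1)[0][0]

def supplement(abnormal_word_sets, result_sets):
    """Add the most frequent result of each result set to its word set."""
    cleaned = []
    for words, results in zip(abnormal_word_sets, result_sets):
        # Assumption: the result is appended at the end of the word set.
        cleaned.append(list(words) + [most_frequent_result(results)])
    return cleaned

# Hypothetical data: Q = 1 abnormal word set analysed by three units.
abnormal_word_sets = [["approved", "the", "budget"]]
result_sets = [["committee", "committee", "manager"]]
print(supplement(abnormal_word_sets, result_sets))
# [['approved', 'the', 'budget', 'committee']]
```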

Based on the above analysis, the invention provides an intelligent cleaning method for conference record data. In this embodiment, the record data in the conference record are divided to obtain a plurality of record data, and the plurality of record data are divided into words to obtain a plurality of word sets. This realizes word segmentation of the conference record and improves the accuracy of the data missing analysis and of the abnormality analysis. Part-of-speech analysis and missing analysis are then performed on the plurality of word sets to obtain N abnormal word sets, thereby identifying the missing data. N abnormal coefficients of the N abnormal word sets are obtained by analysis and compared with a preset coefficient threshold to determine which of the N abnormal word sets are supplemented intelligently and which are supplemented manually, which ensures the accuracy of missing data supplementation and avoids errors in important conference records. For intelligent supplementation, a plurality of missing data supplementing units in the missing data supplementing model are constructed based on the BP neural network, and the abnormal word sets are analyzed by missing data supplementing units that differ in construction data and performance, which improves the output accuracy and convergence efficiency of the model. The abnormal word sets are supplemented with the supplementary analysis result that has the highest occurrence frequency in the supplementary analysis result set, which improves the efficiency and accuracy of missing data supplementation and, in turn, the cleaning efficiency of conference records.

Embodiment Two

Based on the same inventive concept as the intelligent cleaning method for conference record data in the foregoing embodiment, as shown in FIG. 4, the present invention further provides an intelligent cleaning system for conference record data, the system comprising:

the conference record acquisition module 11 is configured to acquire M conference topics in a currently completed conference and the completed conference record, where M is an integer greater than 1;

the data dividing module 12 is configured to divide the record data in the conference record to obtain a plurality of record data, and perform word division on the plurality of record data to obtain a plurality of word sets;

the word set analysis module 13 is configured to perform part-of-speech analysis and missing analysis on the multiple word sets, and obtain N abnormal word sets when part-of-speech missing occurs in at least one word set, where N is an integer greater than 1;

the conference theme analysis module 14 is configured to perform conference theme analysis on the words in the N abnormal word sets, obtain corresponding N conference themes, and analyze and obtain N abnormal coefficients of the N abnormal word sets;

the abnormal coefficient judging module 15 is configured to judge whether each of the N abnormal coefficients is greater than a preset coefficient threshold, input the Q abnormal word sets whose judgment result is no, together with the corresponding Q conference topics, into a missing data supplementing model to obtain Q supplementary analysis result sets, wherein the number of supplementary analysis results in each supplementary analysis result set is positively correlated with the corresponding abnormal coefficient, and record and raise an alarm for the N-Q abnormal word sets whose judgment result is yes, where Q is an integer greater than or equal to 0 and less than or equal to N;

and the supplementary processing module 16 is configured to perform supplementary processing on the Q abnormal word sets by using the supplementary analysis results with the highest occurrence frequency in the Q supplementary analysis result sets, so as to obtain Q cleaning record data.
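As a non-limiting illustration, the judgment performed by the abnormal coefficient judging module could be sketched as a simple partition of the N abnormal word sets. The function name `route_by_threshold` and the sample values are hypothetical.

```python
def route_by_threshold(word_sets, topics, coefficients, threshold):
    """Split the N abnormal word sets into those sent to the missing data
    supplementing model (coefficient not greater than the threshold) and
    those that are recorded and alarmed (coefficient greater than it)."""
    to_model, to_alarm = [], []
    for words, topic, coef in zip(word_sets, topics, coefficients):
        if coef > threshold:
            to_alarm.append((words, topic, coef))   # judgment result: yes
        else:
            to_model.append((words, topic, coef))   # judgment result: no
    return to_model, to_alarm

# Hypothetical data with N = 3 and a preset coefficient threshold of 1.2.
word_sets = [["approved", "budget"], ["interview", "candidate"], ["next", "week"]]
topics = ["budget", "hiring", "planning"]
coefficients = [0.75, 1.25, 1.0]
to_model, to_alarm = route_by_threshold(word_sets, topics, coefficients, 1.2)
print(len(to_model), len(to_alarm))  # 2 1
```

Here Q is the length of `to_model` and N-Q is the length of `to_alarm`.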

Further, the data dividing module 12 is further configured to:

dividing a plurality of record data in the conference record according to a preset dividing rule;

and performing word segmentation processing on the plurality of record data to obtain the plurality of word sets.
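As a non-limiting illustration, the division and word segmentation could be sketched as follows. The preset dividing rule is not specified in this embodiment, so splitting on sentence-ending punctuation is an assumption, as is the regular-expression tokenizer, which suits space-delimited text only. Chinese conference records would need a dedicated word segmenter instead.

```python
import re

# Assumed dividing rule: split after sentence-ending punctuation.
DIVIDING_RULE = r"[.!?;。！？；]\s*"

def divide_record(record, rule=DIVIDING_RULE):
    """Divide a conference record into a plurality of record data."""
    return [piece.strip() for piece in re.split(rule, record) if piece.strip()]

def segment_words(piece):
    """Divide one piece of record data into a word set."""
    return re.findall(r"\w+", piece)

record = "The team approved the budget. Will review next week!"
word_sets = [segment_words(piece) for piece in divide_record(record)]
print(word_sets)
# [['The', 'team', 'approved', 'the', 'budget'], ['Will', 'review', 'next', 'week']]
```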

Further, the word set analysis module 13 is further configured to:

inputting the plurality of word sets into a preset part-of-speech database to obtain a plurality of part-of-speech information sets, wherein the preset part-of-speech database comprises mapping relations between a plurality of sample words and a plurality of sample parts of speech;

judging, for each of the plurality of part-of-speech information sets, whether at least one of a plurality of necessary parts of speech is absent; if not, judging that the part-of-speech information set is normal; if so, judging that a part of speech is missing; and, when a part of speech is missing in at least one part-of-speech information set, taking the corresponding word sets as the N abnormal word sets.
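As a non-limiting illustration, the part-of-speech analysis and missing analysis could be sketched as a dictionary lookup followed by a subset test. The contents of `POS_DATABASE` and the choice of noun and verb as the necessary parts of speech are assumptions made for this example.

```python
# Hypothetical preset part-of-speech database: sample word -> sample part of speech.
POS_DATABASE = {
    "the": "determiner", "team": "noun", "budget": "noun",
    "approved": "verb", "reviewed": "verb", "carefully": "adverb",
}
# Assumption: a complete record needs at least a noun and a verb.
NECESSARY_POS = {"noun", "verb"}

def pos_information_set(word_set, database):
    """Map each known word to its part of speech and collect the results."""
    return {database[w.lower()] for w in word_set if w.lower() in database}

def find_abnormal_word_sets(word_sets, database, necessary):
    """Return the word sets in which at least one necessary part of speech is absent."""
    return [ws for ws in word_sets
            if not necessary <= pos_information_set(ws, database)]

word_sets = [["The", "team", "approved", "the", "budget"], ["reviewed", "carefully"]]
print(find_abnormal_word_sets(word_sets, POS_DATABASE, NECESSARY_POS))
# [['reviewed', 'carefully']]
```

The second word set contains a verb and an adverb but no noun, so it is reported as abnormal.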

Further, the conference theme analysis module 14 is further configured to:

constructing a conference keyword database according to recorded data of a conference in historical time, wherein the conference keyword database comprises a plurality of sub-databases corresponding to a plurality of sample conference subjects;

traversing and searching the words in the N abnormal word sets in the plurality of sub-databases to obtain the occurrence times of each word in the plurality of sub-databases, and obtaining a plurality of occurrence times sets;

outputting conference subjects of the sub-database corresponding to the largest occurrence number in each occurrence number set to obtain N conference subjects;

acquiring the total number of occurrences of the words of each of the N abnormal word sets in the sub-database of its corresponding conference topic, to obtain N total occurrence counts;

and calculating the ratio of each total occurrence count to the average value of the N total occurrence counts to obtain the N abnormal coefficients.
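As a non-limiting illustration, the topic analysis and the abnormal coefficient calculation could be sketched as follows. Representing each sub-database as a `Counter` of keyword occurrences and summing the per-word counts for each topic are assumptions about how the conference keyword database is stored and searched.

```python
from collections import Counter

def analyse_topics(abnormal_word_sets, sub_databases):
    """Return the conference topic and abnormal coefficient of each word set."""
    topics, totals = [], []
    for words in abnormal_word_sets:
        # Occurrence count of this word set's words in every sub-database.
        counts = {topic: sum(db[w] for w in words)
                  for topic, db in sub_databases.items()}
        best = max(counts, key=counts.get)  # topic with the largest count
        topics.append(best)
        totals.append(counts[best])
    mean_total = sum(totals) / len(totals)
    if mean_total == 0:
        raise ValueError("no word of any abnormal word set occurs in the database")
    # Abnormal coefficient: total occurrence count divided by the mean of all totals.
    return topics, [total / mean_total for total in totals]

# Hypothetical sub-databases built from historical conference records.
sub_databases = {
    "budget": Counter({"budget": 5, "cost": 3, "approved": 1}),
    "hiring": Counter({"candidate": 4, "interview": 4, "approved": 2}),
}
abnormal_word_sets = [["approved", "budget"], ["interview", "candidate", "approved"]]
print(analyse_topics(abnormal_word_sets, sub_databases))
# (['budget', 'hiring'], [0.75, 1.25])
```

In this example the totals are 6 and 10, their average is 8, and the coefficients are 6/8 and 10/8.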

Further, the abnormal coefficient judging module 15 is further configured to:

acquiring a plurality of sample abnormal word sets and a plurality of sample supplementary analysis results according to recorded data of the conference in the historical time;

randomly selecting P groups of data from the plurality of sample abnormal word sets and the plurality of sample supplemental analysis results as a first construction data set, constructing a first missing data supplemental unit in the missing data supplemental model, wherein P is an integer greater than 1 and less than the number of the plurality of sample abnormal word sets;

randomly selecting P groups of data from the plurality of sample abnormal word sets and the plurality of sample supplemental analysis results as a second construction data set, and constructing a second missing data supplemental unit in the missing data supplemental model;

continuously constructing J missing data supplementing units in the missing data supplementing model to obtain the missing data supplementing model, wherein J is an integer greater than 2;

according to the Q abnormal coefficients, Q analysis times are calculated and obtained;

and respectively inputting the Q abnormal word sets and the Q conference subjects into the missing data supplementing units of the Q analysis times in the missing data supplementing model to obtain the Q supplementing analysis result sets.
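As a non-limiting illustration, drawing the J construction data sets could be sketched as follows. The function name, the fixed seed, and sampling without replacement within each construction data set are assumptions. The checks on P and J mirror the conditions stated above.

```python
import random

def draw_construction_sets(sample_pairs, p, j, seed=0):
    """Draw J construction data sets of P groups each, sampled independently
    from the historical (sample abnormal word set, sample result) pairs."""
    if not 1 < p < len(sample_pairs):
        raise ValueError("P must be greater than 1 and less than the number of samples")
    if j <= 2:
        raise ValueError("J must be greater than 2")
    rng = random.Random(seed)
    # Each unit gets its own random subset, so the units differ in construction data.
    return [rng.sample(sample_pairs, p) for _ in range(j)]

# Hypothetical historical samples: 10 pairs, P = 4, J = 3.
sample_pairs = [((f"word{i}",), f"result{i}") for i in range(10)]
construction_sets = draw_construction_sets(sample_pairs, p=4, j=3)
print(len(construction_sets), [len(s) for s in construction_sets])  # 3 [4, 4, 4]
```

Each construction data set would then be used to build one missing data supplementing unit.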

Further, the abnormal coefficient judging module 15 is further configured to:

taking J/2 as a preset number of analysis times;

and calculating in the range of J according to the Q abnormal coefficients and the preset analysis times to obtain the Q analysis times.
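As a non-limiting illustration, the number of analysis times could be computed as follows. The exact formula is not given above, so scaling the preset number J/2 by the abnormal coefficient, rounding, and limiting the result to the range 1 to J are assumptions that merely keep the number positively correlated with the coefficient and within J.

```python
def analysis_times(coefficients, j):
    """Return one analysis count per abnormal coefficient, within 1..J."""
    preset = j / 2  # preset number of analysis times
    # Assumed rule: scale the preset by the coefficient, round, clamp to 1..J.
    return [max(1, min(j, round(coef * preset))) for coef in coefficients]

# Hypothetical values: J = 10 units and Q = 2 abnormal coefficients.
print(analysis_times([0.75, 1.25], 10))  # [4, 6]
```

Each abnormal word set would then be input into that many missing data supplementing units, so a word set with a larger coefficient yields a larger supplementary analysis result set.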

Further, the abnormal coefficient judging module 15 is further configured to:

carrying out data identification and division on the first construction data set to obtain a training set, a verification set and a test set;

taking the abnormal word set as input data, taking the supplementary analysis result as output data, and constructing the first missing data supplementing unit based on a BP neural network;

the training set is adopted to conduct supervision training on the first missing data supplementing unit, and network parameters are adjusted and updated according to errors of the predicted value and the true value until convergence conditions are achieved;

and verifying and testing the first missing data supplementing unit by adopting the verification set and the test set, and obtaining the first missing data supplementing unit under the condition of meeting the accuracy requirement.

The specific examples of the intelligent cleaning method for conference record data in the first embodiment also apply to the intelligent cleaning system for conference record data in this embodiment. From the foregoing detailed description of the method, those skilled in the art can clearly understand the system of this embodiment, so it is not described in detail here for brevity.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution disclosed in the present invention can be achieved, which is not limited herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. An intelligent cleaning method for meeting record data is characterized by comprising the following steps:

obtaining M conference subjects in a currently completed conference and a completed conference record, wherein M is an integer greater than 1;

dividing the recorded data in the conference record to obtain a plurality of recorded data, and dividing words of the plurality of recorded data to obtain a plurality of word sets;

performing part-of-speech analysis and missing analysis on the plurality of word sets, and acquiring N abnormal word sets when part-of-speech missing occurs in at least one word set, wherein N is an integer greater than 1;

performing conference theme analysis on the words in the N abnormal word sets to obtain N corresponding conference themes, and analyzing and obtaining N abnormal coefficients of the N abnormal word sets;

respectively judging whether the N abnormal coefficients are larger than a preset coefficient threshold value, respectively inputting Q abnormal word sets and Q conference subjects with judging results of no into a missing data supplementing model to obtain Q supplementing analysis result sets, wherein the number of the supplementing analysis results in each supplementing analysis result set is positively correlated with the Q abnormal coefficients, recording and alarming the N-Q abnormal word sets with judging results of yes, and Q is an integer which is larger than or equal to 0 and smaller than N;

respectively adopting the supplementary analysis results with the highest occurrence frequency in the Q supplementary analysis result sets to carry out supplementary processing on the Q abnormal word sets to obtain Q cleaning record data;

performing conference topic analysis on the words in the N abnormal word sets to obtain N corresponding conference topics, and analyzing and obtaining N abnormal coefficients of the N abnormal word sets, wherein the method comprises the following steps:

calculating the ratio of each total frequency to the average value of the N total frequencies to obtain N abnormal coefficients;

respectively inputting the Q abnormal word sets and the Q conference subjects into a missing data supplementing model to obtain Q supplementing analysis result sets, wherein the Q supplementing analysis result sets comprise:

respectively inputting the Q abnormal word sets and Q conference subjects into missing data supplementing units of the Q analysis times in the missing data supplementing model to obtain Q supplementing analysis result sets;

according to the Q abnormal coefficients, calculating and obtaining Q analysis times, including:

taking J/2 as a preset analysis frequency;

calculating in the range of J according to the Q abnormal coefficients and the preset analysis times to obtain the Q analysis times;

constructing a first missing data supplementation unit in the missing data supplementation model, comprising:

2. The method of claim 1, wherein dividing the recorded data within the meeting record to obtain a plurality of recorded data and word dividing the plurality of recorded data comprises:

3. The method of claim 1, wherein performing part-of-speech analysis and missing analysis on the plurality of word sets, when at least one word set has part-of-speech missing, obtaining N abnormal word sets comprises:

4. A conference recording data intelligent cleaning system for performing the method of any one of claims 1-3, the system comprising:

the conference record acquisition module is used for acquiring M conference subjects in a currently completed conference and completed conference records, wherein M is an integer greater than 1;

the data dividing module is used for dividing the recorded data in the conference record to obtain a plurality of recorded data, and dividing words of the plurality of recorded data to obtain a plurality of word sets;

the word set analysis module is used for performing part-of-speech analysis and missing analysis on the word sets, and acquiring N abnormal word sets when part-of-speech missing occurs in at least one word set, wherein N is an integer greater than 1;

the conference theme analysis module is used for carrying out conference theme analysis on the words in the N abnormal word sets, obtaining corresponding N conference themes and analyzing and obtaining N abnormal coefficients of the N abnormal word sets;

the abnormal coefficient judging module is used for respectively judging whether the N abnormal coefficients are larger than a preset coefficient threshold value, respectively inputting Q abnormal word sets and Q conference subjects which are judged as no into the missing data supplementing model to obtain Q supplementing analysis result sets, wherein the number of the supplementing analysis results in each supplementing analysis result set is positively correlated with the Q abnormal coefficients, recording and alarming the N-Q abnormal word sets which are judged as yes, and Q is an integer which is larger than or equal to 0 and smaller than or equal to N;

and the supplementary processing module is used for carrying out supplementary processing on the Q abnormal word sets by adopting the supplementary analysis results with the highest occurrence frequency in the Q supplementary analysis result sets respectively to obtain Q cleaning record data.