CN117273013B - Electronic data processing method for stroke records - Google Patents

Publication number
CN117273013B
Authority
CN
China
Prior art keywords
fuzzy
vocabulary
electronic
text data
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311549713.9A
Other languages
Chinese (zh)
Other versions
CN117273013A (en)
Inventor
迟慧
吴思奇
李长庭
廖曼丞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA
Original Assignee
PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PEOPLE'S PUBLIC SECURITY UNIVERSITY OF CHINA
Priority to CN202311549713.9A
Publication of CN117273013A
Application granted
Publication of CN117273013B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the field of data processing, and in particular to an electronic data processing method for stroke records, comprising: collecting electronic stroke-record text data; acquiring all fuzzy words in the electronic stroke-record text data, obtaining the semantic environment object corresponding to each fuzzy word, and dividing the fuzzy words into intervals; obtaining a first quality influence parameter and a second quality influence parameter of the fuzzy words in each fuzzy vocabulary interval according to the number of fuzzy words contained in each interval and the distance between adjacent fuzzy words, and obtaining a quality parameter of the electronic stroke-record text data in each fuzzy vocabulary interval according to the first and second quality influence parameters; and obtaining the quality parameter of the electronic stroke-record text data and completing the data abnormality detection. By processing the electronic stroke-record text data in this way, the method improves the accuracy of abnormality detection for electronic stroke-record text data.

Description

Electronic data processing method for stroke records
Technical Field
The invention relates to the technical field of data processing, and in particular to an electronic data processing method for stroke records.
Background
In the prior art, the quality of an electronic stroke record is generally evaluated by extracting specific keywords from the record with a language-model-based algorithm and then analyzing those keywords. Actual stroke-record data, however, contain many fuzzy descriptions. A keyword detection algorithm cannot detect the words associated with such descriptions, even though fuzzy descriptions seriously affect the quality of the electronic stroke record.
Disclosure of Invention
The invention provides an electronic data processing method for stroke records, which aims to solve the existing problems.
The electronic data processing method for stroke records of the invention adopts the following technical scheme:
One embodiment of the invention provides an electronic data processing method for stroke records, which comprises the following steps:
collecting electronic stroke-record text data;
acquiring all fuzzy words in the electronic stroke-record text data, obtaining the semantic environment object corresponding to each fuzzy word, and dividing the fuzzy words according to their semantic environment objects to obtain a plurality of fuzzy vocabulary intervals;
obtaining a first quality influence parameter of the fuzzy words in each fuzzy vocabulary interval according to the number of fuzzy words contained in the interval and the distance between adjacent fuzzy words, obtaining a second quality influence parameter of the fuzzy words in each fuzzy vocabulary interval according to the distance between adjacent fuzzy words in the interval, and obtaining a quality parameter of the electronic stroke-record text data in each fuzzy vocabulary interval according to the first and second quality influence parameters of that interval;
and obtaining the quality parameter of the electronic stroke-record text data according to the quality parameters of the electronic stroke-record text data in the fuzzy vocabulary intervals, and detecting and processing data abnormality in the electronic stroke-record text data according to that quality parameter.
Further, acquiring all fuzzy words in the electronic stroke-record text data and obtaining the semantic environment object corresponding to each fuzzy word comprises the following specific steps:
detecting and identifying all fuzzy words in the electronic stroke-record text data by using a named entity recognition algorithm, and arranging the fuzzy words in the order in which they appear in the text to form a fuzzy word sequence;
obtaining, according to a Word2Vec model, the semantic environment object corresponding to each fuzzy word in the fuzzy word sequence, where the semantic environment object of the i-th fuzzy word is indexed by i.
Further, dividing the fuzzy words according to their semantic environment objects to obtain a plurality of fuzzy vocabulary intervals comprises the following specific steps:
starting from the first fuzzy word, compare the semantic environment object of the first fuzzy word with that of the second fuzzy word. If they are the same, compare the semantic environment object of the second fuzzy word with that of the third fuzzy word. If the second and third differ, the first and second fuzzy words are divided into one interval; if they are the same, compare the semantic environment object of the third fuzzy word with that of the fourth fuzzy word, and so on. All fuzzy words are processed in sequence in this way, so that each fuzzy vocabulary interval is a run of consecutive fuzzy words that share the same semantic environment object.
Further, the first quality influence parameter of the fuzzy words in each fuzzy vocabulary interval is calculated by the following formula:
[Formula not reproduced in this text.]
The terms of the formula are: the number of fuzzy words contained in the m-th fuzzy vocabulary interval; the number of fuzzy words contained in the t-th fuzzy vocabulary interval; M, the number of all fuzzy vocabulary intervals; the distance between the i-th and (i+1)-th fuzzy words in the m-th fuzzy vocabulary interval; and the first quality influence parameter of the fuzzy words in the m-th fuzzy vocabulary interval. The formula uses an exponential function whose base is the natural constant.
Further, the second quality influence parameter of the fuzzy words in each fuzzy vocabulary interval is calculated by the following formula:
[Formula not reproduced in this text.]
The terms of the formula are: the number of fuzzy words contained in the m-th fuzzy vocabulary interval; the distance between the i-th and (i+1)-th fuzzy words in the m-th fuzzy vocabulary interval; and the second quality influence parameter of the fuzzy words in the m-th fuzzy vocabulary interval.
Further, the distance between adjacent fuzzy words is obtained as follows:
the distance between adjacent fuzzy words is the number of words located between the positions of the two adjacent fuzzy words in the electronic stroke-record text data.
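The distance definition above can be illustrated with a short Python sketch. It assumes the text has already been split into words (for Chinese text this would require a word segmentation step that the source does not specify) and that "between" excludes the two fuzzy words themselves. The function name `adjacent_distances` and the sample sentence are hypothetical.

```python
from typing import List


def adjacent_distances(positions: List[int]) -> List[int]:
    """Count the words strictly between each pair of adjacent fuzzy words.

    `positions` holds the word indices of the fuzzy words, in text order.
    An interval with a single fuzzy word has no adjacent pair, so the
    result is an empty list.
    """
    return [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]


# Hypothetical sample: "may" is word 1 and "perhaps" is word 5.
tokens = "he may have left and perhaps he returned".split()
fuzzy_positions = [1, 5]
print(adjacent_distances(fuzzy_positions))  # [3]: "have", "left", "and"
```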
Further, the quality parameter of the electronic stroke-record text data in each fuzzy vocabulary interval is calculated by the following formula:
[Formula not reproduced in this text.]
The terms of the formula are: the first quality influence parameter of the fuzzy words in the m-th fuzzy vocabulary interval; the second quality influence parameter of the fuzzy words in the m-th fuzzy vocabulary interval; the correction coefficient of the m-th fuzzy vocabulary interval; and the quality parameter of the electronic stroke-record text data in the m-th fuzzy vocabulary interval.
Further, the correction coefficient of a fuzzy vocabulary interval is obtained as follows:
when the first quality influence parameter of the fuzzy words in the interval is larger than the second quality influence parameter, the correction coefficient of the interval is -1; when the first quality influence parameter is smaller than or equal to the second quality influence parameter, the correction coefficient of the interval is 1.
Further, the quality parameter of the electronic stroke-record text data is calculated by the following formula:
[Formula not reproduced in this text.]
The terms of the formula are: the quality parameter of the electronic stroke-record text data in the m-th fuzzy vocabulary interval; M, the number of all fuzzy vocabulary intervals; and the quality parameter of the electronic stroke-record text data. The formula applies a linear normalization function.
Further, detecting and processing data abnormality in the electronic stroke-record text data according to the quality parameter of the electronic stroke-record text data comprises the following specific steps:
when the quality parameter of the electronic stroke-record text data is greater than or equal to a preset threshold A, the electronic stroke-record text data is judged to have no abnormality; when the quality parameter is smaller than the preset threshold A, the electronic stroke-record text data is judged to be abnormal.
The technical scheme of the invention has the following beneficial effects. The electronic stroke-record text data is divided into a plurality of fuzzy vocabulary intervals, and a first and a second quality influence parameter of the fuzzy words in each interval are obtained from the number of fuzzy words in the interval and the distance between adjacent fuzzy words. Analyzing the data through these two parameters improves the evaluation of the quality of the electronic stroke-record text data. The quality parameter of the text data in each fuzzy vocabulary interval is then obtained from the first and second quality influence parameters, the quality parameter of the electronic stroke-record text data as a whole is obtained from the per-interval quality parameters, and data abnormality detection is performed on the text data according to that overall quality parameter, which improves the accuracy of abnormality detection for electronic stroke-record text data.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart showing the steps of the electronic data processing method for stroke records according to the present invention.
Detailed Description
In order to further describe the technical means adopted by the invention to achieve its intended purpose and the effects obtained, the specific implementation, structure, characteristics, and effects of the electronic data processing method for stroke records provided by the invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
A specific scheme of the electronic data processing method for stroke records provided by the invention is described below with reference to the accompanying drawings.
Referring to FIG. 1, a flowchart of the steps of an electronic data processing method for stroke records according to an embodiment of the present invention is shown. The method includes the following steps:
Step S001: acquire the electronic data of the stroke record, and preprocess the acquired data to obtain the electronic stroke-record text data.
It should be noted that, to analyze the electronic data of a stroke record, the data must first be collected so that its clarity can be assessed. Stroke records come in many forms, such as voice records, handwritten records, and electronic records, and this variety hinders the extraction of useful information, so the records must first be converted to a uniform form. In addition, to prevent missing fields, noise, unnecessary characters, and formatting in the text data from affecting the record as a whole, the data must be preprocessed so that it is cleaner, more consistent, and easier to process. The preprocessing consists of cleaning the electronic text form of the record with a text cleaning algorithm.
Specifically, the electronic data of the stroke record is collected and converted into text data, the text data is preprocessed by using an N-gram model, and the processed text data is recorded as the electronic stroke-record text data.
Thus, the electronic stroke-record text data is obtained.
Step S002: acquire all fuzzy words in the electronic stroke-record text data, obtain the semantic environment object corresponding to each fuzzy word, and divide the fuzzy words into intervals according to their semantic environment objects.
It should be noted that electronic stroke-record text data contains a certain number of words with some degree of ambiguity, for example "may", "perhaps", and "as". Because these fuzzy words carry uncertainty and imprecision, they tend to reduce the clarity and reliability of the record and therefore its quality. A stroke record usually covers questions on several topics, so fuzzy words appear at different positions, and the later quality judgment based on fuzzy words depends on the context of each fuzzy word. Therefore, before further processing, the fuzzy words in the electronic stroke-record text data are extracted, and the text data is partitioned according to the semantic environments of the different fuzzy words, so that the fuzzy words can subsequently be analyzed in context.
Specifically, all fuzzy words in the electronic stroke-record text data are detected and identified by using a named entity recognition algorithm; named entity recognition is a well-known technique and is not described in detail here. The fuzzy words are then sorted in the order in which they appear in the text, and the sorted fuzzy words form a fuzzy word sequence, which can be expressed as:
[Formula not reproduced in this text.]
In the sequence, the n-th element represents the n-th fuzzy word, and N represents the number of fuzzy words.
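As a rough illustration of the output of this step, the sketch below builds the ordered fuzzy word sequence from tokenized text. It deliberately swaps in a plain lexicon lookup for the named entity recognition algorithm named in the source, because the source gives no details of the recognizer. The lexicon contents (beyond "may" and "perhaps", which the source lists) and the function name `extract_fuzzy_words` are assumptions.

```python
from typing import List, Tuple

# Hypothetical lexicon: "may" and "perhaps" come from the source's examples,
# the remaining entries are illustrative additions.
FUZZY_LEXICON = {"may", "perhaps", "possibly", "roughly"}


def extract_fuzzy_words(tokens: List[str]) -> List[Tuple[int, str]]:
    """Return (position, word) pairs for fuzzy words, in text order.

    This is a lexicon lookup standing in for the recognizer; it is not
    the named entity recognition algorithm the source refers to.
    """
    return [(i, tok) for i, tok in enumerate(tokens) if tok.lower() in FUZZY_LEXICON]


tokens = "He may have left and Perhaps he returned".split()
sequence = extract_fuzzy_words(tokens)
print(sequence)       # [(1, 'may'), (5, 'Perhaps')]
print(len(sequence))  # N, the number of fuzzy words
```

Keeping the word position alongside each fuzzy word makes the later distance computation between adjacent fuzzy words straightforward.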
It should be noted that, in electronic stroke-record text data, a fuzzy word generally describes or modifies a particular semantic environment object, and each semantic environment object may in turn be associated with one or more fuzzy words. Because the different fuzzy words must later be analyzed in context, the fuzzy words are partitioned according to their semantic environment objects.
Specifically, the semantic environment object corresponding to each fuzzy word in the fuzzy word sequence is obtained according to the Word2Vec model, with the semantic environment object of the i-th fuzzy word indexed by i. The Word2Vec model is a known technique and is not described in detail here.
The fuzzy words are divided into intervals according to their semantic environment objects as follows. Starting from the first fuzzy word, compare the semantic environment object of the first fuzzy word with that of the second fuzzy word. If they are the same, compare the semantic environment object of the second fuzzy word with that of the third fuzzy word. If the second and third differ, the first and second fuzzy words are divided into one interval; if they are the same, compare the semantic environment object of the third fuzzy word with that of the fourth fuzzy word, and so on. All fuzzy words are processed in sequence in this way, so that each fuzzy vocabulary interval is a run of consecutive fuzzy words that share the same semantic environment object.
Thus, a plurality of fuzzy vocabulary intervals are obtained.
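The interval division described above amounts to grouping consecutive fuzzy words whose semantic environment objects are equal. A minimal Python sketch follows, assuming each semantic environment object has already been obtained and can be compared for equality (here plain strings). The function name `split_into_intervals` and the sample labels are hypothetical.

```python
from itertools import groupby
from typing import Hashable, List, Sequence


def split_into_intervals(semantic_objects: Sequence[Hashable]) -> List[List[int]]:
    """Group fuzzy-word indices into runs with the same semantic object.

    `semantic_objects[i]` is the semantic environment object of the i-th
    fuzzy word. Only consecutive equal objects are merged, so the same
    object appearing again later starts a new interval.
    """
    intervals: List[List[int]] = []
    start = 0
    for _, run in groupby(semantic_objects):
        length = len(list(run))
        intervals.append(list(range(start, start + length)))
        start += length
    return intervals


# Hypothetical semantic environment objects for six fuzzy words.
objects = ["car", "car", "time", "car", "car", "car"]
print(split_into_intervals(objects))  # [[0, 1], [2], [3, 4, 5]]
```

Note that a fuzzy word whose object differs from both neighbours ends up in an interval of its own, which matches the single-fuzzy-word case that the description treats separately.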
Step S003: obtain a first quality influence parameter of the fuzzy words in each fuzzy vocabulary interval according to the number of fuzzy words contained in the interval and the distance between adjacent fuzzy words, obtain a second quality influence parameter of the fuzzy words in each fuzzy vocabulary interval according to the distance between adjacent fuzzy words in the interval, and obtain a quality parameter of the electronic stroke-record text data in each fuzzy vocabulary interval according to the first and second quality influence parameters of that interval.
(1) Obtain the first quality influence parameter of the fuzzy words in each fuzzy vocabulary interval according to the number of fuzzy words contained in the interval and the distance between adjacent fuzzy words.
It should be noted that the text data of the electronic stroke record has now been partitioned by semantic environment on the basis of its fuzzy words. Because the fuzzy words and the surrounding text differ from one partition to another, the quality of the text information differs between partitions. This embodiment therefore analyzes each partition using its fuzzy words and text data, and quantifies the quality parameter of the electronic stroke-record text data for each partition.
Specifically, the first quality influence parameter of the fuzzy words in each fuzzy vocabulary interval is obtained according to the number of fuzzy words contained in the interval and the distance between adjacent fuzzy words, and is expressed by the following formula:
[Formula not reproduced in this text.]
The terms of the formula are: the number of fuzzy words contained in the m-th fuzzy vocabulary interval; the number of fuzzy words contained in the t-th fuzzy vocabulary interval; M, the number of all fuzzy vocabulary intervals; the distance between the i-th and (i+1)-th fuzzy words in the m-th fuzzy vocabulary interval; and the first quality influence parameter of the fuzzy words in the m-th fuzzy vocabulary interval. The formula uses an exponential function whose base is the natural constant. The distance between adjacent fuzzy words is the number of words located between the positions of the two adjacent fuzzy words in the electronic stroke-record text data. When only one fuzzy word appears in a fuzzy vocabulary interval, the first quality influence parameter of that interval is set to a specific value [not reproduced in this text].
When the influence of the fuzzy words in the m-th fuzzy vocabulary interval on the quality of the electronic stroke-record text data is quantified, all fuzzy words in that partition describe a single semantic environment object. The formula uses the number of fuzzy words that describe the semantic environment object in the m-th interval, together with a term representing the overall occurrence density of fuzzy words in the m-th interval. Within a partition, the more fuzzy words describe the single semantic environment object and the greater their overall occurrence density, the less clear the questioned person's description of that object is, and hence the greater the influence of the fuzzy words in the partition on the quality of the record data. That is, this parameter captures a negative influence on the text of the m-th fuzzy vocabulary interval.
(2) Obtain the second quality influence parameter of the fuzzy words in each fuzzy vocabulary interval according to the distance between adjacent fuzzy words in the interval.
The second quality influence parameter of the fuzzy words in each fuzzy vocabulary interval is obtained according to the distance between adjacent fuzzy words in the interval, and is expressed by the following formula:
[Formula not reproduced in this text.]
The terms of the formula are: the number of fuzzy words contained in the m-th fuzzy vocabulary interval; the distance between the i-th and (i+1)-th fuzzy words in the m-th fuzzy vocabulary interval; and the second quality influence parameter of the fuzzy words in the m-th fuzzy vocabulary interval. When only one fuzzy word appears in a fuzzy vocabulary interval, the second quality influence parameter of that interval is 0.
Within a fuzzy vocabulary interval of the electronic stroke-record text data, the semantic environment object is a relatively important word. The more often it is returned to, the more the fuzzy words attached to it are being confirmed and the more the record of the interrogation is being supplemented. A greater density therefore indicates that the record examines the semantic environment object repeatedly and that the object is of higher importance, which improves the quality of that text partition; a lower density indicates the opposite. This parameter thus captures a positive influence on the text of the m-th fuzzy vocabulary interval.
(3) Obtain the quality parameter of the electronic stroke-record text data in each fuzzy vocabulary interval according to the first and second quality influence parameters of the fuzzy words in the interval.
The quality parameter of the electronic stroke-record text data in each fuzzy vocabulary interval is obtained according to the first and second quality influence parameters of the fuzzy words in the interval, and is expressed by the following formula:
[Formula not reproduced in this text.]
The terms of the formula are: the first quality influence parameter of the fuzzy words in the m-th fuzzy vocabulary interval; the second quality influence parameter of the fuzzy words in the m-th fuzzy vocabulary interval; the correction coefficient of the m-th fuzzy vocabulary interval; and the quality parameter of the electronic stroke-record text data in the m-th fuzzy vocabulary interval. The leading 1 in the formula means that the adjustment is made relative to a baseline of 1.
The correction coefficient of a fuzzy vocabulary interval is determined by the relative magnitude of the first and second quality influence parameters of the fuzzy words in that interval. When the first quality influence parameter is larger than the second, that is, when the negative influence of the fuzzy words outweighs the positive influence, the correction coefficient of the interval is -1. When the first quality influence parameter is smaller than or equal to the second, that is, when the negative influence of the fuzzy words does not exceed the positive influence, the correction coefficient of the interval is 1.
Thus, the quality parameter of the electronic stroke-record text data in each fuzzy vocabulary interval is obtained.
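The rule for the correction coefficient can be written as a one-line comparison. The sketch below covers only that rule; the per-interval quality formula itself is not reproduced in this text, so it is not implemented here. The function and argument names are hypothetical.

```python
def correction_coefficient(first_param: float, second_param: float) -> int:
    """Return -1 when the negative influence dominates, otherwise 1.

    `first_param` is the first (negative) quality influence parameter of
    an interval and `second_param` is the second (positive) one. A tie
    yields 1, following the "smaller than or equal to" case in the text.
    """
    return -1 if first_param > second_param else 1


print(correction_coefficient(0.8, 0.3))  # -1
print(correction_coefficient(0.3, 0.3))  # 1
```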
Step S004: obtain the quality parameter of the electronic stroke-record text data according to the quality parameters of the electronic stroke-record text data in the fuzzy vocabulary intervals, and detect and process data abnormality in the electronic stroke-record text data according to that quality parameter.
The quality parameter of the electronic stroke-record text data is obtained according to the quality parameters of the electronic stroke-record text data in each fuzzy vocabulary interval, and is expressed by the following formula:
[Formula not reproduced in this text.]
The terms of the formula are: the quality parameter of the electronic stroke-record text data in the m-th fuzzy vocabulary interval; M, the number of all fuzzy vocabulary intervals; and the quality parameter of the electronic stroke-record text data. The formula applies a linear normalization function.
A threshold A is preset. This embodiment takes A = 0.7 as an example without limiting it to that value; A may be chosen according to the specific implementation. When the quality parameter of the electronic stroke-record text data is greater than or equal to the preset threshold A, the quality of the electronic stroke-record text data is judged to be good and no abnormality exists. When the quality parameter is smaller than the preset threshold A, the quality of the electronic stroke-record text data is judged to be poor and an abnormality exists, and the electronic stroke-record text data is then revised through further interrogation.
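The threshold decision can be sketched as follows, assuming the overall quality parameter has already been computed and normalized. The default of 0.7 is the example value given in this embodiment; the function name `is_abnormal` is hypothetical.

```python
def is_abnormal(quality: float, threshold: float = 0.7) -> bool:
    """Flag a stroke record whose quality parameter falls below the threshold.

    A quality parameter equal to the threshold counts as normal, matching
    the "greater than or equal to" condition in the text.
    """
    return quality < threshold


print(is_abnormal(0.82))  # False: quality is good, no abnormality
print(is_abnormal(0.55))  # True: record should be revised by further interrogation
```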
This embodiment is completed.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (7)

1. An electronic data processing method for stroke records, the method comprising the following steps:
collecting electronic-stroke text data;
acquiring all fuzzy words in the electronic-stroke text data, obtaining the semantic environment object corresponding to each fuzzy word, and dividing the fuzzy words according to their corresponding semantic environment objects to obtain a plurality of fuzzy vocabulary intervals;
obtaining a first quality influence parameter of the fuzzy vocabulary in each fuzzy vocabulary interval according to the number of fuzzy words contained in each fuzzy vocabulary interval and the distance between adjacent fuzzy words; obtaining a second quality influence parameter of the fuzzy vocabulary in each fuzzy vocabulary interval according to the distance between adjacent fuzzy words in that interval; and obtaining the quality parameter of the electronic-stroke text data in each fuzzy vocabulary interval according to the first quality influence parameter and the second quality influence parameter of the fuzzy vocabulary in that interval;
obtaining the quality parameter of the electronic-stroke text data according to the quality parameters of the electronic-stroke text data in the fuzzy vocabulary intervals, and detecting and processing data anomalies in the electronic-stroke text data according to the quality parameter of the electronic-stroke text data;
the calculation formula of the first quality influence parameter of the fuzzy vocabulary in each fuzzy vocabulary interval is as follows:
where the terms of the formula denote, in order: the number of fuzzy words contained in the m-th fuzzy vocabulary interval; the number of fuzzy words contained in the t-th fuzzy vocabulary interval; M, the number of all fuzzy vocabulary intervals; the distance between the i-th fuzzy word and the (i+1)-th fuzzy word in the m-th fuzzy vocabulary interval; the first quality influence parameter of the fuzzy vocabulary in the m-th fuzzy vocabulary interval; and an exponential function with the natural constant as its base;
the calculation formula of the second quality influence parameter of the fuzzy vocabulary in each fuzzy vocabulary interval is as follows:
where the terms of the formula denote, in order: the number of fuzzy words contained in the m-th fuzzy vocabulary interval; the distance between the i-th fuzzy word and the (i+1)-th fuzzy word in the m-th fuzzy vocabulary interval; and the second quality influence parameter of the fuzzy vocabulary in the m-th fuzzy vocabulary interval;
the calculation formula of the quality parameter of the electronic-stroke text data in each fuzzy vocabulary interval is as follows:
where the terms of the formula denote, in order: the first quality influence parameter of the fuzzy vocabulary in the m-th fuzzy vocabulary interval; the second quality influence parameter of the fuzzy vocabulary in the m-th fuzzy vocabulary interval; the correction coefficient of the m-th fuzzy vocabulary interval; and the quality parameter of the electronic-stroke text data in the m-th fuzzy vocabulary interval.
2. The electronic data processing method for stroke records according to claim 1, wherein acquiring all fuzzy words in the electronic-stroke text data and obtaining the semantic environment object corresponding to each fuzzy word comprises the following specific steps:
detecting and identifying all fuzzy words in the electronic-stroke text data by using a named entity recognition algorithm, and collecting the fuzzy words in the order in which they appear in the electronic-stroke text data to form a fuzzy vocabulary sequence;
obtaining, according to a Word2Vec model, the semantic environment object corresponding to each fuzzy word in the fuzzy vocabulary sequence, and recording, for each i, the semantic environment object corresponding to the i-th fuzzy word.
3. The electronic data processing method for stroke records according to claim 1, wherein dividing the fuzzy words according to their corresponding semantic environment objects to obtain a plurality of fuzzy vocabulary intervals comprises the following specific steps:
starting from the first fuzzy word, comparing whether the semantic environment object corresponding to the first fuzzy word and the semantic environment object corresponding to the second fuzzy word are the same; if they are the same, comparing whether the semantic environment object corresponding to the second fuzzy word and the semantic environment object corresponding to the third fuzzy word are the same; if those two are different, dividing the first fuzzy word and the second fuzzy word into one interval; if those two are the same, continuing by comparing whether the semantic environment object corresponding to the third fuzzy word and the semantic environment object corresponding to the fourth fuzzy word are the same; and proceeding in this way to divide all fuzzy vocabulary intervals in sequence.
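The sequential comparison in claim 3 amounts to grouping consecutive fuzzy words that share the same semantic environment object. The Python sketch below illustrates that reading. The function name `split_into_intervals`, the use of plain strings for semantic environment objects, and the treatment of a word whose neighbour differs as its own interval are assumptions for the example, not details stated in the claim.

```python
from typing import Hashable, List, Sequence


def split_into_intervals(
    fuzzy_words: Sequence[str],
    semantic_objects: Sequence[Hashable],
) -> List[List[str]]:
    """Group consecutive fuzzy words with identical semantic environment objects.

    fuzzy_words[i] and semantic_objects[i] belong together; the words are
    assumed to be in the order in which they appear in the text.
    """
    if len(fuzzy_words) != len(semantic_objects):
        raise ValueError("each fuzzy word needs exactly one semantic environment object")

    intervals: List[List[str]] = []
    previous_object = None
    for word, obj in zip(fuzzy_words, semantic_objects):
        if intervals and obj == previous_object:
            # Same semantic environment object as the previous word:
            # extend the current interval.
            intervals[-1].append(word)
        else:
            # First word, or the object changed: start a new interval.
            intervals.append([word])
        previous_object = obj
    return intervals


if __name__ == "__main__":
    words = ["w1", "w2", "w3", "w4", "w5"]
    objects = ["time", "time", "place", "place", "place"]
    print(split_into_intervals(words, objects))
```

A single pass over the sequence is sufficient because only adjacent fuzzy words are ever compared.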
4. The electronic data processing method for stroke records according to claim 1, wherein the distance between adjacent fuzzy words is obtained as follows:
the distance between adjacent fuzzy words is the number of words contained between the positions of the adjacent fuzzy words in the electronic-stroke text data.
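As a small illustration of this distance definition, the Python sketch below assumes the text has already been segmented into words and that each fuzzy word is identified by its word index in that segmentation. The function name `adjacent_distances` and the index-based representation are assumptions for the example.

```python
from typing import List, Sequence


def adjacent_distances(positions: Sequence[int]) -> List[int]:
    """Count the words lying strictly between each pair of adjacent fuzzy words.

    `positions` holds the word indices of the fuzzy words in the segmented
    text, in strictly increasing order.
    """
    if any(b <= a for a, b in zip(positions, positions[1:])):
        raise ValueError("positions must be strictly increasing")
    # Words at indices a and b have (b - a - 1) words between them.
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    # Fuzzy words at word indices 2, 5, 6 and 10 of the segmented text.
    print(adjacent_distances([2, 5, 6, 10]))
```

Under this reading, two fuzzy words that are directly next to each other have a distance of 0.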
5. The electronic data processing method for stroke records according to claim 1, wherein the correction coefficient of a fuzzy vocabulary interval is obtained as follows:
when the first quality influence parameter of the fuzzy vocabulary in the fuzzy vocabulary interval is larger than the second quality influence parameter of the fuzzy vocabulary in that interval, the correction coefficient of the fuzzy vocabulary interval is -1; when the first quality influence parameter is smaller than or equal to the second quality influence parameter, the correction coefficient of the fuzzy vocabulary interval is 1.
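The rule for the correction coefficient is a simple comparison and can be sketched as follows. The function and parameter names are assumptions for the example, and the two quality influence parameters are taken as already computed inputs.

```python
def correction_coefficient(first_param: float, second_param: float) -> int:
    """Return the correction coefficient of one fuzzy vocabulary interval.

    -1 when the first quality influence parameter exceeds the second,
    otherwise 1 (including the case where the two are equal).
    """
    return -1 if first_param > second_param else 1


if __name__ == "__main__":
    print(correction_coefficient(0.8, 0.3))
    print(correction_coefficient(0.3, 0.3))
```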
6. The electronic data processing method for stroke records according to claim 1, wherein the calculation formula of the quality parameter of the electronic-stroke text data is as follows:
where the terms of the formula denote, in order: the quality parameter of the electronic-stroke text data in the m-th fuzzy vocabulary interval; M, the number of all fuzzy vocabulary intervals; the quality parameter of the electronic-stroke text data; and a linear normalization function.
7. The electronic data processing method for stroke records according to claim 1, wherein detecting data anomalies in the electronic-stroke text data according to the quality parameter of the electronic-stroke text data comprises the following specific steps:
when the quality parameter of the electronic-stroke text data is greater than or equal to a preset threshold A, judging that the electronic-stroke text data has no abnormality; when the quality parameter of the electronic-stroke text data is smaller than the preset threshold A, judging that the electronic-stroke text data is abnormal.
CN202311549713.9A 2023-11-21 2023-11-21 Electronic data processing method for stroke records Active CN117273013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311549713.9A CN117273013B (en) 2023-11-21 2023-11-21 Electronic data processing method for stroke records

Publications (2)

Publication Number Publication Date
CN117273013A CN117273013A (en) 2023-12-22
CN117273013B CN117273013B (en) 2024-01-26

Family

ID=89201252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311549713.9A Active CN117273013B (en) 2023-11-21 2023-11-21 Electronic data processing method for stroke records

Country Status (1)

Country Link
CN (1) CN117273013B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800304A (en) * 2018-12-29 2019-05-24 北京奇安信科技有限公司 Processing method, device, equipment and the medium of case notes
CN110970022A (en) * 2019-10-14 2020-04-07 珠海格力电器股份有限公司 Terminal control method, device, equipment and readable medium
CN114564950A (en) * 2022-03-02 2022-05-31 东北电力大学 Electric Chinese named entity recognition method combining word sequence
CN115617991A (en) * 2022-10-10 2023-01-17 河南科技学院 Method for evaluating viewpoint quality in online collaborative learning based on machine learning
KR20230088093A (en) * 2021-12-10 2023-06-19 중앙대학교 산학협력단 Method of supporting fake news detection decision-making through the ambiguity evaluation of articles

Also Published As

Publication number Publication date
CN117273013A (en) 2023-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant