CN115082947B - Paper letter quick collecting, sorting and reading system - Google Patents

Info

Publication number
CN115082947B
Authority
CN
China
Prior art keywords
letter
attention
content
semantic
appeal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210822765.8A
Other languages
Chinese (zh)
Other versions
CN115082947A (en
Inventor
李振国
金雷
刘坤
王国清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Chuhuai Software Technology Development Co ltd
Original Assignee
Jiangsu Chuhuai Software Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Chuhuai Software Technology Development Co ltd
Priority to CN202210822765.8A
Publication of CN115082947A
Application granted
Publication of CN115082947B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/42Document-oriented image-based pattern recognition based on the type of document
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/19007Matching; Proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/26Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Multimedia (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a system for rapidly collecting, sorting and reading paper letters, relating to the technical field of letter content processing. The system comprises a paper letter scanning processing module, a content processing module, an attention index calculation module, an importance index calculation module and a pushing module. The paper letter scanning processing module uses a rapid scanner to scan and recognize each letter material separately, automatically transfers the recognized letter content to a system input interface, and generates a letter list corresponding to each letter material. The content processing module performs comparison and review and extracts elements, performs semantic decomposition to obtain an appeal part and a fact description part for each letter material, and identifies the civil public opinion hotspot category to which each letter material belongs. The attention index calculation module calculates attention indexes for the letter materials; the importance index calculation module calculates importance indexes for the letter materials; the pushing module calculates the comprehensive push attention of each letter material and obtains a list of letters to be handled, which is pushed to the staff.

Description

Paper letter quick collecting, sorting and reading system
Technical Field
The invention relates to the technical field of letter content processing, in particular to a system for rapidly collecting, sorting and reading paper letters.
Background
Expressing and conveying opinions in the form of letters is necessary, and it directly reflects how much importance the submitter attaches to the matter. For the staff who handle letters, however, the letter materials submitted are often numerous and fall into many categories. Because staff time and energy are limited, and because letter handling has always been held to a high standard of care, a submitter may wait a long time for feedback after submitting letter material. The traditional way of processing letter materials requires manual information collection and manual entry; the process is complex and error-prone, the time cost is high, and the related work has to be completed manually by staff, which is time-consuming and labor-intensive.
For letter handling, when a large amount of letter material is received in the same period, quickly, efficiently and accurately identifying the appeals shared by the broad body of submitters, and forwarding them to the responsible personnel as early as possible, is the key to whether the letter handling work is efficient and of high quality.
Disclosure of Invention
The invention aims to provide a system for rapidly collecting, sorting and reading paper letters so as to solve the problems in the background technology.
In order to solve the above technical problems, the invention provides the following technical solution: the system comprises a paper letter scanning processing module, a content processing module, an attention index calculation module, an importance index calculation module and a pushing module;
the paper letter scanning processing module is used for scanning and recognizing each letter material separately with a rapid scanner, automatically transferring the letter content obtained by scanning and recognition to a system input interface, and generating a letter list corresponding to each letter material; each letter material comprises a plurality of paper letters, wherein the categories of paper letters comprise complaints, identity certificates and proxy agent certificates, and the forms of paper letters comprise handwritten and printed text; the source information of each letter material is registered; the source information and the material content obtained by scanning and recognition are summarized and displayed in the letter list; the letter list comprises a list two-dimensional code in which a list number is stored;
the content processing module is used for comparing and reviewing the letter content obtained from each scan against the corresponding original material, extracting elements, and setting the letter content that passes the comparison and review to automatically enter a system to-be-handled interface; setting a handling period, and performing semantic decomposition on each profile information part presented in the profile information element column within the handling period to obtain an appeal part and a fact description part corresponding to each letter material; and identifying the civil public opinion hotspot category to which each letter material belongs, based on the semantic text features of its appeal part and fact description part;
the attention index calculation module is used for extracting the content of each letter material presented in the to-be-handled interface within the handling period, and calculating attention indexes for each letter material based on the distribution of feature words in the appeal part of its content;
the importance index calculation module is used for extracting the content of each letter material presented in the to-be-handled interface within the handling period, and calculating an importance index for each letter material based on the distribution of feature words in the fact description part of its content;
the pushing module is used for obtaining the comprehensive push attention of each letter material according to its attention indexes, its importance index and the generation time of its list two-dimensional code, and, based on the comprehensive push attention of each letter material, sorting all letter materials in the to-be-handled interface to obtain a to-be-handled letter list that is pushed to the staff.
Further, the content processing module comprises an element extraction processing unit, a semantic decomposition unit and a hot spot identification unit;
the element extraction processing unit is used for extracting elements from the letter content automatically transferred into the system input interface and automatically filling each extracted element into its element column; the element columns comprise letter writer information, profile information, problem area and the system department to which the problem belongs; the content of each element column is compared and checked against the original one by one, and the letter content that passes the comparison and check without error is set to automatically enter the system to-be-handled interface;
the semantic decomposition unit is used for capturing, in real time from the Internet, civil news public opinion data from sources including mainstream news websites and new media websites, performing semantic decomposition on that data, and extracting the keyword sets $X_1, X_2, \ldots, X_n$ corresponding to the civil public opinion hotspots, where $X_1, X_2, \ldots, X_n$ denote the keyword sets of the 1st, 2nd, …, n-th categories of civil public opinion hotspots; performing semantic decomposition on each profile information part presented in the profile information element column within the handling period to obtain the appeal part and the fact description part corresponding to each letter material; and extracting keywords from the appeal part and the fact description part of each letter material to obtain the keyword sets $Y_1, Y_2, \ldots, Y_m$, where $Y_1, Y_2, \ldots, Y_m$ denote the keyword sets extracted for the 1st, 2nd, …, m-th letter materials;
the hotspot identification unit is used for calculating, one by one, the similarity (overlap ratio) between the keyword set of each letter material and the keyword set of each category of civil public opinion hotspot; setting a similarity threshold; selecting for each material the hotspot categories whose overlap ratio exceeds the threshold and marking the material with those categories, thereby obtaining the civil public opinion hotspot categories corresponding to each letter material; and accumulating the number of category marks for each letter material;
Identifying the civil public opinion hotspot category of each letter material lets staff quickly see which hotspots the appeals raised by submitters in a given period mainly relate to; this lays the necessary technical groundwork both for subsequently grasping the appeals shared by the broad body of submitters and for subsequently calculating the attention indexes of the material provided by each submitter.
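The hotspot identification step can be sketched as follows. The text does not say how the overlap ratio between two keyword sets is computed, so this sketch assumes the Jaccard ratio (shared keywords divided by all distinct keywords); the function names, the threshold value and the sample keyword sets are hypothetical.

```python
from typing import Dict, List, Set


def overlap_ratio(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two keyword sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mark_hotspot_categories(
    letter_keywords: Set[str],
    hotspot_keywords: Dict[str, Set[str]],
    threshold: float,
) -> List[str]:
    """Return the hotspot categories whose overlap with the letter exceeds the threshold."""
    return [
        category
        for category, keywords in hotspot_keywords.items()
        if overlap_ratio(letter_keywords, keywords) > threshold
    ]


# Hypothetical keyword sets X_1, X_2 for two hotspot categories.
hotspots = {
    "housing": {"demolition", "compensation", "relocation"},
    "education": {"school", "enrollment", "tuition"},
}
# Hypothetical keyword set Y_1 for one letter material.
letter = {"demolition", "compensation", "delay"}

marks = mark_hotspot_categories(letter, hotspots, threshold=0.3)
mark_count = len(marks)  # accumulated number of category marks for this letter
```

A letter may be marked with several categories at once, which is why the sketch returns a list rather than a single best match.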
Further, the attention index calculation module comprises a labeling area identification processing unit, a first attention index calculation unit, a second attention index calculation unit and a third attention index calculation unit;
the labeling area identification processing unit is used for labeling areas in the scanned letter content corresponding to each letter material and, based on the distribution characteristics of the labeled areas, merging them;
the first attention index calculation unit is used for receiving the data in the labeling area identification processing unit and calculating a first attention index for each letter material;
the second attention index calculation unit is used for receiving the data in the labeling area identification processing unit and calculating a second attention index for each letter material;
the third attention index calculation unit is used for receiving the data in the labeling area identification processing unit and calculating a third attention index for each letter material.
Further, the labeling area identifying and processing unit includes:
capturing in advance, from big data, all tone feature words or phrases and sensitive feature words or phrases, and collecting them into a feature word library; setting a degree grade number for each feature word or phrase in the feature word library;
acquiring the typeset text content of the appeal part obtained after each letter material is scanned; examining the content of the materials classified into each civil public opinion hotspot category; and, based on the feature word library, labeling on the typeset text of the appeal part the tone feature words or phrases and sensitive feature words or phrases that appear in it; each labeled word or phrase corresponds to one first labeling area;
capturing the interval word count C between adjacent first labeling areas and setting an interval word count threshold; if the interval word count C between two adjacent first labeling areas is smaller than the threshold, also labeling the unlabeled words between them, thereby generating a second labeling area formed by merging the two adjacent first labeling areas and the unlabeled interval between them;
The purpose of labeling regions in the appeal part of each letter material is to identify the psychological urgency of the submitter when stating the request or complaint; that is, the submitter's emotional instability is gauged by the proportion of sensitive words and tone words in the whole text, so that staff are aware of it and, when giving feedback on the material, can give it special attention or priority provided the handling of other letter materials is not affected.
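The merging of first labeling areas into second labeling areas can be sketched as below. The sketch assumes each first labeling area is a `(start, end)` character span with an exclusive end, that the interval word count C is the number of characters between two spans, and that a run of several close areas is chained into one second labeling area; none of these details is fixed by the text, and the function name is hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def merge_labeling_areas(
    first_areas: List[Span], gap_threshold: int
) -> Tuple[List[Span], List[Span]]:
    """Split first labeling areas into standalone areas and merged second areas."""
    standalone: List[Span] = []
    merged: List[Span] = []
    if not first_areas:
        return standalone, merged

    areas = sorted(first_areas)
    start, end = areas[0]
    members = 1  # number of first areas collected in the current run
    for next_start, next_end in areas[1:]:
        gap = next_start - end  # interval word count C between adjacent areas
        if gap < gap_threshold:
            # Close enough: absorb the gap and the next area into the run.
            end = max(end, next_end)
            members += 1
        else:
            (merged if members > 1 else standalone).append((start, end))
            start, end, members = next_start, next_end, 1
    (merged if members > 1 else standalone).append((start, end))
    return standalone, merged


# Two labels two characters apart are merged; the distant third one stays alone.
standalone, second = merge_labeling_areas([(0, 2), (4, 7), (30, 33)], gap_threshold=5)
```

Whether merged first areas should still be counted separately in the later indexes is not stated; this sketch counts each area once, either as standalone or as part of a second area.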
Further, the first attention index calculation unit includes:
receiving first labeling area information and second labeling area information in a labeling area identification processing unit;
calculating a first attention index $\mathrm{Attention}_1$ for each letter material:
[Formula for $\mathrm{Attention}_1$ not reproduced in this text.]
where $Ya_i$ denotes the text character length of the i-th first labeling area in the appeal part of each letter material; $Ya_j$ denotes the text character length of the j-th second labeling area in the appeal part of each letter material; and $A$ denotes the total text length of the appeal part of each letter material.
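The formula for the first attention index is not reproduced in this text. As a purely illustrative sketch, the code below assumes the index is the share of the appeal part covered by labeled areas, that is, (ΣYa_i + ΣYa_j) / A, which is consistent with the stated aim of capturing the proportion of labeled words in the whole text. This form, the function name and the sample values are assumptions, not the patent's formula.

```python
from typing import Sequence


def first_attention_index(
    first_lengths: Sequence[int],
    second_lengths: Sequence[int],
    total_length: int,
) -> float:
    """Assumed form: labeled character length divided by total appeal-part length."""
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    return (sum(first_lengths) + sum(second_lengths)) / total_length


# Hypothetical lengths: two first areas (Ya_i), one second area (Ya_j), A = 120.
attention_1 = first_attention_index([3, 2], [7], 120)
```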
Further, the second attention index calculation unit includes:
receiving first labeling area information and second labeling area information in a labeling area identification processing unit;
calculating a second attention index $\mathrm{Attention}_2$ for each letter material:
$\mathrm{Attention}_2 = \sum_i \mathrm{Dgreeea}_i + \sum_j \mathrm{avDgreeea}_j$
where $\mathrm{Dgreeea}_i$ denotes the degree grade number corresponding to the i-th first labeling area in the appeal part of each letter material, and $\mathrm{avDgreeea}_j$ denotes the average degree grade number corresponding to the j-th second labeling area in the appeal part of each letter material.
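A minimal sketch of the second attention index follows. It assumes that the average degree grade of a second labeling area is the arithmetic mean of the grades of the labeled words it contains, which the text does not spell out; the function name and the sample grades are hypothetical.

```python
from statistics import mean
from typing import Sequence


def second_attention_index(
    first_area_grades: Sequence[int],
    second_area_grades: Sequence[Sequence[int]],
) -> float:
    """Sum of first-area grades plus the mean grade of each second area."""
    return sum(first_area_grades) + sum(mean(grades) for grades in second_area_grades)


# Two first areas with grades 2 and 3; two second areas holding several graded words.
attention_2 = second_attention_index([2, 3], [[1, 3], [2, 2, 5]])
```

Each inner list must be non-empty, since a second labeling area always contains at least two labeled words.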
Further, the third attention index calculation unit includes:
acquiring the text typesetting of each letter material as it was before scanning and recognition, and capturing the feature symbol formats that appear before, inside and after each first labeling area and each second labeling area of the appeal part in that typesetting; marking and highlighting, one by one, the parts bearing feature symbol formats in the appeal part content obtained by scanning and recognition; wherein the feature symbol formats include exclamation marks, question marks, a font different from that of adjacent text, a font size different from that of adjacent text, a font color different from that of adjacent text, underlining, bold and highlighting;
calculating a third attention index $\mathrm{Attention}_3$ for each material: $\mathrm{Attention}_3 = \sum_i (R_1a_i \cdot R_2a_i) + \sum_j (R_1a_j \cdot R_2a_j)$, where $R_1a_i$ denotes the number of types of feature symbol formats that appear before, inside and after the i-th first labeling area in the appeal part of each letter material; $R_2a_i$ denotes the total number of feature symbol formats that appear before, inside and after the i-th first labeling area; $R_1a_j$ denotes the number of types of feature symbol formats that appear before, inside and after the j-th second labeling area; and $R_2a_j$ denotes the total number of feature symbol formats that appear before, inside and after the j-th second labeling area.
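The third attention index can be sketched as below, assuming each labeling area comes with a list of the feature symbol formats observed before, inside and after it (one list entry per occurrence). The format names, function names and sample data are hypothetical.

```python
from typing import Sequence


def area_score(formats: Sequence[str]) -> int:
    """R_1 * R_2 for one area: distinct format types times total occurrences."""
    return len(set(formats)) * len(formats)


def third_attention_index(
    first_area_formats: Sequence[Sequence[str]],
    second_area_formats: Sequence[Sequence[str]],
) -> int:
    """Sum of per-area scores over all first and second labeling areas."""
    return sum(area_score(f) for f in first_area_formats) + sum(
        area_score(f) for f in second_area_formats
    )


attention_3 = third_attention_index(
    # First areas: one with two exclamation marks and bold text, one with no formats.
    [["exclamation", "exclamation", "bold"], []],
    # Second areas: one with an underline and a question mark.
    [["underline", "question"]],
)
```

Multiplying the number of types by the number of occurrences means an area with varied emphasis scores higher than one that repeats a single format the same number of times.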
Further, the importance index calculation module includes:
extracting the fact description part of each letter material within the same category; identifying, decomposing and extracting the semantic elements of each fact description part, wherein the semantic elements comprise the time of the event, the persons involved, the place of the event, the main conflict of the event, the background of the event and the course of the event; and obtaining the semantic element set corresponding to each letter material;
acquiring the pairwise similarity between semantic element sets within the same category of material, setting a similarity threshold, and gathering the semantic element sets whose similarity is greater than the threshold to obtain a number of semantic element set centers, where each semantic element set center comprises a plurality of semantic element sets whose similarity is greater than the similarity threshold;
classifying each letter material within the same category of material according to the semantic element set center to which it belongs; calculating an importance index for each semantic element set center:
[Formula for $\mathrm{import}_e$ not reproduced in this text.]
where $\mathrm{import}_e$ denotes the importance index of the e-th semantic element set center; $M_e$ denotes the average similarity between the semantic element sets in the e-th semantic element set center; and $K_e$ denotes the total number of semantic element sets in the e-th semantic element set center;
attaching to each semantic element set the importance index value import of its semantic element set center;
In the above process, grouping into semantic element set centers captures how many submitters within the same category of material have written on the basis of the same appeal or the same stated facts. The larger the importance index value import attached to a semantic element set, the wider the range of submitters concerned; in other words, such materials deserve to be examined first by staff in subsequent processing, with more attention paid to the corresponding appeal content.
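The grouping into semantic element set centers can be sketched as follows. The text fixes neither the similarity measure nor the grouping procedure, so this sketch assumes similarity is the share of semantic elements with identical values and groups sets transitively with a small union-find. It stops at $M_e$ and $K_e$ because the formula combining them into the importance index is not reproduced in this text. All names and sample values are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

ElementSet = Dict[str, str]  # semantic element name -> extracted value


def similarity(a: ElementSet, b: ElementSet) -> float:
    """Share of semantic elements with identical values (an assumed measure)."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(1 for k in keys if a.get(k) == b.get(k)) / len(keys)


def group_into_centers(sets: List[ElementSet], threshold: float) -> List[List[int]]:
    """Group indices of element sets linked by similarity above the threshold."""
    parent = list(range(len(sets)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(sets)), 2):
        if similarity(sets[i], sets[j]) > threshold:
            parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(sets)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def center_statistics(sets: List[ElementSet], members: List[int]) -> Tuple[float, int]:
    """Return (M_e, K_e): mean pairwise similarity and member count of one center."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0, len(members)
    mean_similarity = sum(similarity(sets[i], sets[j]) for i, j in pairs) / len(pairs)
    return mean_similarity, len(members)


element_sets = [
    {"time": "2022-05", "place": "district A", "conflict": "unpaid wages"},
    {"time": "2022-05", "place": "district A", "conflict": "unpaid wages"},
    {"time": "2021-11", "place": "district B", "conflict": "noise"},
]
centers = group_into_centers(element_sets, threshold=0.6)
m_e, k_e = center_statistics(element_sets, centers[0])
```

Transitive grouping can place two sets in one center even when their direct similarity is below the threshold; a stricter reading would require every pair in a center to exceed it.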
Further, the pushing module includes:
acquiring the first attention index $\mathrm{Attention}_1$, the second attention index $\mathrm{Attention}_2$, the third attention index $\mathrm{Attention}_3$ and the importance index import of each letter material in the different civil public opinion hotspot categories; sorting the letter materials of each civil public opinion hotspot category by the generation time of their list two-dimensional codes to obtain the time sequence number of each material;
calculating comprehensive push attention degree for each material:
$F = \mathrm{Attention}_1 + \mathrm{Attention}_2 + \mathrm{Attention}_3 * \text{import st}$
where $F$ denotes the comprehensive push attention and st denotes the time sequence number corresponding to each material;
within each civil public opinion hotspot category, sorting the letter materials by comprehensive push attention from largest to smallest to obtain the list number sequence set of that category; and pushing the materials to be handled to the staff in the order of the list numbers in the list number sequence set.
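The ordering performed by the pushing module can be sketched as below. The sketch takes the comprehensive push attention F as an already computed value and only illustrates the two sorts: by generation time to assign the time sequence number st, and by F in descending order to build the pushed list. It assumes both sorts are done within each hotspot category; the `Letter` record and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Letter:
    list_number: str       # number stored in the list two-dimensional code
    category: str          # civil public opinion hotspot category
    generated_at: float    # generation time of the code, e.g. a Unix timestamp
    push_attention: float  # comprehensive push attention F, computed elsewhere


def group_by_category(letters: List[Letter]) -> Dict[str, List[Letter]]:
    groups: Dict[str, List[Letter]] = defaultdict(list)
    for letter in letters:
        groups[letter.category].append(letter)
    return groups


def time_sequence_numbers(letters: List[Letter]) -> Dict[str, int]:
    """Assign st = 1, 2, ... within each category by code generation time."""
    numbers: Dict[str, int] = {}
    for group in group_by_category(letters).values():
        ordered = sorted(group, key=lambda item: item.generated_at)
        for rank, letter in enumerate(ordered, start=1):
            numbers[letter.list_number] = rank
    return numbers


def push_order(letters: List[Letter]) -> Dict[str, List[str]]:
    """Per category, list numbers sorted by F from largest to smallest."""
    return {
        category: [
            item.list_number
            for item in sorted(group, key=lambda item: item.push_attention, reverse=True)
        ]
        for category, group in group_by_category(letters).items()
    }


letters = [
    Letter("L001", "housing", generated_at=100.0, push_attention=4.2),
    Letter("L002", "housing", generated_at=90.0, push_attention=7.5),
    Letter("L003", "education", generated_at=95.0, push_attention=1.0),
]
st = time_sequence_numbers(letters)
order = push_order(letters)
```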
Compared with the prior art, the invention has the following beneficial effects: by means of artificial intelligence, the invention makes full use of the precision and predictability of modern technology and further improves working quality; modern technology is applied systematically to replace manual work, further improving working efficiency; based on technologies such as image recognition and natural language processing, core artificial intelligence capabilities and model algorithms such as speech recognition, OCR recognition, key element extraction and automatic generation of item profiles are developed in a customized manner, deepening the application of artificial intelligence at the two levels of assisting item handling and serving letter submitters, further shortening the handling period and improving handling precision and standardization; in processing letter materials, the invention computes the relevant indexes for the letter material provided by each submitter, thereby assisting staff in handling business and comprehensively improving the quality of letter handling work.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of a system for rapid collection, sorting and reading of paper letters according to the present invention;
FIG. 2 is a schematic flow chart of a method in the system for rapid collection, sorting and reading of paper letters according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1 and FIG. 2, the invention provides the following technical solution: the system comprises a paper letter scanning processing module, a content processing module, an attention index calculation module, an importance index calculation module and a pushing module;
the paper letter scanning processing module is used for scanning and recognizing each letter material separately with a rapid scanner, automatically transferring the letter content obtained by scanning and recognition to a system input interface, and generating a letter list corresponding to each letter material; each letter material comprises a plurality of paper letters, wherein the categories of paper letters comprise complaints, identity certificates and proxy agent certificates, and the forms of paper letters comprise handwritten and printed text; the source information of each letter material is registered; the source information and the material content obtained by scanning and recognition are summarized and displayed in the letter list; the letter list comprises a list two-dimensional code in which a list number is stored;
the content processing module is used for comparing and reviewing the letter content obtained from each scan against the corresponding original material, extracting elements, and setting the letter content that passes the comparison and review to automatically enter a system to-be-handled interface; setting a handling period, and performing semantic decomposition on each profile information part presented in the profile information element column within the handling period to obtain an appeal part and a fact description part corresponding to each letter material; and identifying the civil public opinion hotspot category to which each letter material belongs, based on the semantic text features of its appeal part and fact description part;
the content processing module comprises an element extraction processing unit, a semantic decomposition unit and a hot spot identification unit;
the element extraction processing unit is used for extracting elements from the letter content automatically transferred into the system input interface and automatically filling each extracted element into its element column; the element columns comprise letter writer information, profile information, problem area and the system department to which the problem belongs; the content of each element column is compared and checked against the original one by one, and the letter content that passes the comparison and check without error is set to automatically enter the system to-be-handled interface;
the semantic decomposition unit is used for capturing, in real time from the Internet, civil news public opinion data from sources including mainstream news websites and new media websites, performing semantic decomposition on that data, and extracting the keyword sets $X_1, X_2, \ldots, X_n$ corresponding to the civil public opinion hotspots, where $X_1, X_2, \ldots, X_n$ denote the keyword sets of the 1st, 2nd, …, n-th categories of civil public opinion hotspots; performing semantic decomposition on each profile information part presented in the profile information element column within the handling period to obtain the appeal part and the fact description part corresponding to each letter material; and extracting keywords from the appeal part and the fact description part of each letter material to obtain the keyword sets $Y_1, Y_2, \ldots, Y_m$, where $Y_1, Y_2, \ldots, Y_m$ denote the keyword sets extracted for the 1st, 2nd, …, m-th letter materials;
the hotspot identification unit is used for calculating, one by one, the similarity (overlap ratio) between the keyword set of each letter material and the keyword set of each category of civil public opinion hotspot; setting a similarity threshold; selecting for each material the hotspot categories whose overlap ratio exceeds the threshold and marking the material with those categories, thereby obtaining the civil public opinion hotspot categories corresponding to each letter material; and accumulating the number of category marks for each letter material;
the attention index calculation module is used for extracting the content of each letter material presented in the to-be-handled interface within the handling period, and calculating attention indexes for each letter material based on the distribution of feature words in the appeal part of its content;
the attention index calculating module comprises a labeling area identification processing unit, a first attention index calculating unit, a second attention index calculating unit and a third attention index calculating unit;
the labeling area identification processing unit is used for labeling areas in the scanned letter content corresponding to each letter material and, based on the distribution characteristics of the labeled areas, merging them;
the labeling area identification processing unit comprises:
capturing in advance all appeal-type feature words or phrases and sensitive feature words or phrases in big data, and collecting all of them into a feature word library; setting a degree grade number for each feature word or phrase in the feature word library;
respectively acquiring the typeset text content of the appeal part obtained after each letter material is scanned, examining the content of the materials classified into the various civil public opinion hotspot categories, and, based on the feature word library, labeling on the typeset text of the appeal part the appeal-type feature words or phrases and sensitive feature words or phrases that appear in the appeal part of each material; one labeled word or phrase corresponds to one first labeling area;
capturing the number of words C separating adjacent first labeling areas on a line, setting an interval word number threshold, and, if the interval word number C between two adjacent first labeling areas is smaller than the interval word number threshold, also labeling the unlabeled words between them, so as to generate a second labeling area formed by merging the two adjacent first labeling areas and the unlabeled interval between them;
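A minimal sketch of the merging of first labeling areas into second labeling areas, under stated assumptions: each first labeling area is given as a pair of character offsets within one line, the interval C is measured in characters, and a chain of more than two close neighbours is merged into a single second labeling area. The names `Span`, `_flush` and `merge_labeled_areas` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) offsets within one line, end exclusive


def _flush(group: List[Span], standalone: List[Span], merged: List[Span]) -> None:
    """Close a run of neighbouring spans: one span stays a first area, several merge."""
    if len(group) == 1:
        standalone.append(group[0])
    elif len(group) > 1:
        merged.append((group[0][0], group[-1][1]))


def merge_labeled_areas(
    first_areas: List[Span], gap_threshold: int
) -> Tuple[List[Span], List[Span]]:
    """Return (standalone first areas, merged second areas)."""
    standalone: List[Span] = []
    merged: List[Span] = []
    group: List[Span] = []
    for span in sorted(first_areas):
        # Interval C between this area and the previous one in the current run.
        if group and span[0] - group[-1][1] < gap_threshold:
            group.append(span)
        else:
            _flush(group, standalone, merged)
            group = [span]
    _flush(group, standalone, merged)
    return standalone, merged


standalone, merged = merge_labeled_areas([(0, 4), (6, 10), (30, 35)], gap_threshold=5)
```

Whether a first labeling area absorbed into a second labeling area is still counted on its own in the later indexes is not settled by this sketch; it simply reports the two groups separately.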
the first attention index calculation unit is used for receiving the data in the labeling area identification processing unit and calculating a first attention index for each letter material;
wherein the first attention index calculation unit includes:
receiving first labeling area information and second labeling area information in a labeling area identification processing unit;
calculating a first attention index $\mathrm{Attention}_1$ for each letter material;
wherein $Ya_i$ represents the text character length of the i-th first labeling area in the appeal part of each letter material; $Ya_j$ represents the text character length of the j-th second labeling area in the appeal part of each letter material; $A$ represents the total text length of the appeal part of each letter material;
the second attention index calculation unit is used for receiving the data in the labeling area identification processing unit and calculating a second attention index for each letter material;
wherein the second attention index calculation unit includes:
receiving first labeling area information and second labeling area information in a labeling area identification processing unit;
calculating a second attention index $\mathrm{Attention}_2$ for each letter material:
$\mathrm{Attention}_2 = \sum_i \mathrm{Dgreee}(a_i) + \sum_j \mathrm{avDgreee}(a_j)$
wherein $\mathrm{Dgreee}(a_i)$ represents the degree grade number corresponding to the i-th first labeling area in the appeal part of each letter material; $\mathrm{avDgreee}(a_j)$ represents the average degree grade number corresponding to the j-th second labeling area in the appeal part of each letter material;
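The second attention index can be illustrated with a short Python sketch. It assumes that the average degree grade number of a second labeling area is the arithmetic mean of the grades of the feature words it contains, and that every second labeling area contains at least one graded word; the function name and the sample grades are hypothetical.

```python
from statistics import mean
from typing import List


def attention_2(first_area_grades: List[int], second_area_grades: List[List[int]]) -> float:
    """Sum of first-area grades plus the mean grade of each second area."""
    return sum(first_area_grades) + sum(mean(grades) for grades in second_area_grades)


# Two first labeling areas with grades 3 and 1; two second labeling areas whose
# member words carry grades (2, 4) and (5, 5, 2).
score = attention_2([3, 1], [[2, 4], [5, 5, 2]])
```

With these sample grades the first areas contribute 3 + 1 and the second areas contribute the means 3 and 4.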
a third attention index calculation unit for receiving the data in the labeling area identification processing unit and calculating a third attention index for each letter material;
wherein the third attention index calculation unit includes:
acquiring text typesetting of each letter material before scanning and recognition, and capturing the front, inner and rear characteristic symbol formats of each first labeling area and each second labeling area corresponding to the appeal part in the text typesetting; marking and highlighting the parts with the characteristic symbol formats one by one in the content of the appeal part obtained after scanning and identifying; wherein the feature symbol format includes exclamation marks, question marks, fonts different from adjacent text words, font sizes different from adjacent text words, font colors different from adjacent text words, underlining, bold, highlighting;
calculating a third attention index $\mathrm{Attention}_3$ for each material: $\mathrm{Attention}_3 = \sum_i (R_1 a_i \cdot R_2 a_i) + \sum_j (R_1 a_j \cdot R_2 a_j)$; wherein $R_1 a_i$ represents the number of types of feature symbol formats that appear before, within, and after the i-th first labeling area in the appeal part of each letter material; $R_2 a_i$ represents the total number of feature symbol formats that appear before, within, and after the i-th first labeling area in the appeal part of each letter material; $R_1 a_j$ represents the number of types of feature symbol formats that appear before, within, and after the j-th second labeling area in the appeal part of each letter material; $R_2 a_j$ represents the total number of feature symbol formats that appear before, within, and after the j-th second labeling area in the appeal part of each letter material;
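The third attention index can be sketched as follows, assuming each labeling area has already been reduced to a list of the feature symbol formats detected before, within, and after it. The function name and the format labels are hypothetical.

```python
from typing import List


def attention_3(areas: List[List[str]]) -> int:
    """Sum over areas of (number of distinct format types) * (total format occurrences)."""
    total = 0
    for formats in areas:
        r1 = len(set(formats))  # R_1: number of distinct format types
        r2 = len(formats)       # R_2: total number of format occurrences
        total += r1 * r2
    return total


# One first labeling area with two exclamation marks and bold text, one first
# labeling area with no feature formats, and one underlined second labeling area.
first_areas = [["exclamation", "exclamation", "bold"], []]
second_areas = [["underline"]]

score = attention_3(first_areas) + attention_3(second_areas)
```

An area with no feature symbol formats contributes zero, so only visually emphasized areas raise the index.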
the important index calculation module is used for extracting the contents of each letter material presented in the to-be-handled interface in the handling period and calculating an important index for each letter material based on the characteristic word distribution situation in the fact description part in each letter material content;
wherein, the importance index calculation module includes:
extracting the fact description parts of the letter materials in the same category respectively; identifying, decomposing and extracting the semantic elements of each fact description part; the semantic elements include the time of the event, the persons involved in the event, the place of the event, the main conflict of the event, the background of the event and the course of the event; obtaining a semantic element set corresponding to each letter material;
obtaining the similarity between every two semantic element sets in the same category of materials, setting a similarity threshold, and gathering the semantic element sets whose mutual similarity is greater than the similarity threshold to obtain a plurality of semantic element set centers, wherein one semantic element set center contains a plurality of semantic element sets whose similarity is greater than the similarity threshold;
classifying each letter material in the same category of materials according to the semantic element set center to which it belongs; calculating an importance index for the semantic element set center to which each material belongs:
wherein $\mathrm{import}_e$ represents the importance index of the e-th semantic element set center; $M_e$ represents the average similarity value between the semantic element sets in the e-th semantic element set center; $K_e$ represents the total number of semantic element sets in the e-th semantic element set center;
attaching each semantic element set with an important index value import of a corresponding semantic element set center;
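The grouping of semantic element sets into centers can be sketched as below. Several choices are assumptions rather than part of the text: similarity is taken as the share of element fields with identical values, grouping is greedy (a set joins the first center all of whose members it resembles), and a center with a single member is given an average similarity of 0.0. The sketch stops at the two quantities $M_e$ and $K_e$ and does not compute the importance index itself.

```python
from itertools import combinations
from typing import Dict, List, Tuple

ElementSet = Dict[str, str]  # e.g. {"time": ..., "place": ..., "conflict": ...}


def similarity(a: ElementSet, b: ElementSet) -> float:
    """Share of element fields on which both sets agree (assumed measure)."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    same = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return same / len(keys)


def group_into_centers(sets: List[ElementSet], threshold: float) -> List[List[int]]:
    """Greedy grouping: each center is a list of indexes into `sets`."""
    centers: List[List[int]] = []
    for idx, item in enumerate(sets):
        for center in centers:
            if all(similarity(item, sets[m]) > threshold for m in center):
                center.append(idx)
                break
        else:
            centers.append([idx])
    return centers


def center_stats(sets: List[ElementSet], center: List[int]) -> Tuple[float, int]:
    """Return (M_e, K_e): average pairwise similarity and member count."""
    pairs = list(combinations(center, 2))
    if not pairs:
        return 0.0, len(center)  # single-member center: 0.0 by assumption
    m_e = sum(similarity(sets[a], sets[b]) for a, b in pairs) / len(pairs)
    return m_e, len(center)


sets = [
    {"time": "2022-03", "place": "Town A", "conflict": "land compensation"},
    {"time": "2022-03", "place": "Town A", "conflict": "land compensation"},
    {"time": "2021-11", "place": "Town B", "conflict": "wage arrears"},
]
centers = group_into_centers(sets, threshold=0.6)
stats = [center_stats(sets, c) for c in centers]
```

Letters describing the same incident end up in one center, so a large $K_e$ signals that many writers report the same event.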
the pushing module is used for obtaining comprehensive pushing attention of each letter material according to the attention index and the importance index corresponding to each letter material and the generation time of the list two-dimensional code corresponding to each letter material; based on the comprehensive pushing attention degree of each letter material, all the letter materials in the to-be-handled interface are arranged to obtain a to-be-handled letter list pushed to a worker;
wherein, the push module includes:
acquiring the first attention index $\mathrm{Attention}_1$, the second attention index $\mathrm{Attention}_2$, the third attention index $\mathrm{Attention}_3$ and the importance index $\mathrm{import}$ corresponding to each letter material in the different civil public opinion hotspot categories; ordering the letter materials belonging to each civil public opinion hotspot category according to the generation time of the corresponding list two-dimensional codes to obtain the time-ordering serial number corresponding to each material;
calculating comprehensive push attention degree for each material:
$F = [(\mathrm{Attention}_1 + \mathrm{Attention}_2 + \mathrm{Attention}_3) \ast \mathrm{import}]\,st$
wherein F represents comprehensive push attention, st represents time sequence numbers corresponding to the materials;
sorting the letter materials belonging to each civil public opinion hotspot category by comprehensive push attention degree from largest to smallest to obtain a list number sequence set for each civil public opinion hotspot category; and pushing the materials to be handled to the staff in the order of the list numbers in the list number sequence set.
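The ordering performed by the pushing module can be sketched as follows. The `Letter` record, its `base_score` field (the indexes already combined, before the serial number is applied) and the `combine` callback are hypothetical; how the time-ordering serial number st enters F is deliberately passed in as a function, and the multiplication used in the example is a placeholder only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Letter:
    list_number: str     # list number stored in the list two-dimensional code
    category: str        # civil public opinion hotspot category
    generated_at: float  # generation time of the list two-dimensional code
    base_score: float    # hypothetical: combined indexes before st is applied


def push_order(
    letters: List[Letter], combine: Callable[[float, int], float]
) -> Dict[str, List[str]]:
    """Per category: assign serial numbers by time, score, sort by F descending."""
    by_category: Dict[str, List[Letter]] = defaultdict(list)
    for letter in letters:
        by_category[letter.category].append(letter)

    result: Dict[str, List[str]] = {}
    for category, group in by_category.items():
        group.sort(key=lambda item: item.generated_at)  # earliest gets st = 1
        scored = [
            (combine(item.base_score, st), item.list_number)
            for st, item in enumerate(group, start=1)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # largest F first
        result[category] = [number for _, number in scored]
    return result


letters = [
    Letter("L-001", "housing", 100.0, 2.0),
    Letter("L-002", "housing", 50.0, 5.0),
    Letter("L-003", "education", 70.0, 1.0),
]
# Placeholder rule for illustration only: F = base_score * st.
order = push_order(letters, combine=lambda base, st: base * st)
```

Each category keeps its own list number sequence, matching the per-category sequence sets described above.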
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and does not limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (3)

1. A rapid paper letter collecting, sorting and reading system, characterized in that the system comprises: a paper letter scanning processing module, a content processing module, an attention index calculation module, an importance index calculation module and a pushing module;
the paper letter scanning processing module is used for scanning and recognizing each letter material separately with a rapid scanner, automatically transferring the letter content obtained after scanning and recognition to a system input interface, and generating a letter list corresponding to each letter material; each letter material comprises a plurality of paper letters; the categories of the paper letters include complaints, identity certificates and proxy certificates; the forms of the paper letters include handwritten and printed; registering the source information of each letter material; summarizing and displaying the source information and the scanned and recognized material content in the letter list; the letter list includes a list two-dimensional code, and a list number is stored in the list two-dimensional code;
the content processing module is used for performing element extraction on the letter content obtained after each scan, comparing and reviewing it against the corresponding original material, and setting the letter content to enter the system to-be-handled interface automatically once the comparison and review find no errors; setting a handling period, and respectively carrying out semantic decomposition on each profile information part presented in the profile information element column within the handling period to obtain an appeal part and a fact description part corresponding to each letter material; identifying the civil public opinion hotspot category to which each letter material belongs based on the semantic features of the appeal part and the fact description part corresponding to each letter material;
the attention index calculation module is used for extracting the content of each letter material presented in the to-be-handled interface in the handling period, and calculating the attention index for each letter material based on the characteristic word distribution condition in the appeal part in each letter material content;
the attention index calculation module comprises a labeling area identification processing unit, a first attention index calculation unit, a second attention index calculation unit and a third attention index calculation unit;
the marking area identification processing unit is used for marking areas of scanned letter contents corresponding to various letter materials and completing the integration processing of the marking areas based on the distribution characteristics of the marking areas;
the labeling area identification processing unit comprises:
capturing in advance all appeal-type feature words or phrases and sensitive feature words or phrases in big data, and collecting all of them into a feature word library; setting a degree grade number for each feature word or phrase in the feature word library;
respectively acquiring the typeset text content of the appeal part obtained after each letter material is scanned, examining the content of the materials classified into the various civil public opinion hotspot categories, and, based on the feature word library, labeling on the typeset text of the appeal part the appeal-type feature words or phrases and sensitive feature words or phrases that appear in the appeal part of each material; one labeled word or phrase corresponds to one first labeling area;
capturing the number of words C separating adjacent first labeling areas on a line, setting an interval word number threshold, and, if the interval word number C between two adjacent first labeling areas is smaller than the interval word number threshold, also labeling the unlabeled words between them, so as to generate a second labeling area formed by merging the two adjacent first labeling areas and the unlabeled interval between them;
the first attention index calculation unit is used for receiving the data in the labeling area identification processing unit and calculating a first attention index for each letter material;
the first attention index calculation unit includes: receiving the first labeling area information and the second labeling area information in the labeling area identification processing unit; calculating a first attention index $\mathrm{Attention}_1$ for each letter material;
wherein $Ya_i$ represents the text character length of the i-th first labeling area in the appeal part of each letter material; $Ya_j$ represents the text character length of the j-th second labeling area in the appeal part of each letter material; $A$ represents the total text length of the appeal part of each letter material;
the second attention index calculation unit is used for receiving the data in the labeling area identification processing unit and calculating a second attention index for each letter material;
the second attention index calculation unit includes: receiving the first labeling area information and the second labeling area information in the labeling area identification processing unit; calculating a second attention index $\mathrm{Attention}_2$ for each letter material:
$\mathrm{Attention}_2 = \sum_i \mathrm{Dgreee}(a_i) + \sum_j \mathrm{avDgreee}(a_j)$
wherein $\mathrm{Dgreee}(a_i)$ represents the degree grade number corresponding to the i-th first labeling area in the appeal part of each letter material; $\mathrm{avDgreee}(a_j)$ represents the average degree grade number corresponding to the j-th second labeling area in the appeal part of each letter material;
the third attention index calculation unit is used for receiving the data in the labeling area identification processing unit and calculating a third attention index for each letter material;
the third attention index calculation unit includes:
acquiring text typesetting of each letter material before scanning and recognition, and capturing the front, the inner and the rear characteristic symbol formats of each first marking area and each second marking area of the corresponding appeal part in the text typesetting; marking and highlighting the parts with the characteristic symbol formats one by one in the content of the appeal part obtained after scanning and identifying; wherein the characteristic symbol format comprises exclamation marks, question marks, fonts different from adjacent text words, font sizes different from adjacent text words, font colors different from adjacent text words, underlining, bold, highlighting;
calculating a third attention index $\mathrm{Attention}_3$ for each of the materials: $\mathrm{Attention}_3 = \sum_i (R_1 a_i \cdot R_2 a_i) + \sum_j (R_1 a_j \cdot R_2 a_j)$; wherein $R_1 a_i$ represents the number of types of feature symbol formats that appear before, within, and after the i-th first labeling area in the appeal part of each letter material; $R_2 a_i$ represents the total number of feature symbol formats that appear before, within, and after the i-th first labeling area in the appeal part of each letter material; $R_1 a_j$ represents the number of types of feature symbol formats that appear before, within, and after the j-th second labeling area in the appeal part of each letter material; $R_2 a_j$ represents the total number of feature symbol formats that appear before, within, and after the j-th second labeling area in the appeal part of each letter material;
the important index calculation module is used for extracting the contents of each letter material presented in the to-be-handled interface in the handling period, and calculating the important index for each letter material based on the characteristic character distribution condition in the fact description part in each letter material content;
the pushing module is used for obtaining comprehensive pushing attention of each letter material according to the attention index and the important index corresponding to each letter material and the generation time of the list two-dimensional code corresponding to each letter material; based on the comprehensive pushing attention degree of each letter material, all the letter materials in the to-be-handled interface are arranged to obtain a to-be-handled letter list pushed to a worker;
the pushing module comprises:
acquiring the first attention index $\mathrm{Attention}_1$, the second attention index $\mathrm{Attention}_2$, the third attention index $\mathrm{Attention}_3$ and the importance index $\mathrm{import}$ corresponding to each letter material in the different civil public opinion hotspot categories; ordering the letter materials belonging to each civil public opinion hotspot category according to the generation time of the corresponding list two-dimensional codes to obtain the time-ordering serial number corresponding to each material;
calculating the comprehensive push attention degree for each material:
$F = [(\mathrm{Attention}_1 + \mathrm{Attention}_2 + \mathrm{Attention}_3) \ast \mathrm{import}]\,st$
wherein F represents the comprehensive push attention degree, and st represents the time-ordering serial number corresponding to each material; sorting the letter materials belonging to each civil public opinion hotspot category by comprehensive push attention degree from largest to smallest to obtain a list number sequence set for each civil public opinion hotspot category; and pushing the materials to be handled to the staff in the order of the list numbers in the list number sequence set.
2. The rapid paper letter collecting, sorting and reading system according to claim 1, wherein the content processing module comprises an element extraction processing unit, a semantic decomposition unit and a hot spot identification unit;
the element extraction processing unit is used for extracting elements from the letter content automatically transferred into the system input interface and automatically filling each extracted element into the corresponding element column; the element columns include letter writer information, profile information, problem area and the system department to which the problem belongs; comparing and checking the content in each element column against the original one by one, and, once the comparison and checking find no errors, setting the letter content to enter the system to-be-handled interface automatically;
the semantic decomposition unit is used for capturing civil news public opinion data in real time from the Internet, including mainstream news websites and new media websites, carrying out semantic decomposition on the civil news public opinion data, and extracting the keyword sets $\{(X_1), (X_2), \ldots, (X_n)\}$ corresponding to the several categories of civil public opinion hotspots; wherein $(X_1), (X_2), \ldots, (X_n)$ respectively represent the keyword sets corresponding to the 1st, 2nd, …, n-th categories of civil public opinion hotspots; respectively carrying out semantic decomposition on each profile information part presented in the profile information element column within the handling period to obtain an appeal part and a fact description part corresponding to each letter material; extracting keywords from the appeal part and the fact description part of each letter material to obtain the keyword sets $\{(Y_1), (Y_2), \ldots, (Y_m)\}$ corresponding to the letter materials; wherein $(Y_1), (Y_2), \ldots, (Y_m)$ respectively represent the keyword sets extracted for the 1st, 2nd, …, m-th letter materials;
the hot spot identification unit is used for calculating, one by one, the similarity between the keyword set of each letter material and the keyword sets of the several categories of civil public opinion hotspots, setting a similarity threshold, selecting for each letter material the civil public opinion hotspot categories whose similarity (overlap ratio) is greater than the threshold, and marking the material with those categories to obtain the civil public opinion hotspot categories to which each letter material belongs; and accumulating the number of category marks for each letter material.
3. The rapid paper letter collecting, sorting and reading system according to claim 1, wherein the importance index calculation module includes:
extracting the fact description parts of the letter materials in the same category respectively; identifying, decomposing and extracting the semantic elements of each fact description part; the semantic elements include the time of the event, the persons involved in the event, the place of the event, the main conflict of the event, the background of the event and the course of the event; obtaining a semantic element set corresponding to each letter material;
obtaining the similarity between every two semantic element sets in the same category of materials, setting a similarity threshold, and gathering the semantic element sets whose mutual similarity is greater than the similarity threshold to obtain a plurality of semantic element set centers, wherein one semantic element set center contains a plurality of semantic element sets whose similarity is greater than the similarity threshold;
classifying each letter material in the same category of materials according to the semantic element set center to which it belongs; calculating an importance index for the semantic element set center to which each material belongs:
wherein $\mathrm{import}_e$ represents the importance index of the e-th semantic element set center; $M_e$ represents the average similarity value between the semantic element sets in the e-th semantic element set center; $K_e$ represents the total number of semantic element sets in the e-th semantic element set center;
and respectively attaching each semantic element set with an important index value import of a corresponding semantic element set center.
CN202210822765.8A 2022-07-12 2022-07-12 Paper letter quick collecting, sorting and reading system Active CN115082947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210822765.8A CN115082947B (en) 2022-07-12 2022-07-12 Paper letter quick collecting, sorting and reading system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210822765.8A CN115082947B (en) 2022-07-12 2022-07-12 Paper letter quick collecting, sorting and reading system

Publications (2)

Publication Number Publication Date
CN115082947A CN115082947A (en) 2022-09-20
CN115082947B true CN115082947B (en) 2023-08-15

Family

ID=83259712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210822765.8A Active CN115082947B (en) 2022-07-12 2022-07-12 Paper letter quick collecting, sorting and reading system

Country Status (1)

Country Link
CN (1) CN115082947B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544255A (en) * 2013-10-15 2014-01-29 常州大学 Text semantic relativity based network public opinion information analysis method
CN111914570A (en) * 2020-04-02 2020-11-10 菏泽学院 Entity representation method integrating multiple element analysis
CN112766359A (en) * 2021-01-14 2021-05-07 北京工商大学 Word double-dimensional microblog rumor recognition method for food safety public sentiment
CN113887219A (en) * 2021-08-12 2022-01-04 南京汇宁桀信息科技有限公司 Hot line public opinion identification and early warning method and system for competent department

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10628668B2 (en) * 2017-08-09 2020-04-21 Open Text Sa Ulc Systems and methods for generating and using semantic images in deep learning for classification and data extraction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
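Design and Implementation of a Public Opinion Monitoring System Based on Data Mining Technology (基于数据挖掘技术的舆情监控系统的设计与实现); Liu Feng (刘峰); China Master's Theses Full-text Database, Information Science and Technology (No. 3); full text *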

Also Published As

Publication number Publication date
CN115082947A (en) 2022-09-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant