CN109271489A - Text detection method and device - Google Patents
Text detection method and device
- Publication number
- CN109271489A CN201811247368.2A
- Authority
- CN
- China
- Prior art keywords
- detected
- text
- paragraph
- occurrence word
- object phrase
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a text detection method and device, relating to the technical field of text detection, with the main purpose of improving the accuracy of text detection. The main technical solution of the invention is: obtaining a text to be detected; determining the co-occurrence words of an object phrase; determining paragraphs to be detected in the text to be detected using the co-occurrence words; detecting, using a machine learning model, whether the paragraphs to be detected contain content that matches the object phrase; and determining, according to the matching result and the detection rule corresponding to the object phrase, whether the text to be detected meets the requirements. The invention is used to check whether a text conforms to prescribed norms.
Description
Technical field
The present invention relates to the technical field of text detection, and in particular to a text detection method and device.
Background art
With the emergence and development of the Internet, the number of text documents on the Internet has grown ever more rapidly. Text detection technology has been widely applied in fields such as document retrieval and public opinion management: by detecting whether a text contains information such as phrases or sentences set by the user, it determines whether the text contains the content the user needs.
At present, common text detection mostly matches against keywords set by the user, or uses natural language processing techniques to compute near-synonyms of multiple keywords and performs detection with these words. In many cases, however, especially where a word has multiple meanings, existing detection approaches have difficulty determining the semantics of the user-set keyword, and simple near-synonym lookup cannot find and match the intended content. Detection results obtained this way therefore still suffer from low accuracy and a narrow range of application.
Summary of the invention
In view of the above problems, the invention proposes a text detection method and device whose main purpose is to improve the accuracy of text detection.
To achieve the above purpose, the invention mainly provides the following technical solutions:
In one aspect, the invention provides a text detection method, which specifically includes:
Obtaining a text to be detected;
Determining the co-occurrence words of an object phrase;
Determining paragraphs to be detected in the text to be detected using the co-occurrence words;
Detecting, using a machine learning model, whether the paragraphs to be detected contain content that matches the object phrase;
Determining whether the text to be detected meets the requirements according to the matching result and the detection rule corresponding to the object phrase.
Preferably, determining paragraphs to be detected in the text to be detected using the co-occurrence words includes:
For each co-occurrence word of the object phrase, searching for that co-occurrence word in the text to be detected;
Each time the co-occurrence word is found, determining one paragraph to be detected according to the position of the co-occurrence word in the text to be detected and a preset paragraph length;
If the object phrase has no co-occurrence word, or the object phrase has co-occurrence words but none of them appears in the text to be detected, determining the entire text to be detected as the paragraph to be detected.
Preferably, determining one paragraph to be detected according to the position of the co-occurrence word in the text to be detected and the preset paragraph length includes:
In the text to be detected, taking text of a preset length both before and after the found co-occurrence word to obtain one paragraph to be detected;
Or,
In the text to be detected, taking a paragraph whose length equals the preset length and which contains the found co-occurrence word as a paragraph to be detected.
Preferably, detecting, using a machine learning model, whether the paragraph to be detected contains content that matches the object phrase includes:
Using a word vector model to detect whether the similarity between the word vector of a phrase in the paragraph to be detected and the word vector of the object phrase reaches a preset similarity value, wherein the word vector model is a model for identifying synonyms obtained by neural network training;
If it does, determining that the text to be detected contains content that matches the object phrase.
Preferably, determining the co-occurrence words of the object phrase includes:
Obtaining the corresponding co-occurrence words from a preset co-occurrence vocabulary according to the object phrase, the preset co-occurrence vocabulary being obtained by statistics over a text collection in the same technical field as the text to be detected.
Preferably, the method further includes:
When the named entity corresponding to the object phrase needs to be identified, calling a named entity recognition model to judge whether a named entity corresponding to the object phrase exists in the text to be detected;
If it exists, the text to be detected meets the requirements;
If it does not exist, the text to be detected does not meet the requirements.
Preferably, obtaining the text to be detected includes:
Converting the speech to be quality-inspected into text to obtain the text to be detected.
In another aspect, the invention provides a text detection device, which specifically includes:
A text acquiring unit, configured to obtain a text to be detected;
A co-occurrence word determination unit, configured to determine the co-occurrence words of an object phrase;
A paragraph determination unit, configured to determine paragraphs to be detected in the text to be detected obtained by the text acquiring unit, using the co-occurrence words obtained by the co-occurrence word determination unit;
A content matching unit, configured to detect, using a machine learning model, whether the paragraphs to be detected determined by the paragraph determination unit contain content that matches the object phrase;
A text detection unit, configured to determine whether the text to be detected meets the requirements according to the matching result obtained by the content matching unit and the detection rule corresponding to the object phrase.
Preferably, the paragraph determination unit includes:
A searching module, configured to search the text to be detected for each co-occurrence word of the object phrase;
A first determining module, configured to determine, each time the searching module finds the co-occurrence word, one paragraph to be detected according to the position of the co-occurrence word in the text to be detected and a preset paragraph length;
A second determining module, configured to determine the entire text to be detected as the paragraph to be detected if the object phrase has no co-occurrence word, or if the object phrase has co-occurrence words but none of them appears in the text to be detected.
Preferably, the first determining module includes:
A first determination submodule, configured to take, in the text to be detected, text of a preset length both before and after the found co-occurrence word to obtain one paragraph to be detected;
A second determination submodule, configured to take, in the text to be detected, a paragraph whose length equals the preset length and which contains the found co-occurrence word as a paragraph to be detected.
Preferably, the content matching unit includes:
A word vector detection module, configured to use a word vector model to detect whether the similarity between the word vector of a phrase in the paragraph to be detected and the word vector of the object phrase reaches a preset similarity value, wherein the word vector model is a model for identifying synonyms obtained by neural network training;
A determining module, configured to determine that the text to be detected contains content that matches the object phrase when the word vector detection module determines that the similarity reaches the preset similarity value.
Preferably, the co-occurrence word determination unit is specifically configured to obtain the corresponding co-occurrence words from a preset co-occurrence vocabulary according to the object phrase, the preset co-occurrence vocabulary being obtained by statistics over a text collection in the same technical field as the text to be detected.
Preferably, the device further includes:
A named entity detection unit, configured to call a named entity recognition model, when the named entity corresponding to the object phrase needs to be identified, to judge whether a named entity corresponding to the object phrase exists in the text to be detected; if it exists, the text to be detected meets the requirements; if it does not exist, the text to be detected does not meet the requirements.
Preferably, the text acquiring unit is specifically configured to convert the speech to be quality-inspected into text to obtain the text to be detected.
In another aspect, the invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the above text detection method when executed by one or more computing devices.
In another aspect, the invention provides a system including one or more computing devices and one or more storage devices, wherein a computer program is recorded on the one or more storage devices, and the computer program, when executed by the one or more computing devices, causes the one or more computing devices to implement the above text detection method.
With the above technical solutions, the text detection method and device provided by the invention, when detecting whether a text to be detected contains an object phrase, use the co-occurrence words of the object phrase to determine paragraphs to be detected in the text. This locates the paragraphs where words matching the object phrase are likely to appear and provides a preliminary filtering of the sentences in the text to be detected. A machine learning model is then used to query and match the paragraphs to be detected one by one and judge whether they contain content that matches the object phrase. Because the machine learning model can judge against the semantics of the object phrase more accurately, the detection accuracy for the text to be detected improves.
The above is only an overview of the technical solutions of the invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 shows a flow chart of a text detection method proposed by an embodiment of the invention;
Fig. 2 shows a flow chart of another text detection method proposed by an embodiment of the invention;
Fig. 3 shows a block diagram of a text detection device proposed by an embodiment of the invention;
Fig. 4 shows a block diagram of another text detection device proposed by an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the invention, it should be understood that the invention may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the invention can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
An embodiment of the invention provides a text detection method for detecting whether a text contains content that matches an object phrase and for determining from the judgment result whether the text meets the requirements. The specific steps of the method are shown in Fig. 1 and include:
101. Obtain the text to be detected.
102. Determine the co-occurrence words of the object phrase.
Here, co-occurrence words are words or phrases that are often used together with the object phrase or that appear together with it in texts. The embodiment of the invention does not limit the number of co-occurrence words determined. As for how they are determined, the co-occurrence words may be entered manually by the user, that is, the user specifies the co-occurrence words of the object phrase; or they may be obtained through data analysis and statistics, so that the co-occurrence words are acquired directly. Generally, co-occurrence words obtained through data analysis and statistics come from word segmentation statistics over a large number of texts: for example, the words appearing adjacent to or within a nearby range of the object phrase are counted, and those whose number of occurrences exceeds a preset value are determined as co-occurrence words of the object phrase.
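The statistical route described above can be illustrated with a minimal Python sketch. It assumes the texts are already segmented into tokens and that the object phrase is a single token; the function name `cooccurrence_words` and the parameters `window` and `threshold` are hypothetical names chosen for illustration, not terms from the patent.

```python
from collections import Counter


def cooccurrence_words(tokenized_texts, object_phrase, window=2, threshold=1):
    """Return words seen within `window` tokens of `object_phrase`
    more than `threshold` times across all texts (assumed parameters)."""
    counts = Counter()
    for tokens in tokenized_texts:
        for i, tok in enumerate(tokens):
            if tok != object_phrase:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                # Skip the object phrase itself, count every other neighbour.
                if j != i and tokens[j] != object_phrase:
                    counts[tokens[j]] += 1
    # "Exceeds a preset value" is read here as a strict comparison.
    return {word for word, count in counts.items() if count > threshold}


corpus = [
    ["confirm", "your", "address", "please"],
    ["mailing", "address", "confirm"],
    ["hello", "world"],
]
result = cooccurrence_words(corpus, "address", window=2, threshold=1)
```

In this toy corpus only "confirm" appears near "address" more than once, so it is the only word kept.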
103. Determine paragraphs to be detected in the text to be detected using the co-occurrence words.
In the embodiment of the invention, a paragraph to be detected is not restricted to any particular unit: it may be a passage, a sentence, or a phrase of several words in the text to be detected.
The purpose of this step is to mark or filter out the paragraphs in the text to be detected that contain co-occurrence words. Note that because there can be multiple co-occurrence words, a single paragraph to be detected may contain several of them.
104. Use a machine learning model to detect whether the paragraph to be detected contains content that matches the object phrase.
In the embodiment of the invention, the machine learning model is a model obtained by neural network training that can identify phrases or words whose semantics match those of the object phrase. In practical applications, the input of the machine learning model is the paragraph to be detected and the object phrase, and the output is the matching result, namely whether the paragraph to be detected matches the object phrase. A match means the paragraph to be detected contains a word with the same or similar semantics as the object phrase; a mismatch means none of the paragraphs to be detected in the text to be detected contains such a word.
105. Determine whether the text to be detected meets the requirements according to the matching result and the detection rule corresponding to the object phrase.
Specifically, this step is used to classify a large number of texts to be detected, that is, to extract the texts that meet the requirements. In practice, when a single object phrase is used to decide whether a text to be detected meets the requirements, the detection rule can take one of two forms: the text meets the requirements when the matching result is a match, or the text meets the requirements when the matching result is a mismatch. Usually, multiple object phrases are set for detecting a text. In that case a preset judgment rule is also needed to decide whether the text meets the requirements: for example, the text is determined to meet the requirements when it successfully matches any one object phrase, or only when it successfully matches all the object phrases.
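The rule logic of step 105 can be sketched in a few lines of Python. This is only one possible reading: each object phrase carries a flag saying whether it must appear or must not appear, and the per-phrase outcomes are combined with an "any" or "all" rule. The names `phrase_passes`, `text_meets_requirements` and `must_appear` are hypothetical.

```python
def phrase_passes(matched, must_appear):
    """A phrase passes when the match result agrees with its rule:
    must_appear=True requires a match, must_appear=False requires none."""
    return matched == must_appear


def text_meets_requirements(results, mode="all"):
    """results: list of (matched, must_appear) pairs, one per object phrase.
    mode: "any" or "all", the assumed preset judgment rule."""
    outcomes = [phrase_passes(matched, must_appear) for matched, must_appear in results]
    if mode == "any":
        return any(outcomes)
    if mode == "all":
        return all(outcomes)
    raise ValueError("mode must be 'any' or 'all'")


# One required phrase was found, another required phrase was not.
results = [(True, True), (False, True)]
passes_any = text_meets_requirements(results, mode="any")
passes_all = text_meets_requirements(results, mode="all")
```

With these inputs the text passes under the "any" rule and fails under the "all" rule.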
As the embodiment shown in Fig. 1 illustrates, when detecting a text the embodiment of the invention first uses the co-occurrence words of the object phrase to locate the positions to be checked in the text, then uses a machine learning model to perform semantic matching on the located paragraphs to judge whether they contain a word that matches the object phrase, and finally determines whether the text meets the requirements according to the detection rule of the object phrase. Because the co-occurrence words and the machine learning model used in this solution draw on big data statistics and neural networks, the method can judge more accurately whether the semantics of the text are the same as those of the object phrase, which improves the accuracy of text detection.
Further, in another embodiment of the invention, when the text detection method is applied to the field of speech detection, the text to be detected is the text obtained by converting speech to be quality-inspected into text. This field mainly covers industries based on voice services, such as manual customer service over the telephone and online e-commerce, which generate large amounts of voice data every day. To evaluate the service level of human agents objectively, this voice data must be inspected to judge whether it contains meaningful utterances that can be used for evaluation, namely object phrases, and the service process of the customer service staff is then evaluated based on the object phrases. Based on this application scenario, the embodiment of the invention uses the text detection method shown in Fig. 2 to describe the detailed text detection process:
201. Convert the speech to be quality-inspected into text to obtain the text to be detected.
The specific way of converting speech to text is not limited; any existing speech recognition program may be used to convert the voice data into text and generate the text to be detected.
202. Obtain the corresponding co-occurrence words from the preset co-occurrence vocabulary according to the object phrase.
Specifically, the object phrase is generally a specific condition set by the user for judging whether a text meets the requirements. For example, object phrases can be used to judge whether the wording used by a customer service agent or consultant is professional and standard. Generally, one or more object phrases may be set; when there are several, the corresponding co-occurrence words are obtained for each object phrase in turn to form a set of co-occurrence words.
The preset co-occurrence vocabulary in this embodiment is obtained by statistics over a text collection in the same technical field as the text to be detected. The text collection in the same technical field can be obtained by classifying the texts in a public data set according to preset technical fields; the user selects the technical field to which the text to be detected belongs, which determines the text collection in the same technical field. Word segmentation statistics are then performed on the texts in that collection to create the preset co-occurrence vocabulary for the technical field. The preset co-occurrence vocabulary records the segmented words in the collection that have co-occurrence words, and the co-occurrence count or frequency between a co-occurrence word and the segmented word should be greater than a preset value.
Note that the preset co-occurrence vocabulary changes as the content of the text collection changes: as new texts are added and content is updated, the co-occurrence words of each segmented word in the vocabulary change accordingly. The preset co-occurrence vocabulary in this step is therefore a data table that can be updated dynamically. When an object phrase is received for text detection, the object phrase is looked up in the preset co-occurrence vocabulary in real time; if it is present, the corresponding co-occurrence words are obtained, and if not, a prompt indicates that the object phrase has no co-occurrence words.
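A dynamically updated co-occurrence vocabulary of this kind might look like the following Python sketch. It is an assumption-laden illustration: the class name `CooccurrenceTable`, its `window` and `threshold` parameters, and the choice to return an empty list when the object phrase has no co-occurrence words are all hypothetical.

```python
from collections import Counter, defaultdict


class CooccurrenceTable:
    """In-memory co-occurrence vocabulary that grows as texts are added."""

    def __init__(self, window=2, threshold=1):
        self.window = window        # how many tokens on each side count as "nearby"
        self.threshold = threshold  # counts must exceed this preset value
        self._counts = defaultdict(Counter)  # word -> Counter of neighbouring words

    def add_text(self, tokens):
        """Update neighbour counts with one newly added, already segmented text."""
        for i, tok in enumerate(tokens):
            lo = max(0, i - self.window)
            hi = min(len(tokens), i + self.window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] != tok:
                    self._counts[tok][tokens[j]] += 1

    def lookup(self, object_phrase):
        """Return the current co-occurrence words; an empty list means none."""
        neighbours = self._counts.get(object_phrase, {})
        return sorted(w for w, c in neighbours.items() if c > self.threshold)


table = CooccurrenceTable(window=1, threshold=1)
table.add_text(["mailing", "address"])
before = table.lookup("address")
table.add_text(["mailing", "address", "please"])
after = table.lookup("address")
```

The second lookup returns a different result from the first because the table was updated in between, which is the dynamic behaviour the paragraph above describes.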
203. Determine paragraphs to be detected in the text to be detected using the co-occurrence words.
Specifically, since the object phrase may or may not have co-occurrence words in the preset co-occurrence vocabulary, this step determines the paragraphs to be detected separately for the two cases.
First, when co-occurrence words corresponding to the object phrase exist, they are searched for one by one in the text to be detected. Each time a co-occurrence word is found, one paragraph to be detected is determined according to the position of the co-occurrence word in the text to be detected and the preset paragraph length. The preset paragraph length is usually measured in characters, and the words or phrases within that length are considered likely to include the object phrase or a phrase with similar semantics.
In the embodiment of the invention, feasible ways of determining one paragraph to be detected according to the preset paragraph length include:
First, in the text to be detected, take text of a preset length both before and after the found co-occurrence word to obtain one paragraph to be detected.
Second, in the text to be detected, take a paragraph whose length equals the preset length and which contains the found co-occurrence word as a paragraph to be detected.
The two approaches differ in the position of the co-occurrence word within the paragraph to be detected. In the first, the position is relatively fixed; in the second it is not, that is, every possible way of cutting a paragraph that has the preset length and contains the co-occurrence word is extracted. For example, suppose the co-occurrence word is A, the preset length is 3 letters, and a passage in the text to be detected that contains A is BCADF. If the first approach takes a preset length of 1 letter both before and after, it yields the paragraph to be detected CAD, while the second approach yields three paragraphs: BCA, CAD and ADF.
Second, consider the case where no co-occurrence word corresponding to the object phrase exists. Here the entire text to be detected is determined as the paragraph to be detected.
In addition, where co-occurrence words corresponding to the object phrase exist but none is found in the text to be detected, the entire text to be detected is likewise determined as the paragraph to be detected.
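The paragraph selection of step 203, including both cutting approaches and the whole-text fallback, can be sketched in Python as follows. Lengths are measured in characters as the text suggests; the function names and the `mode` values "around" and "sliding" are hypothetical labels for the first and second approach.

```python
def around_window(text, start, word_len, side_len):
    """First approach: take `side_len` characters on each side of the found word."""
    lo = max(0, start - side_len)
    hi = min(len(text), start + word_len + side_len)
    return text[lo:hi]


def sliding_windows(text, start, word_len, total_len):
    """Second approach: every window of `total_len` characters containing the word."""
    first = max(0, start + word_len - total_len)
    last = min(start, len(text) - total_len)
    return [text[lo:lo + total_len] for lo in range(first, last + 1)]


def candidate_paragraphs(text, cooccurrence_words, length, mode="around"):
    """Return the paragraphs to be detected; fall back to the whole text
    when there are no co-occurrence words or none is found in the text."""
    paragraphs = []
    for word in cooccurrence_words:
        pos = text.find(word)
        while pos != -1:
            if mode == "around":
                paragraphs.append(around_window(text, pos, len(word), length))
            else:
                paragraphs.extend(sliding_windows(text, pos, len(word), length))
            pos = text.find(word, pos + 1)
    return paragraphs if paragraphs else [text]


first_way = candidate_paragraphs("BCADF", ["A"], 1, mode="around")
second_way = candidate_paragraphs("BCADF", ["A"], 3, mode="sliding")
fallback = candidate_paragraphs("BCADF", [], 3)
```

Run on the BCADF example from the text, the first approach gives CAD and the second gives BCA, CAD and ADF; with no co-occurrence words the whole text is returned.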
204. Use a machine learning model to detect whether the paragraph to be detected contains content that matches the object phrase.
In this embodiment specifically, the machine learning model is a word vector model, which is a model for identifying synonyms obtained by neural network training. Training the word vector model requires segmenting a large amount of corpus data and building a neural network model, from which the word vectors are trained. In actual use, the segmented words or phrases in the paragraph to be detected and the object phrase are input into the word vector model, which outputs the word vector of each segmented word. Each is compared with the word vector of the object phrase, that is, the similarity between the word vectors is computed and checked against a preset similarity value. If the value is reached, the text to be detected is determined to contain content that matches the object phrase, and the word vector model outputs a successful match. If every paragraph to be detected in the text to be detected fails to match the object phrase, the output matching result is that the text to be detected does not contain the object phrase.
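The similarity check in step 204 can be illustrated with a small Python sketch. The patent does not name a similarity measure, so cosine similarity is an assumption here, and the tiny hand-written `vectors` dictionary merely stands in for a trained word vector model; `paragraph_matches` and `threshold` are hypothetical names.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def paragraph_matches(paragraph_tokens, object_phrase, vectors, threshold=0.8):
    """True if any token's vector is at least `threshold`-similar to the object phrase."""
    target = vectors.get(object_phrase)
    if target is None:
        return False
    for token in paragraph_tokens:
        vec = vectors.get(token)
        if vec is not None and cosine_similarity(vec, target) >= threshold:
            return True
    return False


# Toy 2-dimensional vectors, invented purely for illustration.
vectors = {
    "thanks": [1.0, 0.0],
    "grateful": [0.9, 0.1],
    "refund": [0.0, 1.0],
}
hit = paragraph_matches(["very", "grateful"], "thanks", vectors)
miss = paragraph_matches(["refund"], "thanks", vectors)
```

A text would be reported as containing the object phrase as soon as one of its paragraphs to be detected produces a hit.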
In one embodiment of the invention, the content matching a phrase in a quality inspection entry refers to a synonymous phrase (also called a near-synonymous phrase). For example, if the phrase in the quality inspection entry is "thanks" and the quality inspection rule requires that the phrase must appear, then "thanks" and its synonymous phrases, for example another word meaning "thanks", are searched for in the paragraph to be detected.
205. Determine whether the text to be detected meets the requirements according to the matching result and the detection rule corresponding to the object phrase.
This step is the same as step 105 in Fig. 1 and is not described again here.
206. Perform named entity detection on the text to be detected using a named entity recognition model.
Named entity detection is a further check on the specific content corresponding to the object phrase. For example, when the object phrase is "identification card number", the corresponding named entity is a specific 18-character string of digits or letters; when the object phrase is "mailing address", the corresponding named entity is specific address information, including district, street and house number.
This is because in scenarios such as telephone insurance sales, when the salesperson needs to confirm the policyholder, the beneficiary, the address and so on with the customer, the customer must be made to state the actual name and address. Take telephone sales of car insurance as an example: after the sale succeeds, the salesperson needs to confirm information such as name and address with the policyholder. If detection simply checks for the word "address" to judge whether this information was confirmed, in practice the salesperson may only ask the policyholder "Has the address changed?" without actually confirming the address. Such open-ended information cannot be matched with a single sentence, so a feasible approach is to check whether a valid address follows the identified phrase, which requires a named entity recognition model.
Specifically, before this step is executed, the detection configuration file must specify which object phrases require named entity detection and the recognition format of the corresponding named entity information. During execution, the named entity recognition model is called to judge whether the text to be detected contains a named entity corresponding to the object phrase, such as a person name, place name or organization name. If so, the text to be detected is determined to meet the requirements; otherwise, it is determined not to meet the requirements.
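The configuration-driven check can be illustrated with a deliberately simplified Python sketch. It does not use a neural named entity recognition model: a regular expression stands in for the model, which is a swapped-in technique chosen only so the example is self-contained. The configuration dictionary `ENTITY_PATTERNS`, the 18-character pattern, and the `search_range` parameter are all assumptions.

```python
import re

# Hypothetical detection configuration: object phrase -> expected entity format.
ENTITY_PATTERNS = {
    # Assumed format: 17 digits followed by a digit or the letter X.
    "identification card number": re.compile(r"\d{17}[\dXx]"),
}


def has_named_entity(text, object_phrase, patterns=ENTITY_PATTERNS, search_range=40):
    """Check whether a configured entity appears within `search_range`
    characters after the object phrase (regex stand-in for an NER model)."""
    pattern = patterns.get(object_phrase)
    if pattern is None:
        # No entity check is configured for this phrase, so nothing to verify.
        return True
    pos = text.find(object_phrase)
    if pos == -1:
        return False
    end = pos + len(object_phrase)
    return pattern.search(text[end:end + search_range]) is not None


stated = has_named_entity(
    "my identification card number is 11010519900101123X thanks",
    "identification card number",
)
only_asked = has_named_entity(
    "has your identification card number changed",
    "identification card number",
)
```

The second call mirrors the scenario in the text: the phrase is mentioned, but no actual identifier follows it, so the check fails.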
Note that the named entity recognition model, like the word vector model above, is a model obtained by neural network training, in this case for identifying whether the content of a text includes a valid corresponding named entity. It is trained with an existing neural network model on labeled samples; this embodiment does not limit the specific manner.
Further, in one possible embodiment of the invention, this step can also be applied to the texts to be detected that step 205 determined to meet the requirements, as a further named entity check: when the text to be detected contains a named entity corresponding to the object phrase, it is determined to meet the requirements; otherwise it is determined not to meet the requirements.
The above, with reference to Fig. 2, describes in detail how the text detection scheme provided by the embodiment of the invention, when applied to the detection of voice data, likewise improves the accuracy of voice data quality inspection. For a quality inspection entry (object phrase) set by the user, the combined use of co-occurrence words and the word vector model makes it possible to identify accurately whether the voice data contains the object phrase or content semantically similar to it. The invention further uses a named entity recognition model to check the content of the text to be detected, so as to determine whether a valid corresponding named entity exists within a certain fixed range after the detected object phrase. For voice data, this makes it possible to judge whether the necessary named entity data is present and thereby check whether the wording used by customer service staff or consultants conforms to the required script.
Further, as an implementation of the above text detection method, an embodiment of the invention provides a text detection device, which is mainly used to detect whether a text contains content that matches an object phrase and to classify the text according to the judgment result. For ease of reading, this device embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be understood that the device in this embodiment can implement everything in the foregoing method embodiment. As shown in Fig. 3, the device specifically includes:
a text acquiring unit 31, configured to obtain a text to be detected;
a co-occurrence word determination unit 32, configured to determine the co-occurrence words of an object phrase;
a paragraph determination unit 33, configured to use the co-occurrence words obtained by the co-occurrence word determination unit 32 to determine paragraphs to be detected in the text to be detected obtained by the text acquiring unit 31;
a content matching unit 34, configured to use a machine learning model to detect whether the paragraphs to be detected determined by the paragraph determination unit 33 contain content that matches the object phrase;
a text detection unit 35, configured to determine whether the text to be detected meets the requirements according to the matching result obtained by the content matching unit 34 and the detection rule corresponding to the object phrase.
Further, as shown in Fig. 4, the paragraph determination unit 33 includes:
a searching module 331, configured to search the text to be detected for each co-occurrence word of the object phrase;
a first determining module 332, configured to determine, each time the searching module 331 finds a co-occurrence word, one paragraph to be detected according to the position of the co-occurrence word in the text to be detected and a preset paragraph length;
a second determining module 333, configured to determine the entire text to be detected as the paragraph to be detected if the object phrase has no co-occurrence word, or if the object phrase has co-occurrence words but the searching module 331 finds none of them in the text to be detected.
Further, as shown in Fig. 4, the first determining module 332 includes:
a first determining submodule 3321, configured to take, in the text to be detected, text of the preset length both before and after the co-occurrence word found, to obtain one paragraph to be detected;
a second determining submodule 3322, configured to take, in the text to be detected, a paragraph whose length equals the preset length and which contains the co-occurrence word found, as the paragraph to be detected.
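The paragraph selection performed by modules 331 to 333 and submodules 3321 and 3322 can be sketched as follows. This is only an illustration under stated assumptions: the function name `find_paragraphs`, the `mode` parameter, measuring length in characters, and roughly centring the word in the fixed-length window are all hypothetical choices that the description does not fix.

```python
def find_paragraphs(text, cooccurrence_words, preset_length, mode="around"):
    """Return the paragraphs to be detected for one object phrase (hypothetical helper)."""
    paragraphs = []
    for word in cooccurrence_words:
        start = text.find(word)
        while start != -1:
            end = start + len(word)
            if mode == "around":
                # Submodule 3321: preset_length characters before and after the word.
                lo = max(0, start - preset_length)
                hi = min(len(text), end + preset_length)
            else:
                # Submodule 3322: one window of preset_length containing the word.
                # Centring the word in the window is an assumption of this sketch.
                length = max(preset_length, len(word))
                lo = max(0, start - (length - len(word)) // 2)
                hi = min(len(text), lo + length)
                lo = max(0, hi - length)  # keep the window length near the text end
            paragraphs.append(text[lo:hi])
            start = text.find(word, start + 1)
    # Module 333: no co-occurrence words, or none found, so use the whole text.
    return paragraphs or [text]
```

Every occurrence of every co-occurrence word yields one paragraph, so overlapping paragraphs are possible; whether to merge them is left open here, as it is in the description.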
Further, as shown in Fig. 4, the content matching unit 34 includes:
a word vector detection module 341, configured to use a word vector model to detect whether the similarity between the word vector of a phrase in the paragraph to be detected and the word vector of the object phrase reaches a preset similarity value, wherein the word vector model is a model for identifying synonyms obtained by neural network training;
a determining module 342, configured to determine that the text to be detected contains content that matches the object phrase when the word vector detection module 341 determines that the similarity reaches the preset similarity value.
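The threshold check carried out by modules 341 and 342 can be illustrated with a small sketch. It assumes that similarity is measured as cosine similarity and that word vectors are available in a plain dictionary; both are assumptions made for illustration, since the description names neither the similarity measure nor the storage format. The names `cosine_similarity`, `paragraph_matches` and the default threshold of 0.8 are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def paragraph_matches(paragraph_phrases, object_phrase, vectors, threshold=0.8):
    """True if any phrase in the paragraph is at least `threshold` similar to the object phrase."""
    target = vectors[object_phrase]
    return any(
        phrase in vectors and cosine_similarity(vectors[phrase], target) >= threshold
        for phrase in paragraph_phrases
    )
```

In practice the vectors would come from the trained word vector model; the dictionary here only stands in for that lookup.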
Further, the co-occurrence word determination unit 32 is specifically configured to obtain the corresponding co-occurrence words from a preset co-occurrence vocabulary according to the object phrase, the preset co-occurrence vocabulary being obtained by statistics over a text collection from the same technical field as the text to be detected.
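One simple way to obtain such a vocabulary is to count which words appear in the same text as the object phrase across a domain corpus. The sketch below assumes that the corpus is already segmented into token lists and that the most frequent co-occurring tokens are kept; the statistic actually used is not specified in the description, and `build_cooccurrence_vocabulary` and `top_k` are hypothetical names.

```python
from collections import Counter


def build_cooccurrence_vocabulary(corpus, object_phrases, top_k=3):
    """Map each object phrase to the top_k tokens that most often share a text with it.

    corpus: list of texts, each text given as a list of tokens.
    """
    vocabulary = {}
    for phrase in object_phrases:
        counts = Counter()
        for tokens in corpus:
            if phrase in tokens:
                counts.update(t for t in tokens if t != phrase)
        vocabulary[phrase] = [word for word, _ in counts.most_common(top_k)]
    return vocabulary
```

A real deployment would probably also filter stop words and normalize by overall word frequency, but those refinements go beyond what the text states.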
Further, as shown in Fig. 4, the device further includes:
a named entity detection unit 36, configured to call a named entity recognition model, when the named entity corresponding to the object phrase needs to be identified, to judge whether the text to be detected contains a named entity corresponding to the object phrase; if it does, the text detection unit 35 determines that the text to be detected meets the requirements; if it does not, the text detection unit 35 determines that the text to be detected does not meet the requirements.
Further, the text acquiring unit 31 is specifically configured to convert the voice to be quality-inspected into text to obtain the text to be detected.
Based on the above text detection method and device for object phrases, the embodiment of the present invention may further include a method for detecting target sentences: detecting whether the text to be detected contains a sentence that matches a target sentence, and determining from the matching result whether the text meets the requirements. A sentence usually consists of multiple phrases, so detecting target sentences allows detection to be carried out more effectively on the semantics of a whole sentence. The specific steps of this method include:
First, the text to be detected is obtained.
Second, the sentence set to be detected in the text to be detected is determined.
The sentences in the sentence set to be detected are determined based on the target sentence set by the user. Specifically, the text to be detected may be screened by the length of the target sentence, and sentences of matching length are added to the sentence set to be detected; alternatively, screening may use the salient words of the target sentence, and sentences containing these words are determined as sentences to be detected and added to the sentence set to be detected.
The main purpose of this step is to reduce the computation required for the subsequent sentence matching by filtering out sentences in the text to be detected whose semantics clearly differ from the target sentence, thereby improving detection efficiency. For longer texts to be detected in particular, matching every sentence against the target sentence would consume a large amount of computing resources, make matching take too long, and affect the user experience.
Third, the semantic similarity between each sentence in the sentence set to be detected and the target sentence is detected one by one using a semantic similarity model.
The semantic similarity model is a data model obtained by training a neural network on a large number of labeled sentence pairs; it can identify the semantic similarity between two sentences. A labeled sentence pair is a pair of sentences obtained through manual or automated processing. In this embodiment, the label of a pair may be "semantically similar" or "semantically dissimilar", or it may be the semantic similarity value of the two sentences. This step does not limit the specific way in which the semantic similarity model is trained with a neural network algorithm.
When a sentence to be detected and the target sentence are input into the semantic similarity model, the model outputs a corresponding detection result. The detection result may be a judgment of "sentences similar" or "sentences not similar", or it may be a computed similarity value for the two sentences.
Fourth, whether the text to be detected contains a sentence semantically similar to the target sentence is judged according to the detection results.
Specifically, for one text to be detected, after all sentences to be detected have gone through the detection of step 103, if any of them is semantically similar to the target sentence, it can be determined that the text to be detected contains the target sentence; if none of them is semantically similar to the target sentence, it is determined that the text to be detected does not contain the target sentence. Of course, in practical applications a more specific judgment rule may be set for this step, such as a threshold on the number of sentences: when the number of semantically similar sentences reaches the threshold, it is determined that the target sentence is present in the text to be detected.
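The judgment rule with a sentence-count threshold can be sketched as below. The sketch assumes the semantic similarity model returns a numeric similarity per sentence; the function name `contains_target_sentence` and the default values are hypothetical and only illustrate the rule.

```python
def contains_target_sentence(similarities, similarity_threshold=0.8, min_similar=1):
    """Decide whether a text contains the target sentence.

    similarities: one similarity value per sentence to be detected.
    min_similar: how many similar sentences are needed (1 reproduces the default rule
    that a single similar sentence is enough).
    """
    similar_count = sum(1 for s in similarities if s >= similarity_threshold)
    return similar_count >= min_similar
```

With `min_similar=1` this is the basic rule; raising it gives the stricter rule mentioned for practical applications.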
Fifth, whether the text to be detected meets the requirements is determined according to the judgment result and the detection rule corresponding to the target sentence.
After it has been determined whether the text to be detected contains a sentence semantically similar to the target sentence, the text to be detected is classified based on the detection rule of the target sentence, for example that a text to be detected containing the target sentence meets the requirements, or that a text to be detected containing the target sentence does not meet the requirements.
In practical applications, this step is used to classify a large number of texts to be detected, that is, to extract the texts that meet the requirements or the texts that do not. When one text to be detected is judged against one target sentence, the detection rule may specify either that the text to be detected meets the requirements when the judgment result is that the sentences are semantically similar, or that the text to be detected meets the requirements when the matching result is a mismatch. In the general case, the user may set multiple target sentences for detecting the text to be detected; whether the text to be detected meets the requirements then also has to be determined according to a preset judgment rule. For example, the text may be determined to meet the requirements when it matches any one target sentence, or only when it matches all target sentences.
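The combination of per-sentence rules described above can be written as a small decision function. This is a sketch only: `text_meets_requirements`, `compliant_when_matched` and `combine` are hypothetical names, and applying the same per-sentence rule to every target sentence is a simplifying assumption.

```python
def text_meets_requirements(matches, compliant_when_matched=True, combine="any"):
    """Classify one text from its match results against several target sentences.

    matches: one boolean per target sentence (True if a similar sentence was found).
    compliant_when_matched: True if finding the target sentence means compliance,
    False if the absence of the target sentence means compliance.
    combine: "any" if one satisfied target sentence is enough, "all" otherwise.
    """
    satisfied = [m == compliant_when_matched for m in matches]
    return any(satisfied) if combine == "any" else all(satisfied)
```

For example, `text_meets_requirements([True, False], combine="all")` evaluates to `False`, because one required target sentence was not found.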
As the above embodiment shows, when performing text detection the embodiment of the present invention first screens the sentences of the text to obtain the sentences to be detected, then uses the semantic similarity model to compare the semantics of each sentence to be detected with the semantics of the target sentence, determines whether the text to be detected contains a sentence semantically similar to the target sentence, and finally judges whether the text to be detected meets the requirements according to the detection rule of the target sentence, thereby classifying the text. In this detection process, because the semantic similarity model is obtained by neural network training, the text can be detected in the semantic dimension. Compared with the existing approach of recognition and detection based on the segmented words of the target sentence, this is closer to human judgment, so the text detection method of the embodiment of the present invention is more accurate.
In addition, in another embodiment of the present invention, the text detection method may further be applied to the field of voice quality inspection, where the text to be detected is the text obtained by converting the voice to be quality-inspected into text. The industries in this field are mainly those based on voice services, such as telephone customer service and online e-commerce. These industries generate a large amount of voice data every day. To evaluate the service level of human customer service objectively, this voice data has to be inspected to judge whether it contains the salient sentences that can be used for evaluation, namely the target sentences, and the service process of the customer service staff is then evaluated based on the target sentences. The text detection process shown in Fig. 1 can achieve the purpose of evaluating the voice of customer service staff according to the evaluation criteria, namely the target sentences. In this application scenario the voice has to be converted into text. This embodiment does not limit the specific way of converting voice into text; any existing speech recognition program may be used to convert the voice data into text and generate the text to be detected.
Further, the description of the above embodiment for target sentence detection shows that how the sentences to be detected are determined in the text to be detected, that is, how the sentence set to be detected is obtained, directly affects the final detection result for the text to be detected. The specific implementation schemes of the second step, "determining the sentence set to be detected in the text to be detected", are therefore explained in detail below:
Scheme one: the text to be detected is traversed character by character, and passages whose length equals that of the target sentence are intercepted and put into the sentence set to be detected.
This scheme is relatively simple to implement, since the text to be detected is intercepted only according to the length of the target sentence. However, the sentences to be detected obtained in this way may not be complete sentences of the text, and their meaning may be ambiguous. Sentences determined in this way can therefore be semantically ambiguous, which causes some deviation in the detection result. The number of sentences obtained in this way is also relatively large, which does not help the overall efficiency of text detection. On the other hand, precisely because of their large number, the sentences obtained in this way cover the text to be detected more comprehensively, which can also improve the accuracy of text detection to some extent.
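Scheme one amounts to a sliding window over the text. A minimal sketch, assuming length is counted in characters and using the hypothetical function name `sliding_window_candidates`:

```python
def sliding_window_candidates(text, target_sentence):
    """Return every passage of the text whose length equals that of the target sentence."""
    n = len(target_sentence)
    if n == 0 or len(text) < n:
        return []
    # One candidate per start position, which is why the candidate count grows with the text.
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

A text of length m yields m - n + 1 candidates for a target sentence of length n, which illustrates both the broad coverage and the computational cost noted above.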
Scheme two: the target sentence is segmented into words, one or more target words are obtained after the stop words are removed, and the one or more target words form a positioning word set. For each positioning word in the positioning word set, the positioning word is searched for in the text to be detected; if it is found, a passage that contains the positioning word and whose length equals that of the target sentence is intercepted from the text to be detected as one sentence to be detected.
This scheme is the optimized approach in the embodiment of the present invention.
In this scheme, stop words are removed in order to obtain more accurate positioning words. The target sentence may contain words with little positioning value, such as "you" and "I"; removing these words reduces the number of words in the positioning word set, so that searching the text to be detected with the positioning words also saves a large amount of computing resources and improves detection efficiency.
When determining the positioning words, this scheme may further obtain synonyms (or near-synonyms) of some of the target words. Synonyms are added in order to extend the semantics of the target words of the target sentence, which widens the screening range for sentences to be detected and prevents semantically similar sentences from being missed. The synonyms may be obtained with a word vector model, which is a machine learning model for identifying synonyms obtained by neural network training. The word vector model computes the word vectors of the input segmented words to identify whether the semantics of different words are similar; the synonyms of the target words can thus be computed and added to the positioning word set together with the target words. In this embodiment, the number of positioning words in the positioning word set may be limited by setting a quantity interval. When the number of positioning words is below the minimum of the interval, synonyms are obtained with the word vector model to supplement the positioning words; when the number of positioning words is above the maximum of the interval, the synonyms of some target words are removed accordingly. This ensures that the positioning word set holds a suitable number of positioning words and raises the probability of matching valid sentences to be detected in the text to be detected. It avoids the case where too few positioning words match no sentence to be detected, so that the full text has to be detected, as well as the case where too many positioning words match too many sentences to be detected, which increases computation and lowers detection efficiency. Setting the quantity interval can therefore effectively improve the precision of the matched sentences to be detected while keeping the detection process efficient.
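The construction of the positioning word set with a quantity interval can be sketched as follows. The sketch assumes the synonyms are supplied as a precomputed dictionary standing in for the word vector model, and that synonyms are the only entries that may be dropped when the set is too large; `build_positioning_words`, `min_size` and `max_size` are hypothetical names.

```python
def build_positioning_words(target_tokens, stop_words, synonyms, min_size, max_size):
    """Build the positioning word set for one segmented target sentence.

    synonyms: dict mapping a target word to its synonyms (stand-in for the word vector model).
    """
    # Remove stop words and duplicates while keeping the original word order.
    targets = list(dict.fromkeys(t for t in target_tokens if t not in stop_words))
    words = list(targets)
    if len(words) < min_size:
        # Too few positioning words: supplement them with synonyms.
        for target in targets:
            for synonym in synonyms.get(target, []):
                if synonym not in words:
                    words.append(synonym)
    # Too many positioning words: drop synonyms from the end, never the target words.
    while len(words) > max_size and len(words) > len(targets):
        words.pop()
    return words
```

Because synonyms are appended after the target words, trimming from the end removes synonyms first, which matches the idea of discarding the synonyms of some target words when the upper bound is exceeded.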
In addition, in this scheme, another condition for determining a sentence to be detected from a positioning word is that the length of the passage intercepted from the text to be detected equals the length of the target sentence. This is because, in most cases, two sentences that express the same meaning use similar words and roughly the same number of words, so semantically identical sentences are generally of about the same length. Following this logic, when determining a sentence to be detected, the embodiment of the present invention checks not only that the sentence contains a positioning word but also that the length of the sentence containing the positioning word is roughly the same as that of the target sentence. Further, in the intercepted sentence, the position of the positioning word should be the same as its position in the target sentence.
As an example, assume the target sentence is ABC. When the text to be detected is searched with the positioning word C and a passage ADBCE is found, this passage cannot be matched against the target sentence directly; it must first be intercepted. According to the interception rule above, the sentence to be detected that is finally intercepted is DBC. When the text is searched with the positioning word F, where F is a synonym of C, and a passage RABFD is found, the sentence to be detected that is finally intercepted is ABF.
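The interception rule in this example can be reproduced with a short sketch. It assumes that each letter stands for one token of equal width, as in the example, so that a synonym occupies the same position and length as the word it replaces; `intercept_aligned` and its parameters are hypothetical names.

```python
def intercept_aligned(text, target_sentence, positioning_word, original_word=None):
    """Intercept passages in which the positioning word sits where it does in the target sentence.

    original_word: the target word that positioning_word is a synonym of, if any.
    """
    anchor = original_word or positioning_word
    offset = target_sentence.find(anchor)  # position of the word in the target sentence
    if offset == -1:
        return []
    n = len(target_sentence)
    candidates = []
    pos = text.find(positioning_word)
    while pos != -1:
        start = pos - offset
        # Keep only passages that fit entirely inside the text.
        if start >= 0 and start + n <= len(text):
            candidates.append(text[start:start + n])
        pos = text.find(positioning_word, pos + 1)
    return candidates
```

Calling `intercept_aligned("ADBCE", "ABC", "C")` gives `["DBC"]`, and `intercept_aligned("RABFD", "ABC", "F", original_word="C")` gives `["ABF"]`, in line with the two cases described above.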
The above describes in detail how the sentences to be detected are determined in the text to be detected to generate the sentence set to be detected. Together with the other steps of target sentence detection, it shows how the embodiment of the present invention performs text detection: a subset of sentences to be detected is selected and checked for semantic similarity with the target sentence, and whether the text to be detected meets the requirements is finally determined according to the detection results and the detection rule of the target sentence.
Further, as an implementation of the above text detection method for target sentences, the embodiment of the present invention also provides a text detection device, which is mainly used to detect whether a text contains content that matches a target sentence and to classify the text according to the judgment result. For ease of reading, this device embodiment does not repeat the details of the foregoing method embodiment one by one, but it should be understood that the device in this embodiment can implement everything in the foregoing method embodiment. The device specifically includes:
a text acquiring unit, configured to obtain a text to be detected;
a sentence determination unit, configured to determine the sentence set to be detected in the text to be detected obtained by the text acquiring unit;
a similarity detection unit, configured to use a semantic similarity model to detect, one by one, the semantic similarity between the target sentence and each sentence in the sentence set to be detected determined by the sentence determination unit;
a sentence detection unit, configured to judge, according to the detection results obtained by the similarity detection unit, whether the text to be detected contains a sentence semantically similar to the target sentence;
a text detection unit, configured to determine whether the text to be detected meets the requirements according to the judgment result obtained by the sentence detection unit and the detection rule corresponding to the target sentence.
Further, the device further includes:
a sample acquisition unit, configured to obtain a large number of labeled sentence pairs, wherein the label is semantically similar or semantically dissimilar;
a model training unit, configured to train the semantic similarity model based on the labeled sentence pairs obtained by the sample acquisition unit, so that the similarity detection unit can use the trained semantic similarity model to detect the semantic similarity between the target sentence and the sentences in the sentence set to be detected.
Further, the sentence determination unit includes:
a first sentence interception module, configured to traverse the text to be detected character by character, intercept passages whose length equals that of the target sentence, and put them into the sentence set to be detected.
Further, the sentence determination unit further includes:
a positioning word obtaining module, configured to segment the target sentence into words, obtain one or more target words after removing the stop words, and form a positioning word set from the one or more target words;
a second sentence interception module, configured to search the text to be detected for each positioning word in the positioning word set obtained by the positioning word obtaining module and, if the positioning word is found, to intercept from the text to be detected a passage that contains the positioning word and whose length equals that of the target sentence as one sentence to be detected.
Further, the positioning word obtaining module is also configured to determine the synonyms of at least some of the one or more target words, and to form the positioning word set from the one or more target words and the synonyms.
Further, the operation in which the second sentence interception module intercepts from the text to be detected a passage that contains the positioning word and whose length equals that of the target sentence specifically includes:
making the position of the positioning word in the intercepted passage consistent with the position, in the target sentence, of the positioning word or of the word for which the positioning word is a synonym.
Further, the operation in which the positioning word obtaining module determines the synonyms of at least some of the one or more target words specifically includes:
obtaining the synonyms of the at least some target words using a word vector model, the word vector model being a machine learning model for identifying synonyms obtained by neural network training.
Further, the text acquiring unit is specifically configured to convert the voice to be quality-inspected into text to obtain the text to be detected.
Further, the embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the above text detection method when executed by one or more computing devices.
In addition, the embodiment of the present invention also provides a system comprising one or more computing devices and one or more storage devices, wherein a computer program is recorded on the one or more storage devices, and the computer program, when executed by the one or more computing devices, causes the one or more computing devices to implement the above text detection method.
In conclusion, the text detection method and device proposed by the embodiment of the present invention detect a text to be detected, or a text converted from voice data, using big data statistics and data models trained with neural network algorithms, and judge whether its content contains content semantically similar to the object phrase set by the user. The text to be detected is thereby detected in the semantic dimension, which improves the accuracy of text detection.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
It can be understood that related features of the above method and device may be referred to mutually. In addition, "first", "second" and the like in the above embodiments are used to distinguish the embodiments and do not indicate that any embodiment is better or worse than another.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
In addition, the memory may include non-permanent memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory.
The memory may include non-permanent memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity or device that includes the element.
The above are only embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.
Claims (10)
1. A text detection method, wherein the method comprises:
obtaining a text to be detected;
determining the co-occurrence words of an object phrase;
determining paragraphs to be detected in the text to be detected using the co-occurrence words;
detecting, using a machine learning model, whether the paragraphs to be detected contain content that matches the object phrase;
determining whether the text to be detected meets the requirements according to the matching result and the detection rule corresponding to the object phrase.
2. The method according to claim 1, wherein determining paragraphs to be detected in the text to be detected using the co-occurrence words comprises:
for each co-occurrence word of the object phrase, searching the text to be detected for the co-occurrence word;
each time the co-occurrence word is found, determining one paragraph to be detected according to the position of the co-occurrence word in the text to be detected and a preset paragraph length;
if the object phrase has no co-occurrence word, or if the object phrase has co-occurrence words but none of them is present in the text to be detected, determining the entire text to be detected as the paragraph to be detected.
3. The method according to claim 2, wherein determining one paragraph to be detected according to the position of the co-occurrence word in the text to be detected and the preset paragraph length comprises:
in the text to be detected, taking text of a preset length both before and after the found co-occurrence word to obtain one paragraph to be detected;
or,
in the text to be detected, taking a paragraph whose length equals the preset length and which contains the found co-occurrence word as the paragraph to be detected.
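The paragraph selection of claims 2 and 3 can be sketched as follows. This is only an illustration under stated assumptions: lengths are counted in characters, matching is by exact substring search, and the names `window_around` and `paragraphs_to_detect` are hypothetical. Only the first alternative of claim 3 (a fixed amount of text on each side of the co-occurrence word) is shown.

```python
from typing import List


def window_around(text: str, pos: int, word_len: int, preset_len: int) -> str:
    """Take preset_len characters before and after the word found at pos."""
    start = max(0, pos - preset_len)
    end = min(len(text), pos + word_len + preset_len)
    return text[start:end]


def paragraphs_to_detect(text: str, co_words: List[str], preset_len: int) -> List[str]:
    """Return one paragraph per occurrence of any co-occurrence word.

    Falls back to the entire text when the object phrase has no
    co-occurrence words or none of them occurs in the text.
    """
    paragraphs = []
    for word in co_words:
        if not word:
            continue  # skip empty strings, which would match everywhere
        pos = text.find(word)
        while pos != -1:
            paragraphs.append(window_around(text, pos, len(word), preset_len))
            pos = text.find(word, pos + 1)
    return paragraphs or [text]
```

Restricting the later matching step to these windows keeps the model from scanning unrelated parts of a long text, while the fallback ensures that nothing is skipped when no co-occurrence word is available.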
4. The method according to claim 1, wherein detecting, by using the machine learning model, whether the paragraph to be detected contains content that matches the object phrase comprises:
detecting, by using a word vector model, whether the similarity between the word vector of a phrase in the paragraph to be detected and the word vector of the object phrase reaches a preset similarity value; wherein the word vector model is a model obtained by neural network training and used for identifying synonyms;
if so, determining that the text to be detected contains content that matches the object phrase.
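The similarity test of claim 4 might look like the sketch below. The claim does not name a similarity measure, so cosine similarity is an assumption, and the `vectors` dictionary is a toy stand-in for a trained word vector model. The function names and the example vectors are hypothetical.

```python
import math
from typing import Dict, Iterable, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contains_match(
    phrases: Iterable[str],
    object_phrase: str,
    vectors: Dict[str, List[float]],
    threshold: float,
) -> bool:
    """True if any phrase is at least `threshold` similar to the object phrase."""
    target = vectors.get(object_phrase)
    if target is None:
        return False  # no vector for the object phrase, so nothing to compare
    return any(
        p in vectors and cosine(vectors[p], target) >= threshold
        for p in phrases
    )


# Toy vectors standing in for a trained model (illustrative values only).
toy_vectors = {
    "refund": [1.0, 0.0],
    "reimbursement": [0.9, 0.1],
    "weather": [0.0, 1.0],
}
```

With these toy vectors, `contains_match(["reimbursement"], "refund", toy_vectors, 0.8)` succeeds because the two vectors point in nearly the same direction, whereas `"weather"` does not reach the threshold.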
5. The method according to claim 1, wherein determining the co-occurrence words of the object phrase comprises:
obtaining the corresponding co-occurrence words from a preset co-occurrence vocabulary according to the object phrase, wherein the preset co-occurrence vocabulary is obtained by statistics on a collection of texts in the same technical field as the text to be detected.
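One way to compile such a co-occurrence vocabulary is sketched below. The claim only says the vocabulary is obtained by statistics over a text collection from the same field, so the specifics here are assumptions: documents are already tokenized, co-occurrence is counted at document level, and the `top_k` most frequent companions are kept. The function name `build_cooccurrence_vocab` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_cooccurrence_vocab(
    docs: Sequence[Sequence[str]],
    object_phrases: Sequence[str],
    top_k: int = 3,
) -> Dict[str, List[str]]:
    """Map each object phrase to the words that most often share a document with it."""
    vocab: Dict[str, List[str]] = {}
    for phrase in object_phrases:
        counts: Counter = Counter()
        for tokens in docs:
            if phrase in tokens:
                # Count each companion word once per document.
                counts.update(set(tokens) - {phrase})
        vocab[phrase] = [word for word, _ in counts.most_common(top_k)]
    return vocab
```

At detection time the lookup of claim 5 is then a plain dictionary access such as `vocab.get(object_phrase, [])`, where an empty list corresponds to an object phrase without co-occurrence words.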
6. The method according to any one of claims 1-5, wherein the method further comprises:
when a named entity corresponding to the object phrase needs to be identified, calling a named entity recognition model to determine whether a named entity name corresponding to the object phrase exists in the text to be detected;
if it exists, the text to be detected meets the requirements;
if it does not exist, the text to be detected does not meet the requirements.
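The check in claim 6 can be illustrated with a pluggable recognizer. The sketch assumes that the named entity recognition model is exposed as a callable returning `(text, label)` pairs and that each object phrase maps to one required entity label. Both are assumptions, and `toy_recognizer` is a dictionary lookup standing in for a real model.

```python
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (surface text, entity label)


def entity_requirement_met(
    text: str,
    required_label: str,
    recognize: Callable[[str], List[Entity]],
) -> bool:
    """True if the recognizer finds at least one entity with the required label."""
    return any(label == required_label for _, label in recognize(text))


def toy_recognizer(text: str) -> List[Entity]:
    """Stand-in for a named entity recognition model (gazetteer lookup only)."""
    gazetteer = {"Acme Bank": "ORG", "Alice": "PERSON"}
    return [(name, label) for name, label in gazetteer.items() if name in text]
```

For example, a rule requiring that an organization be named would pass for a text mentioning "Acme Bank" and fail for a text in which the recognizer finds no organization.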
7. The method according to any one of claims 1-5, wherein obtaining the text to be detected comprises:
converting speech to be quality-inspected into text to obtain the text to be detected.
8. A text detection device, wherein the device comprises:
a text acquisition unit, configured to obtain a text to be detected;
a co-occurrence word determination unit, configured to determine the co-occurrence words of an object phrase;
a paragraph determination unit, configured to determine a paragraph to be detected in the text to be detected obtained by the text acquisition unit, by using the co-occurrence words obtained by the co-occurrence word determination unit;
a content matching unit, configured to detect, by using a machine learning model, whether the paragraph to be detected determined by the paragraph determination unit contains content that matches the object phrase;
a text detection unit, configured to determine whether the text to be detected meets the requirements according to the matching result obtained by the content matching unit and the detection rule corresponding to the object phrase.
9. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by one or more computing devices, implements the method according to any one of claims 1-7.
10. A system comprising one or more computing devices and one or more storage devices, wherein a computer program is recorded on the one or more storage devices, and the computer program, when executed by the one or more computing devices, causes the one or more computing devices to implement the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811247368.2A CN109271489B (en) | 2018-10-25 | 2018-10-25 | Text detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109271489A (en) | 2019-01-25 |
CN109271489B (en) | 2020-12-15 |
Family
ID=65194051
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811247368.2A Active CN109271489B (en) | 2018-10-25 | 2018-10-25 | Text detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271489B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150106080A1 (en) * | 2013-10-10 | 2015-04-16 | Fuji Xerox Co., Ltd. | Information processing apparatus, information processing method, and non-transitory computer readable medium |
CN106445998A (en) * | 2016-05-26 | 2017-02-22 | 达而观信息科技(上海)有限公司 | Text content auditing method and system based on sensitive word |
CN107544956A (en) * | 2016-06-24 | 2018-01-05 | 科大讯飞股份有限公司 | A kind of text wants point detecting method and system |
CN106910501A (en) * | 2017-02-27 | 2017-06-30 | 腾讯科技(深圳)有限公司 | Text entities extracting method and device |
CN107301169A (en) * | 2017-06-16 | 2017-10-27 | 科大讯飞股份有限公司 | Digress from the subject composition detection method, device and terminal device |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020258904A1 (en) * | 2019-06-28 | 2020-12-30 | 深圳追一科技有限公司 | Method for determining quality check effectiveness, device, apparatus, and storage medium |
CN110334241A (en) * | 2019-07-10 | 2019-10-15 | 深圳前海微众银行股份有限公司 | Quality detecting method, device, equipment and the computer readable storage medium of customer service recording |
CN110334241B (en) * | 2019-07-10 | 2023-08-25 | 深圳前海微众银行股份有限公司 | Quality inspection method, device and equipment for customer service record and computer readable storage medium |
CN110442872A (en) * | 2019-08-06 | 2019-11-12 | 中科鼎富(北京)科技发展有限公司 | A kind of text elements integrality checking method and device |
WO2021047003A1 (en) * | 2019-09-09 | 2021-03-18 | 深圳前海微众银行股份有限公司 | Text positioning method, apparatus, device, and storage medium |
CN110956956A (en) * | 2019-12-13 | 2020-04-03 | 集奥聚合(北京)人工智能科技有限公司 | Voice recognition method and device based on policy rules |
CN112069796A (en) * | 2020-09-03 | 2020-12-11 | 阳光保险集团股份有限公司 | Voice quality inspection method and device, electronic equipment and storage medium |
CN112069796B (en) * | 2020-09-03 | 2023-08-04 | 阳光保险集团股份有限公司 | Voice quality inspection method and device, electronic equipment and storage medium |
CN113094469A (en) * | 2021-04-02 | 2021-07-09 | 清华大学 | Text data analysis method and device, electronic equipment and storage medium |
CN113094469B (en) * | 2021-04-02 | 2022-07-05 | 清华大学 | Text data analysis method and device, electronic equipment and storage medium |
CN116308221A (en) * | 2023-05-25 | 2023-06-23 | 成都信通信息技术有限公司 | Automatic low-carbon medal generating system based on artificial intelligence |
CN116308221B (en) * | 2023-05-25 | 2023-07-21 | 成都信通信息技术有限公司 | Automatic low-carbon medal generating system based on artificial intelligence |
Also Published As
Publication number | Publication date |
---|---|
CN109271489B (en) | 2020-12-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109446524B (en) | A kind of voice quality detecting method and device | |
CN109460455A (en) | A kind of Method for text detection and device | |
CN109271489A (en) | A kind of Method for text detection and device | |
CN107798136B (en) | Entity relation extraction method and device based on deep learning and server | |
US9542477B2 (en) | Method of automated discovery of topics relatedness | |
US20160239500A1 (en) | System and methods for extracting facts from unstructured text | |
US20220398857A1 (en) | Document analysis architecture | |
RU2491622C1 (en) | Method of classifying documents by categories | |
KR102379674B1 (en) | Method and Apparatus for Analyzing Tables in Document | |
CN110188357B (en) | Industry identification method and device for objects | |
CN110019660A (en) | A kind of Similar Text detection method and device | |
CN108228612B (en) | Method and device for extracting network event keywords and emotional tendency | |
CN110955766A (en) | Method and system for automatically expanding intelligent customer service standard problem pairs | |
US11755766B2 (en) | Systems and methods for detecting personally identifiable information | |
CN112132238A (en) | Method, device, equipment and readable medium for identifying private data | |
CN106598997B (en) | Method and device for calculating text theme attribution degree | |
CN114117038A (en) | Document classification method, device and system and electronic equipment | |
CN111916110A (en) | Voice quality inspection method and device | |
CN113723085B (en) | Pseudo-fuzzy detection method in privacy policy document | |
CN111401047A (en) | Method and device for generating dispute focus of legal document and computer equipment | |
US11776291B1 (en) | Document analysis architecture | |
CN113139379B (en) | Information identification method and system | |
US20210216721A1 (en) | System and method to quantify subject-specific sentiment | |
CN114417860A (en) | Information detection method, device and equipment | |
CN114443834A (en) | Method and device for extracting license information and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |