CN111611340A - Information extraction method and device, computer equipment and storage medium - Google Patents

Information extraction method and device, computer equipment and storage medium

Info

Publication number
CN111611340A
Authority
CN
China
Prior art keywords
crime
name
criminal
candidate
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910143416.1A
Other languages
Chinese (zh)
Inventor
李存林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huiruisitong Information Technology Co Ltd
Original Assignee
Guangzhou Huiruisitong Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huiruisitong Information Technology Co Ltd
Priority to CN201910143416.1A
Publication of CN111611340A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Technology Law (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to an information extraction method, an information extraction device, computer equipment, and a storage medium. The method comprises: obtaining a crime fact text; extracting candidate keywords corresponding to the crime name elements of the crime fact text; obtaining position information of those candidate keywords; recombining the candidate keywords according to their position information to obtain a recombined text segment; and extracting, from the recombined text segment, target keywords corresponding to the crime name elements, the target keywords being used to pre-judge the crime name corresponding to the crime fact text. Information is extracted from the crime fact text to obtain the crime name elements used for judging the crime name; the candidate keywords corresponding to those elements are recombined, and the elements are extracted again from the recombined text. This reduces information redundancy and removes candidate keywords that are not logically tenable, yielding target keywords with stricter logic, so that the extracted information is more accurate and the crime facts of the subject can be judged better on the basis of accurate information.

Description

Information extraction method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an information extraction method and apparatus, a computer device, and a storage medium.
Background
In recent years, as artificial intelligence technology has matured, many fields, such as natural language processing, speech recognition, spam filtering, machine translation, and advertisement recommendation, have developed rapidly by using it. In the legal field, however, artificial intelligence is used comparatively little, and most legal work is still performed manually.
Law is a highly specialized field. Non-professionals often cannot obtain the legal knowledge they need by searching for information themselves, and professional legal consultation costs both time and money. Moreover, the legal and case knowledge of any individual legal worker is limited, so in some special cases assistance must be sought through other channels. At present there is no quick way to help people accurately judge the crime name of each case.
Disclosure of Invention
In order to solve the above technical problem, the application provides an information extraction method, an information extraction device, a computer device, and a storage medium.
In a first aspect, the present application provides an information extraction method, including:
acquiring a crime fact text;
extracting candidate keywords corresponding to the crime name elements of the crime fact text;
acquiring position information of the candidate keywords corresponding to the crime name elements, and recombining the candidate keywords according to their position information to obtain a recombined text segment;
and extracting target keywords corresponding to the crime name elements from the recombined text segment, wherein the target keywords are used for pre-judging the crime name corresponding to the crime fact text.
In a second aspect, the present application provides an information extraction apparatus, comprising:
a text acquisition module, used for acquiring a crime fact text;
a candidate keyword extraction module, used for extracting candidate keywords corresponding to the crime name elements of the crime fact text;
a keyword recombination module, used for acquiring position information of the candidate keywords corresponding to the crime name elements, and recombining the candidate keywords according to their position information to obtain a recombined text segment;
and a target keyword extraction module, used for extracting target keywords corresponding to the crime name elements from the recombined text segment, wherein the target keywords are used for pre-judging the crime name corresponding to the crime fact text.
A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring a crime fact text;
extracting candidate keywords corresponding to the crime name elements of the crime fact text;
acquiring position information of the candidate keywords corresponding to the crime name elements, and recombining the candidate keywords according to their position information to obtain a recombined text segment;
and extracting target keywords corresponding to the crime name elements from the recombined text segment, wherein the target keywords are used for pre-judging the crime name corresponding to the crime fact text.
A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, carrying out the following steps:
acquiring a crime fact text;
extracting candidate keywords corresponding to the crime name elements of the crime fact text;
acquiring position information of the candidate keywords corresponding to the crime name elements, and recombining the candidate keywords according to their position information to obtain a recombined text segment;
and extracting target keywords corresponding to the crime name elements from the recombined text segment, wherein the target keywords are used for pre-judging the crime name corresponding to the crime fact text.
The information extraction method, device, computer equipment, and storage medium described above operate as follows: a crime fact text is obtained; candidate keywords corresponding to the crime name elements of the crime fact text are extracted; position information of those candidate keywords is obtained; the candidate keywords are recombined according to their position information to obtain a recombined text segment; and target keywords corresponding to the crime name elements are extracted from the recombined text segment, the target keywords being used to pre-judge the crime name corresponding to the crime fact text. Information is extracted from the crime fact text to obtain the crime name elements used for judging the crime name; the candidate keywords corresponding to those elements are recombined, and the elements are extracted again from the recombined text. This can reduce information redundancy and remove candidate keywords that are not logically tenable, yielding target keywords with stricter logic, so that the extracted information is more accurate and the crime facts of the subject can be judged better on the basis of accurate information.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a diagram of an exemplary environment in which a method for extracting information may be implemented;
FIG. 2 is a flow diagram illustrating a method for information extraction in one embodiment;
FIG. 3 is a block diagram showing the structure of an information extraction apparatus according to an embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 is a diagram of the application environment of the information extraction method in one embodiment. Referring to FIG. 1, the information extraction method is applied to an information extraction system that includes a terminal 110 and a server 120 connected through a network. The terminal or the server obtains a crime fact text, extracts candidate keywords corresponding to the crime name elements of the crime fact text, obtains position information of those candidate keywords, recombines the candidate keywords according to their position information to obtain a recombined text segment, and extracts target keywords corresponding to the crime name elements from the recombined text segment; the target keywords are used for pre-judging the crime name corresponding to the crime fact text. The terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, an information extraction method is provided. The embodiment is mainly illustrated by applying the method to the terminal 110 (or the server 120) in FIG. 1. Referring to FIG. 2, the information extraction method specifically includes the following steps:
Step S201, acquiring a crime fact text.
Step S202, extracting candidate keywords corresponding to the crime name elements of the crime fact text.
Step S203, acquiring position information of the candidate keywords corresponding to the crime name elements, and recombining the candidate keywords according to their position information to obtain a recombined text segment.
Step S204, extracting target keywords corresponding to the crime name elements from the recombined text segment, wherein the target keywords are used for pre-judging the crime name corresponding to the crime fact text.
Specifically, because many kinds of law are involved, this embodiment takes criminal law as an example. The crime fact text is text describing the facts of a crime. It may be case information downloaded from the Internet, text recognized from a spoken account dictated by the actor, or text recognized from an image. The crime name elements are the four elements used to judge whether an actor has committed a crime: the subject, the object, the subjective aspect, and the objective aspect. The subject is a person capable of bearing criminal responsibility. The object is the social relationship that is protected by law and infringed by the criminal act. The subjective aspect is the actor's state of mind in committing the act, that is, whether the act was intentional or negligent. The objective aspect covers the objective facts that show the social harmfulness of the act and that are necessary for the act to constitute a crime. The candidate keywords corresponding to a crime name element are attribute information describing that element. For example, when the subject is a person, the attribute information may include words describing the person, such as the person's form of address, sex, and age; when the subject is a unit, the attribute information may include words describing the unit, such as the unit name, the time of establishment, and the person in charge. The position information is the position of each word in the crime fact text. The position may be represented in a user-defined way; for example, the words of the crime fact text may be numbered so that the position of each word is represented by its number.
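The numbering scheme mentioned above can be illustrated with a minimal sketch. It assumes the text has already been split into words; the function name `number_words` and the choice to start numbering at 1 are illustrative assumptions, not details given in the application.

```python
from typing import Dict, List


def number_words(words: List[str]) -> Dict[int, str]:
    """Number each word in reading order so that a position is just an integer."""
    # Starting at 1 is an arbitrary choice made for this sketch.
    return {number: word for number, word in enumerate(words, start=1)}


# Hypothetical, already-segmented crime fact text.
crime_fact_words = ["Zhang San", "attacked", "Li Si", "and", "Wang Wu"]
positions = number_words(crime_fact_words)
```

Any other representation works equally well as long as it preserves the order of the words, since only that order is needed later.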
After the candidate keywords corresponding to the crime name elements of the crime fact text have been extracted and the position information of each candidate keyword has been obtained, the candidate keywords are recombined in order of position to form a recombined text segment. Because the recombined text segment may contain sentences that do not follow normal logic, it is screened after recombination to obtain target segments; the keywords of the crime name elements in the target segments are extracted, and these form the target keywords corresponding to the crime name elements. The extracted target keywords can be used to judge the behavior of the subject and whether the subject has committed a crime. In summary, candidate keywords for the subject, object, subjective aspect, and objective aspect are determined by extracting keywords of the crime name elements from the crime fact text; the candidate keywords are recombined according to their positions to obtain a recombined text segment; the crime name elements are extracted again from the recombined text segment; and the behavior of the subject is judged according to the re-extracted elements. The first extraction reduces information redundancy by keeping only candidate key information, and the extraction after recombination reduces it further. The crime name elements finally extracted are the key information for judging the crime and can assist that judgment.
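Recombination by position can be sketched as a sort followed by a join. The `Candidate` record, its field names, and the use of a single space as separator are assumptions made for illustration; the application does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    word: str      # the candidate keyword
    element: str   # which crime name element it was extracted for
    position: int  # its position in the crime fact text


def recombine(candidates: List[Candidate]) -> str:
    """Put candidate keywords back in their original order and join them."""
    ordered = sorted(candidates, key=lambda c: c.position)
    return " ".join(c.word for c in ordered)


# Candidates typically arrive grouped by element, not in text order.
candidates = [
    Candidate("Li Si", "object", 3),
    Candidate("Zhang San", "subject", 1),
    Candidate("attacked", "objective aspect", 2),
]
recombined = recombine(candidates)
```

Sorting on the stored position is what restores the reading order that per-element extraction loses.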
In one embodiment, the way candidate keywords corresponding to the crime name elements of the crime fact text are extracted can be customized. For example, the candidate keywords may be extracted with a trained model, by dictionary lookup, or by determining the four elements of the crime fact text according to the part of speech of each word; these approaches (trained model, dictionary lookup, part of speech, and so on) may also be combined.
In one embodiment, extracting candidate keywords corresponding to the crime name elements of the crime fact text comprises: inputting the crime fact text into a deep learning network model, segmenting the crime fact text into words with the deep learning network model to obtain a plurality of word segmentation results, and screening each word segmentation result to obtain the candidate keywords corresponding to the crime name elements.
Specifically, the deep learning network model is a mathematical model obtained by learning from a large number of texts labeled with crime name elements; it can quickly extract the keywords corresponding to the crime name elements in a crime fact text. The model can be understood as comprising two models: a word segmentation model that segments the crime fact text, and a matching model that screens and matches the word segmentation results. The crime fact text is analyzed by the word segmentation model to obtain a plurality of word segmentation results, the results are input into the matching model, and the matching model matches each result to its crime name element. For example, if the crime fact text is "Zhang San attacked Li Si and Wang Wu", the word segmentation results are "Zhang San", "attacked", "Li Si", "and", and "Wang Wu". Here "Zhang San" is a candidate keyword for the subject, "attacked" is a candidate keyword for the objective aspect, and "Li Si" and "Wang Wu" are candidate keywords for the object.
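The two-stage flow (segment, then match each segment to an element) can be sketched with the attack example above. This is only a stand-in: the application uses trained models for both stages, whereas the sketch below starts from an already-segmented list and replaces the matching model with a hand-written lookup table (`ELEMENT_LOOKUP`, a hypothetical name).

```python
from typing import Dict, List

# Hypothetical stand-in for the trained matching model.
ELEMENT_LOOKUP: Dict[str, str] = {
    "Zhang San": "subject",
    "attacked": "objective aspect",
    "Li Si": "object",
    "Wang Wu": "object",
}


def match_elements(segments: List[str]) -> Dict[str, List[str]]:
    """Group word segmentation results by the crime name element they match."""
    matched: Dict[str, List[str]] = {}
    for segment in segments:
        element = ELEMENT_LOOKUP.get(segment)
        if element is None:
            continue  # e.g. "and" matches no element and is screened out
        matched.setdefault(element, []).append(segment)
    return matched


# Assumed output of the word segmentation stage.
segments = ["Zhang San", "attacked", "Li Si", "and", "Wang Wu"]
candidate_keywords = match_elements(segments)
```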
In one embodiment, when there are multiple subjects, multiple objects, or multiple objective aspects, they are combined to obtain at least one combination, each combination including at least a subject and the corresponding objective aspect.
Specifically, the same crime fact text may contain multiple subjects, multiple objects, multiple objective aspects, and so on. For example, several subjects may commit one or more criminal acts against the same object, one subject may commit at least one criminal act against several objects, or several subjects may commit at least one criminal act against several objects. Therefore, when there are multiple subjects or multiple objects, the crime name elements need to be combined after their target keywords have been obtained, giving at least one combination that includes at least a subject and the corresponding objective aspect. Combinations that do not contain both a subject and an objective aspect are removed. A combination may thus consist of a subject and the corresponding objective aspect; of a subject, the corresponding object, and the corresponding objective aspect; or of a subject, an object, a subjective aspect, and an objective aspect. When a combination contains only a subject and an objective aspect, the object may be implicit. For example, in a short statement that Zhang San committed a drug-related act, "Zhang San" is the subject, the act is the objective aspect, and the implicit object is "drugs". Combining the crime name elements gives the information better logic, so the extracted information is more accurate and the crime facts of the subject can be judged better.
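One simple way to enumerate such combinations is a Cartesian product over the extracted keywords, sketched below. This is an assumption made for illustration: the application does not say how the "corresponding" objective aspect is paired with a subject, and a plain product will over-generate when the text describes several unrelated acts. The function name `combine` and the tuple layout are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

# (subject, objective aspect, object or None when the object is implicit)
Combination = Tuple[str, str, Optional[str]]


def combine(
    subjects: List[str],
    objective_aspects: List[str],
    objects: List[str],
) -> List[Combination]:
    """Enumerate combinations that contain at least a subject and an objective aspect."""
    if not subjects or not objective_aspects:
        return []  # nothing valid can be formed without both
    # An empty object list is kept as a single implicit (None) object.
    object_slots: List[Optional[str]] = list(objects) if objects else [None]
    return list(product(subjects, objective_aspects, object_slots))


combos = combine(["Zhang San"], ["attacked"], ["Li Si", "Wang Wu"])
implicit = combine(["Zhang San"], ["attacked"], [])
```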
In one embodiment, the person name and the person's age are screened from the target keywords corresponding to the subject, and whether the subject has reached the statutory age of criminal responsibility is judged from the age associated with the person name. When the statutory age is met, the date of the act is extracted from the objective aspect, and whether that date is earlier than the effective date of the crime name is judged. When the date of the act falls within the period in which the crime name is in effect, the crime name is judged to be established for the subject corresponding to the person name.
Specifically, after the target keywords corresponding to the crime name elements have been extracted, the target keywords of each element are sorted out. For example, the person name and the corresponding age are extracted from the target keywords, and whether that age reaches the statutory age of criminal responsibility is judged. When it does, the objective aspect corresponding to the person name is examined: the date of the act is extracted from the objective aspect and compared with the effective date of the crime name. If the act took place before the effective date, the crime name cannot be established for the subject corresponding to the person name; otherwise, the crime name is established. Whether a person can be convicted depends first on age: only a person who has reached the statutory age can be convicted, and if the person has not, the crime name pre-judged from the preceding information cannot be established. Once the statutory age is met, the effective period of the pre-judged crime name is checked, because only acts committed while the crime name is in effect can be convicted under it. Taking key information such as the subject's age and the effective date of the crime name into account allows a more accurate judgment of whether the crime name is established.
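The age check followed by the date check can be sketched as a two-condition test. The function name, its parameters, and the sample values below (a minimum age of 16 and the dates used) are placeholders chosen for illustration only; they are not taken from the application and are not statements about any actual statute.

```python
from datetime import date


def crime_name_established(
    age: int,
    act_date: date,
    min_age: int,
    effective_date: date,
) -> bool:
    """Return True only if the age requirement is met and the act falls
    on or after the date the crime name took effect."""
    if age < min_age:
        return False  # below the statutory age: the crime name cannot be established
    return act_date >= effective_date


# Placeholder values for illustration only.
established = crime_name_established(
    age=30, act_date=date(2018, 6, 1), min_age=16, effective_date=date(2015, 1, 1)
)
too_young = crime_name_established(
    age=12, act_date=date(2018, 6, 1), min_age=16, effective_date=date(2015, 1, 1)
)
too_early = crime_name_established(
    age=30, act_date=date(2010, 6, 1), min_age=16, effective_date=date(2015, 1, 1)
)
```

The age test comes first, mirroring the order in the text: the date is only examined once the age requirement is satisfied.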
The information extraction method described above comprises: obtaining a crime fact text; extracting candidate keywords corresponding to the crime name elements of the crime fact text; obtaining position information of those candidate keywords; recombining the candidate keywords according to their position information to obtain a recombined text segment; and extracting, from the recombined text segment, target keywords corresponding to the crime name elements, the target keywords being used to pre-judge the crime name corresponding to the crime fact text. Information is extracted from the crime fact text to obtain the crime name elements used for judging the crime name; the candidate keywords corresponding to those elements are recombined, and the elements are extracted again from the recombined text. This can reduce information redundancy and remove candidate keywords that are not logically tenable, yielding target keywords with stricter logic, so that the extracted information is more accurate and the crime facts of the subject can be judged better on the basis of accurate information.
In one embodiment, extracting candidate keywords corresponding to the crime name elements of the crime fact text further includes:
Step S301, obtaining the part-of-speech tagging information and the position information of each word in the crime fact text.
Step S302, screening out, according to the part-of-speech tagging information, first words tagged as nouns together with their position information, and taking the first words as candidate keywords for the object.
Step S303, screening out, according to the part-of-speech tagging information, second words tagged as verbs together with their position information, and taking the second words as candidate keywords for the objective aspect.
Specifically, the part-of-speech information of each word in the crime fact text may be labeled manually, labeled by an automatic part-of-speech tagging model, or labeled by a combination of the two, and the position information of each word is user-defined. The part-of-speech tagging information includes personal pronoun, time adverb, verb, noun, adjective, and so on. According to the tag of each word, the words tagged as nouns are screened out to form the first words, which are taken as candidate keywords for the object, and the words tagged as verbs are screened out to form the second words, which are taken as candidate keywords for the objective aspect. Nouns are chosen as candidate keywords for the object because nouns are the words most likely to denote an object; verbs are chosen as candidate keywords for the objective aspect because the action a verb expresses describes what actually happened and therefore belongs to the objective facts.
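Steps S301 to S303 amount to a filter over tagged words. The sketch below assumes the words arrive already tagged (by hand or by any tagger); the tag strings "noun" and "verb", the sample sentence, and the function name `screen_by_pos` are assumptions made for illustration.

```python
from typing import List, Tuple

TaggedWord = Tuple[str, str]      # (word, part-of-speech tag)
PositionedWord = Tuple[str, int]  # (word, position in the text)


def screen_by_pos(
    tagged: List[TaggedWord],
) -> Tuple[List[PositionedWord], List[PositionedWord]]:
    """Return (object candidates, objective-aspect candidates) with positions."""
    object_candidates: List[PositionedWord] = []
    objective_candidates: List[PositionedWord] = []
    for position, (word, tag) in enumerate(tagged):
        if tag == "noun":    # S302: nouns become object candidates
            object_candidates.append((word, position))
        elif tag == "verb":  # S303: verbs become objective-aspect candidates
            objective_candidates.append((word, position))
    return object_candidates, objective_candidates


# Hypothetical tagged input.
tagged_text = [
    ("he", "personal pronoun"),
    ("stole", "verb"),
    ("wallet", "noun"),
    ("yesterday", "time adverb"),
]
object_candidates, objective_candidates = screen_by_pos(tagged_text)
```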
In one embodiment, extracting candidate keywords for the subjective aspect comprises: obtaining a negligence dictionary, screening the words of the crime fact text against the negligence dictionary to obtain negligence words, and taking the negligence words as candidate keywords for the subjective aspect.
Specifically, the negligence dictionary is a dictionary, built in advance, of words that describe the state of mind behind the subject's behavior. Words such as "a slip of the hand", "unintentional", "justifiable defense", and "carelessness" describe the state of mind of a subject who commits a criminal act without intent. The crime fact text is filtered through the negligence dictionary: the words that match entries in the dictionary are screened out to form the negligence words, which are taken as candidate keywords for the subjective aspect. A pre-built dictionary makes it possible to filter the words of the crime fact text quickly. It also enriches the candidate keywords for the subjective aspect, so that the subjective aspect is covered more completely, the crime facts of the subject in the case can be judged better, and the accuracy of the judgment improves.
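Filtering against the pre-built dictionary described above is a set-membership test. In the sketch below the dictionary is a small Python set named `NEGLIGENCE_DICTIONARY` (an assumed name) holding two of the example words from the text; a real dictionary would be far larger.

```python
from typing import Iterable, List, Set

# Tiny illustrative dictionary; the entries are examples from the text.
NEGLIGENCE_DICTIONARY: Set[str] = {"unintentional", "carelessness"}


def screen_subjective(words: Iterable[str], dictionary: Set[str]) -> List[str]:
    """Keep only the words that appear in the dictionary, in text order."""
    return [word for word in words if word in dictionary]


# Hypothetical segmented crime fact text.
words = ["driver", "carelessness", "struck", "pedestrian"]
subjective_candidates = screen_subjective(words, NEGLIGENCE_DICTIONARY)
```

Using a set keeps each lookup constant-time on average, which is consistent with the point that a pre-built dictionary allows fast filtering.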
In one embodiment, the recombined text segment includes a plurality of sub-segments, and extracting target keywords corresponding to the crime name elements from the recombined text segment includes:
Step S401, judging whether each sub-segment of the recombined text segment contains a candidate keyword whose part of speech is tagged as a verb.
Step S402, when a verb exists in the sub-segment, taking the personal pronoun, person name, or unit name located before the verb in the sub-segment as the target keyword of the subject.
Step S403, judging whether there are multiple objects located after the verb in the sub-segment and, if so, judging whether the multiple objects are in a parallel relationship.
Step S404, when the objects are in a parallel relationship, taking all of the objects as target objects.
Step S405, when the objects are not in a parallel relationship, taking the object farthest from the verb in the sub-segment as the target object, and taking the candidate keyword corresponding to the target object as the target keyword.
Specifically, a sub-segment is one of the short segments in the recombined text segment. The recombined text segment is divided at punctuation marks that end a complete sentence, such as a period or an exclamation mark, to obtain a plurality of sub-segments. The part-of-speech tagging information of each candidate keyword is obtained, and from it each sub-segment is checked for a candidate keyword tagged as a verb. When a sub-segment contains no verb, it is considered unimportant information and may be deleted. When a verb is present, the sub-segment contains an important description of the crime facts, and the words denoting a person or a unit located before the verb, such as personal pronouns, person names, or unit names, are taken as the target keywords of the subject. It is then judged whether there are multiple objects after the verb in the sub-segment and, if so, whether they are in a parallel relationship; this can be determined from coordinating words and coordinating punctuation. When all of the objects are in a parallel relationship, all of them are taken as target objects. When any two objects are not in a parallel relationship, the object farthest from the verb in the sub-segment is selected as the target object, and the candidate keyword corresponding to the target object is taken as the target keyword.
The subject and the object are separated by locating the verb: the subject consists of the words describing persons and/or units before the verb, and the object consists of the words describing the object after the verb. When there are multiple objects, the relationship between them is judged to decide whether all of them are target objects, and the object is determined accordingly. This makes the extracted information more accurate, removes redundant information, and supports a better judgment of the case corresponding to the crime fact text.
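Steps S401 to S405 can be sketched for a single sub-segment as follows. Several simplifications are assumed for illustration: tokens arrive as (word, tag) pairs, the tag names and the set of parallel marks are hypothetical, only the first verb is considered, and two objects count as parallel when nothing but parallel words or punctuation lies between them.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word, tag); the tag names are assumptions

SUBJECT_TAGS = {"pronoun", "person", "unit"}
OBJECT_TAGS = {"noun", "person", "unit"}
PARALLEL_MARKS = {"and", ",", "、"}


def is_parallel_gap(gap: List[Token]) -> bool:
    """True when the words between two objects are only parallel words or punctuation."""
    return bool(gap) and all(word in PARALLEL_MARKS for word, _ in gap)


def extract_targets(sub_segment: List[Token]) -> Optional[Dict[str, List[str]]]:
    # S401: look for a verb; a sub-segment without one is discarded (None).
    verb_index = next(
        (i for i, (_, tag) in enumerate(sub_segment) if tag == "verb"), None
    )
    if verb_index is None:
        return None
    # S402: persons, pronouns, or units before the verb form the subject.
    subjects = [w for w, tag in sub_segment[:verb_index] if tag in SUBJECT_TAGS]
    # S403: collect the objects after the verb and test whether they are parallel.
    after = sub_segment[verb_index + 1:]
    object_indexes = [i for i, (_, tag) in enumerate(after) if tag in OBJECT_TAGS]
    objects = [after[i][0] for i in object_indexes]
    if len(objects) > 1:
        parallel = all(
            is_parallel_gap(after[a + 1:b])
            for a, b in zip(object_indexes, object_indexes[1:])
        )
        if not parallel:
            objects = objects[-1:]  # S405: keep the object farthest from the verb
        # S404: when parallel, all objects are kept unchanged.
    return {"subject": subjects, "verb": [sub_segment[verb_index][0]], "object": objects}


parallel_case = extract_targets([
    ("Zhang San", "person"), ("attacked", "verb"),
    ("Li Si", "person"), ("and", "conjunction"), ("Wang Wu", "person"),
])
non_parallel_case = extract_targets([
    ("he", "pronoun"), ("stole", "verb"),
    ("shop", "noun"), ("'s", "particle"), ("cash", "noun"),
])
no_verb_case = extract_targets([("Li Si", "person"), ("cash", "noun")])
```

Returning `None` for a verbless sub-segment lets the caller drop it, matching the rule that such sub-segments are treated as unimportant.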
In a specific embodiment, the information extraction method includes:
Named entities in the crime fact text, such as person names, place names, organization names, times, dates, numbers and disease names, are identified by a deep learning network model combined with preset rules, and the positions where they appear in the crime fact text are identified and marked. Person names and organization names are used as candidate keywords of the subject and the object. A place name can serve as an object but can generally be omitted. A date can be used to judge whether the act described in the objective aspect occurred earlier than the effective time of the crime name, and the ages of the subject and the object of the crime can be calculated from their birth dates. The date and the age are mainly used to judge whether criminal liability should be pursued.
Part-of-speech tagging is performed on each word in the crime fact text. All words tagged as nouns are taken as candidate keywords of the object, and each such word is added, together with its marked position, to the candidate keywords of the object. All words tagged as verbs are taken as candidate keywords of the objective aspect, and each such word is added, together with its position in the sentence, to the candidate keywords of the objective aspect, with the date recorded as an additional attribute of the word.
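The age calculation mentioned above can be illustrated with a short sketch. The function name `age_on` is hypothetical, and the sketch assumes the birth date and the date of the act have already been extracted and parsed.

```python
from datetime import date


def age_on(birth: date, on: date) -> int:
    """Age in completed years of a person born on `birth`, as of `on`."""
    years = on.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the year of `on`.
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years
```

Comparing `(month, day)` tuples avoids constructing an invalid date for people born on 29 February.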
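As a rough sketch of this step, the following assumes the text has already been segmented and tagged by some tagger, and that noun tags start with `n` and verb tags start with `v`. The tag scheme and the function name are assumptions and are not specified in the source.

```python
from typing import Dict, List, Tuple


def extract_candidates(tagged: List[Tuple[str, str]]) -> Dict[str, List[Tuple[int, str]]]:
    """Collect (position, word) candidates for the object and the objective aspect."""
    candidates: Dict[str, List[Tuple[int, str]]] = {"object": [], "objective_aspect": []}
    for position, (word, tag) in enumerate(tagged):
        if tag.startswith("n"):    # nouns: candidate keywords of the object
            candidates["object"].append((position, word))
        elif tag.startswith("v"):  # verbs: candidate keywords of the objective aspect
            candidates["objective_aspect"].append((position, word))
    return candidates
```

Keeping the position next to each word is what later allows the scattered candidates to be put back in their original order.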
A delinquent dictionary is obtained. The words stored in the delinquent dictionary indicate that the behavior is of a delinquent type, and the dictionary is used to determine the subjective aspect.
The steps above extract candidates for the crime name requirements used to judge whether a crime name is established, namely the subject, the object, the subjective aspect and the objective aspect. These candidates are not necessarily the information actually required and exist in scattered form, so further filtering, combination and confirmation are needed.
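The dictionary lookup can be sketched as a simple membership test. The source does not list the contents of the dictionary, so the entries used in the usage note below are invented placeholders, and the function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def find_subjective_candidates(words: Iterable[str], dictionary: Set[str]) -> List[Tuple[int, str]]:
    """Return (position, word) for every word that appears in the dictionary."""
    return [(position, word) for position, word in enumerate(words) if word in dictionary]
```

For example, with the placeholder dictionary `{"negligently"}`, the word list `["Zhang", "negligently", "caused", "fire"]` yields a single candidate at position 1.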
Using the position marks retained in the preceding three extraction steps, all extracted words and punctuation marks are restored to their original positions and combined, while the corresponding combination of part-of-speech patterns is also retained.
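One way to picture the position-recovery combination is to merge the outputs of the extraction steps and sort them by their position marks. The sketch below assumes each candidate is a `(position, word, tag)` tuple; this layout and the function name are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Marked = Tuple[int, str, str]  # (position, word, tag), an assumed layout


def recombine(*candidate_lists: Iterable[Marked]) -> List[Tuple[str, str]]:
    """Merge candidate lists and restore the original word order by position."""
    by_position: Dict[int, Tuple[str, str]] = {}
    for candidates in candidate_lists:
        for position, word, tag in candidates:
            # A word found by several extraction steps is kept only once.
            by_position.setdefault(position, (word, tag))
    return [by_position[position] for position in sorted(by_position)]
```

The result is a sequence of `(word, tag)` pairs, so the part-of-speech information stays attached to each word for the later verb and object checks.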
Punctuation marks at the beginning of a sentence and consecutive punctuation marks are deleted, and long sentences are divided into short sentences at punctuation marks. From each short sentence, a triple consisting of a subject, an objective aspect and an object, or a pair consisting of a subject and an objective aspect, is extracted. The subjective aspect is judged separately using the dictionary.
Whether a verb exists in the short sentence is judged from the part-of-speech tags. If a verb exists, it is kept; if not, the sentence is useless and is not analyzed further. If a person name, a personal pronoun, a name-like word or a unit appears before the verb, that word is preliminarily determined to be the subject. If the candidate objects after the verb are not connected by parallel words or by an enumeration comma (、), the one farthest from the verb is kept as the candidate keyword of the object and the rest are deleted.
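A minimal sketch of the punctuation clean-up and splitting is given below. The set of splitting marks is an assumption. The enumeration comma (、) is deliberately left out of it so that parallel objects stay in the same short sentence.

```python
import re
from typing import List

# Assumed splitting marks; the enumeration comma is intentionally excluded.
SPLIT_MARKS = ",.;!?，。；！？"
_SPLIT_PATTERN = re.compile("[" + re.escape(SPLIT_MARKS) + "]+")


def split_short_sentences(text: str) -> List[str]:
    """Split text into short sentences at runs of punctuation marks."""
    parts = _SPLIT_PATTERN.split(text)
    # Dropping empty parts has the same effect as deleting leading and
    # consecutive punctuation marks before splitting.
    return [part.strip() for part in parts if part.strip()]
```

Splitting on a run of marks rather than on a single mark means that inputs such as `",, Zhang stole a phone,, then fled."` produce no empty short sentences.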
If a plurality of subjects, a plurality of objective aspects and a plurality of objects exist, they are combined and all combinations are retained. If the short sentence contains no subject, a subject is searched for in the text before the verb, and the one farthest from the verb is used as the subject; if none is found, the verb is deleted from the objective-aspect candidates, and the object after the verb is likewise deleted from the candidate objects. In this way a triple can be formed. If there is no object, a pair is formed. The pair is a special case: it does not mean that there is no object, only that the object and the objective aspect appear together in a single verb, so that no candidate object follows the verb, while the objective aspect already includes the object. For example, "drug vending" is a verb serving as the objective aspect with no object after it, but the object "drug" is already implicit in the objective aspect.
The extracted names and ages are stored in a dictionary data structure, which is convenient to query and access, and are used to judge whether the subject has reached the legal age and whether criminal liability should be pursued. The extracted dates and objective aspects are likewise stored in a dictionary data structure, and are used to judge whether the act occurred earlier than the effective time of the crime name and, in turn, whether criminal liability should be pursued. Judging whether the crime name is established according to both the effective time of the crime name and the legal age makes the judgment more accurate.
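The formation of triples and pairs can be sketched with a Cartesian product. The function name is hypothetical, and the sketch does not cover the fallback search for a missing subject in earlier text.

```python
from itertools import product
from typing import List, Tuple


def build_tuples(subjects: List[str], objective_aspects: List[str],
                 objects: List[str]) -> List[Tuple[str, ...]]:
    """Combine subjects, objective aspects and objects into triples or pairs."""
    if not subjects or not objective_aspects:
        return []  # without a subject and a verb nothing can be formed
    if objects:
        # Triples: (subject, objective aspect, object), all combinations kept.
        return list(product(subjects, objective_aspects, objects))
    # Pairs: the object is assumed to be implicit in the verb itself.
    return list(product(subjects, objective_aspects))
```

For instance, one subject with the verb "drug vending" and no explicit object gives a single pair, while two subjects, one verb and one object give two triples.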
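The two lookups described above might be sketched as follows. The minimum age and the effective date are passed in as parameters because the source does not fix their values. The function name, the dictionary layouts and the choice to return `False` when information is missing are all assumptions of this sketch.

```python
from datetime import date
from typing import Dict


def liability_applies(name: str, action: str,
                      age_by_name: Dict[str, int],
                      date_by_action: Dict[str, date],
                      min_age: int, effective_date: date) -> bool:
    """Check the legal-age and effective-time conditions for one subject and act."""
    age = age_by_name.get(name)
    acted_on = date_by_action.get(action)
    if age is None or acted_on is None:
        return False  # assumption: missing information means no confirmation
    # The subject must have reached the age threshold, and the act must not
    # be earlier than the effective time of the crime name.
    return age >= min_age and acted_on >= effective_date
```

Plain dictionaries keyed by name and by action give constant-time access, which matches the stated goal of convenient query and access.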
Fig. 2 is a flowchart illustrating an information extraction method according to an embodiment. It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, there is provided an information extraction apparatus including:
a text acquisition module 201, configured to acquire a crime fact text;
a candidate keyword extraction module 202, configured to extract candidate keywords corresponding to the crime name requirement of the crime fact text;
a keyword reorganization module 203, configured to obtain position information of the candidate keywords corresponding to the crime name requirement, and to recombine the candidate keywords according to the position information of the candidate keywords to obtain a recombined speech segment; and
a target keyword extraction module 204, configured to extract target keywords corresponding to the crime name requirement in the recombined speech segment, where the target keywords are used to pre-judge a crime name corresponding to the crime fact text.
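The four modules can be pictured as stages of a simple pipeline. The sketch below only shows how the stages might be chained; the class name, field names and type aliases are hypothetical, and each stage is supplied by the caller rather than implemented here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Candidate = Tuple[int, str]  # (position, word), an assumed representation


@dataclass
class InformationExtractionApparatus:
    acquire_text: Callable[[], str]                       # module 201
    extract_candidates: Callable[[str], List[Candidate]]  # module 202
    recombine: Callable[[List[Candidate]], List[str]]     # module 203
    extract_targets: Callable[[List[str]], List[str]]     # module 204

    def run(self) -> List[str]:
        """Run the four modules in sequence and return the target keywords."""
        text = self.acquire_text()
        candidates = self.extract_candidates(text)
        segment = self.recombine(candidates)
        return self.extract_targets(segment)
```

Passing the stages in as callables keeps the sketch independent of any particular segmentation model or tagging scheme.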
In one embodiment, the candidate keyword extraction module includes:
a word segmentation unit, configured to input the crime fact text into the deep learning network model, and to segment the crime fact text through the deep learning network model to obtain a plurality of word segmentation results; and
a candidate keyword screening unit, configured to screen each word segmentation result to obtain candidate keywords corresponding to the crime name requirement.
In one embodiment, the candidate keyword extraction module further includes:
an information acquisition unit, configured to acquire part-of-speech tagging information and corresponding position information for each word of the crime fact text, wherein the crime name requirement comprises a subject, an object, a subjective aspect and an objective aspect;
an object candidate keyword determining unit, configured to screen out, according to the part-of-speech tagging information, a first word whose part-of-speech tagging information is a noun, together with its corresponding position information, and to take the first word as a candidate keyword corresponding to the object; and
an objective candidate keyword determining unit, configured to screen out, according to the part-of-speech tagging information, a second word whose part-of-speech tagging information is a verb, together with its corresponding position information, and to take the second word as a candidate keyword of the objective aspect.
In an embodiment, the candidate keyword extraction module is further configured to obtain a delinquent dictionary, filter words corresponding to the crime fact text according to the delinquent dictionary to obtain delinquent words, and use the delinquent words as the candidate keywords in the subjective aspect.
In one embodiment, the target keyword extraction module includes:
a verb judgment unit, configured to judge whether a candidate keyword whose part of speech is tagged as a verb exists in each sub-paragraph of the recombined speech segment, where the recombined speech segment includes a plurality of sub-paragraphs;
a subject target keyword determination unit, configured to, when a verb exists in the sub-paragraph, take a person name, a personal pronoun, or a unit name located before the verb in the sub-paragraph as a target keyword of the subject;
an object relation judging unit, configured to judge whether a plurality of objects are located after the verb in the sub-paragraph, and, when there are a plurality of objects, to judge whether the plurality of objects are in a parallel relationship; and
an object target keyword determining unit, configured to take the plurality of objects as target objects when the plurality of objects are in a parallel relationship, to take the object farthest from the verb in the sub-paragraph as the target object when the plurality of objects are not in a parallel relationship, and to take the candidate keywords corresponding to the target object as the target keywords.
In one embodiment, the information extraction apparatus further includes:
a requirement combination module, configured to, when a plurality of subjects, a plurality of objects and a plurality of objective aspects exist, combine the plurality of subjects, the plurality of objects and the plurality of objective aspects to obtain at least one combination, where each combination includes at least a subject and the corresponding objective aspect.
In one embodiment, the information extraction apparatus further includes:
a crime name judging module, configured to screen a person name and a person age from the target keywords corresponding to the subject, to judge according to the person age whether the subject has reached the statutory age of criminal responsibility, to extract an action date from the objective aspect when the statutory age is met, to judge whether the action date is before the effective date of the crime name, and to judge that the crime name of the subject is established when the action date is after the effective date of the crime name.
FIG. 4 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the terminal 110 (or the server 120) in fig. 1. As shown in fig. 4, the computer apparatus includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the information extraction method. The internal memory may also have a computer program stored therein, which when executed by the processor, causes the processor to perform the information extraction method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, the information extraction apparatus provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 4. The memory of the computer device may store various program modules constituting the information extraction apparatus, such as a text acquisition module 201, a candidate keyword extraction module 202, a keyword recomposition module 203, and a target keyword extraction module 204 shown in fig. 3. The computer program constituted by the respective program modules causes the processor to execute the steps in the information extraction method of the respective embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 4 may acquire the crime fact text through the text acquisition module 201 in the information extraction apparatus shown in fig. 3. The computer device may extract candidate keywords corresponding to the crime name requirement of the crime fact text through the candidate keyword extraction module 202. The computer device may obtain the position information of the candidate keywords corresponding to the crime name requirement through the keyword reorganization module 203, and recombine the candidate keywords according to their position information to obtain the recombined speech segment. The computer device may extract target keywords corresponding to the crime name requirement in the recombined speech segment through the target keyword extraction module 204, the target keywords being used to pre-judge the crime name corresponding to the crime fact text.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: the method comprises the steps of obtaining a crime fact text, extracting candidate keywords corresponding to a crime name requirement of the crime fact text, obtaining position information of the candidate keywords corresponding to the crime name requirement, recombining the candidate keywords according to the position information of the candidate keywords to obtain a recombined speech segment, extracting target keywords corresponding to the crime name requirement in the recombined speech segment, and using the target keywords to pre-judge a crime name corresponding to the crime fact text.
In one embodiment, extracting candidate keywords corresponding to the crime name element of the crime fact text comprises: the criminal fact text is input into a deep learning network model, the criminal fact text is segmented through the deep learning network model to obtain a plurality of segmentation results, and each segmentation result is screened to obtain candidate keywords corresponding to the criminal name requirement.
In one embodiment, the criminal name element includes a subject, an object, a subjective aspect and an objective aspect, and the candidate keywords corresponding to the criminal name element of the criminal fact text are extracted, further comprising: the method comprises the steps of obtaining part-of-speech tagging information and corresponding position information corresponding to each word of a crime fact text, screening out a first word and corresponding position information of which the part-of-speech tagging information is a noun according to the part-of-speech tagging information, taking the first word as a candidate keyword corresponding to an object, screening out a second word and corresponding position information of which the part-of-speech tagging information is a verb according to the part-of-speech tagging information, and taking the second word as a candidate keyword in an objective aspect.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and acquiring a delinquent dictionary, screening words corresponding to the crime fact text according to the delinquent dictionary to obtain delinquent words, and taking the delinquent words as candidate keywords in the subjective aspect.
In one embodiment, the recombined speech segment includes a plurality of sub-paragraphs, and extracting target keywords corresponding to the crime name requirement in the recombined speech segment includes: judging whether a candidate keyword whose part of speech is tagged as a verb exists in each sub-paragraph of the recombined speech segment; when a verb exists in the sub-paragraph, taking a person name, a personal pronoun or a unit name located before the verb in the sub-paragraph as a target keyword of the subject; judging whether a plurality of objects are located after the verb in the sub-paragraph; when there are a plurality of objects, judging whether the plurality of objects are in a parallel relationship; when the plurality of objects are in a parallel relationship, taking the plurality of objects as target objects; when the plurality of objects are not in a parallel relationship, taking the object farthest from the verb in the sub-paragraph as the target object; and taking the candidate keywords corresponding to the target object as the target keywords.
In one embodiment, the processor, when executing the computer program, further performs the steps of: when a plurality of subjects, a plurality of objects and a plurality of objective aspects exist, the subjects, the objects and the objective aspects are combined to obtain at least one combination, and the combination at least comprises the subjects and the corresponding objective aspects.
In one embodiment, the processor, when executing the computer program, further performs the steps of: screening a person name and a person age from the target keywords corresponding to the subject, judging according to the person age corresponding to the person name whether the subject has reached the statutory age of criminal responsibility, extracting an action date from the objective aspect when the statutory age of criminal responsibility is met, judging whether the action date is before the effective date of the crime name, and judging that the crime name of the subject corresponding to the person name is established when the action date falls within the effective period of the crime name.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: the method comprises the steps of obtaining a crime fact text, extracting candidate keywords corresponding to a crime name requirement of the crime fact text, obtaining position information of the candidate keywords corresponding to the crime name requirement, recombining the candidate keywords according to the position information of the candidate keywords to obtain a recombined speech segment, extracting target keywords corresponding to the crime name requirement in the recombined speech segment, and using the target keywords to pre-judge a crime name corresponding to the crime fact text.
In one embodiment, extracting candidate keywords corresponding to the crime name element of the crime fact text comprises: the criminal fact text is input into a deep learning network model, the criminal fact text is segmented through the deep learning network model to obtain a plurality of segmentation results, and each segmentation result is screened to obtain candidate keywords corresponding to the criminal name requirement.
In one embodiment, the criminal name element includes a subject, an object, a subjective aspect and an objective aspect, and the candidate keywords corresponding to the criminal name element of the criminal fact text are extracted, further comprising: the method comprises the steps of obtaining part-of-speech tagging information and corresponding position information corresponding to each word of a crime fact text, screening out a first word and corresponding position information of which the part-of-speech tagging information is a noun according to the part-of-speech tagging information, taking the first word as a candidate keyword corresponding to an object, screening out a second word and corresponding position information of which the part-of-speech tagging information is a verb according to the part-of-speech tagging information, and taking the second word as a candidate keyword in an objective aspect.
In one embodiment, the computer program when executed by the processor further performs the steps of: and acquiring a delinquent dictionary, screening words corresponding to the crime fact text according to the delinquent dictionary to obtain delinquent words, and taking the delinquent words as candidate keywords in the subjective aspect.
In one embodiment, the recombined speech segment includes a plurality of sub-paragraphs, and extracting target keywords corresponding to the crime name requirement in the recombined speech segment includes: judging whether a candidate keyword whose part of speech is tagged as a verb exists in each sub-paragraph of the recombined speech segment; when a verb exists in the sub-paragraph, taking a person name, a personal pronoun or a unit name located before the verb in the sub-paragraph as a target keyword of the subject; judging whether a plurality of objects are located after the verb in the sub-paragraph; when there are a plurality of objects, judging whether the plurality of objects are in a parallel relationship; when the plurality of objects are in a parallel relationship, taking the plurality of objects as target objects; when the plurality of objects are not in a parallel relationship, taking the object farthest from the verb in the sub-paragraph as the target object; and taking the candidate keywords corresponding to the target object as the target keywords.
In one embodiment, the computer program when executed by the processor further performs the steps of: when a plurality of subjects, a plurality of objects and a plurality of objective aspects exist, the subjects, the objects and the objective aspects are combined to obtain at least one combination, and the combination at least comprises the subjects and the corresponding objective aspects.
In one embodiment, the computer program when executed by the processor further performs the steps of: screening a person name and a person age from the target keywords corresponding to the subject, judging according to the person age corresponding to the person name whether the subject has reached the statutory age of criminal responsibility, extracting an action date from the objective aspect when the statutory age of criminal responsibility is met, judging whether the action date is before the effective date of the crime name, and judging that the crime name of the subject corresponding to the person name is established when the action date falls within the effective period of the crime name.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An information extraction method, the method comprising:
acquiring a crime fact text;
extracting candidate keywords corresponding to the crime name requirement of the crime fact text;
acquiring position information of the candidate keywords corresponding to the crime name requirement, and recombining the candidate keywords according to the position information of the candidate keywords to obtain a recombined speech segment;
and extracting target keywords corresponding to the crime name requirement in the recombined speech segment, wherein the target keywords are used for pre-judging the crime name corresponding to the crime fact text.
2. The method of claim 1, wherein the extracting of candidate keywords corresponding to the crime name requirement of the crime fact text comprises:
inputting the crime fact text into a deep learning network model, and segmenting the crime fact text through the deep learning network model to obtain a plurality of word segmentation results;
and screening each word segmentation result to obtain candidate keywords corresponding to the crime name requirement.
3. The method of claim 1, wherein the crime name requirement comprises a subject, an object, a subjective aspect and an objective aspect, and the extracting of candidate keywords corresponding to the crime name requirement of the crime fact text further comprises:
acquiring part-of-speech tagging information and corresponding position information corresponding to each vocabulary of the crime fact text;
screening out a first vocabulary with the part-of-speech tagging information being a noun and corresponding position information according to the part-of-speech tagging information, and taking the first vocabulary as a candidate keyword corresponding to the object;
and screening out a second vocabulary with the part-of-speech tagging information being verbs and corresponding position information according to the part-of-speech tagging information, and taking the second vocabulary as the candidate keyword in the objective aspect.
4. The method of claim 3, further comprising:
acquiring a delinquent dictionary;
and screening words corresponding to the crime fact text according to the delinquent dictionary to obtain delinquent words, and taking the delinquent words as candidate keywords of the subjective aspect.
5. The method according to claim 3, wherein the recombined speech segment comprises a plurality of sub-paragraphs, and the extracting of the target keywords corresponding to the crime name requirement in the recombined speech segment comprises:
judging whether a candidate keyword whose part of speech is tagged as a verb exists in each sub-paragraph of the recombined speech segment;
when a verb exists in the sub-paragraph, taking a person name, a personal pronoun or a unit name located before the verb in the sub-paragraph as the target keyword of the subject;
judging whether a plurality of objects are located after the verb in the sub-paragraph, and, when there are a plurality of objects, judging whether the objects are in a parallel relationship;
when the objects are in a parallel relationship, taking the plurality of objects as target objects;
when the objects are not in a parallel relationship, taking the object farthest from the verb in the sub-paragraph as the target object;
and taking the candidate keywords corresponding to the target object as the target keywords.
6. The method of claim 3, further comprising:
when a plurality of subjects, a plurality of objects and a plurality of objective aspects are present, combining the plurality of subjects, the plurality of objects and the plurality of objective aspects to obtain at least one combination, wherein the combination at least comprises the subject and the corresponding objective aspects.
7. The method of claim 3, further comprising:
screening a person name and a person age from the target keywords corresponding to the subject, and judging, according to the person age corresponding to the person name, whether the subject has reached the statutory age of criminal responsibility;
and when the statutory age of criminal responsibility is met, extracting an action date from the objective aspect, judging whether the action date is before the effective date of the crime name, and, when the action date falls within the effective period of the crime name, judging that the crime name of the subject corresponding to the person name is established.
8. An information extraction apparatus, characterized in that the apparatus comprises:
a text acquisition module, configured to acquire a crime fact text;
a candidate keyword extraction module, configured to extract candidate keywords corresponding to the crime name requirement of the crime fact text;
a keyword reorganization module, configured to acquire position information of the candidate keywords corresponding to the crime name requirement, and to recombine the candidate keywords according to the position information of the candidate keywords to obtain a recombined speech segment;
and a target keyword extraction module, configured to extract target keywords corresponding to the crime name requirement in the recombined speech segment, wherein the target keywords are used for pre-judging the crime name corresponding to the crime fact text.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
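As a rough illustration only, the position-based recombination performed by the apparatus of claim 8 and the two-stage check of claim 7 might be sketched as below. The claims do not specify data structures or thresholds, so everything named here is an assumption made for the example: the `Candidate` type, the use of a character offset as the "position information", the space separator, and the `min_age`, `valid_from` and `valid_to` parameters. The reading of the date test as a validity window is likewise an interpretation of claim 7, not a statement of the applicant's implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate keyword plus its position in the crime fact text."""
    text: str
    start: int  # assumed form of "position information": a character offset


def recombine(candidates: List[Candidate], sep: str = " ") -> str:
    """Order candidate keywords by position and join them into one segment."""
    ordered = sorted(candidates, key=lambda c: c.start)
    return sep.join(c.text for c in ordered)


def crime_name_established(
    person_age: int,
    action_date: date,
    min_age: int,                     # assumed statutory age threshold
    valid_from: date,                 # assumed start of the validity period
    valid_to: Optional[date] = None,  # None means the crime name is still in effect
) -> bool:
    """Check the age first, then whether the action date is in the validity period."""
    if person_age < min_age:
        return False
    if action_date < valid_from:
        return False
    return valid_to is None or action_date <= valid_to
```

With these assumptions, `recombine` restores the original reading order of keywords that were extracted out of order, and `crime_name_established` returns `False` as soon as either the age test or the date test fails.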
CN201910143416.1A 2019-02-26 2019-02-26 Information extraction method and device, computer equipment and storage medium Pending CN111611340A (en)

Priority Applications (1)

Application Number Publication Number Priority Date Filing Date Title
CN201910143416.1A CN111611340A (en) 2019-02-26 2019-02-26 Information extraction method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111611340A (en) 2020-09-01

Family

ID=72202985

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201910143416.1A Pending CN111611340A (en) 2019-02-26 2019-02-26 Information extraction method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111611340A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010092102A (en) * 2008-10-03 2010-04-22 Koji Ishibashi Information presentation method, information presentation program, computer readable recording medium, and information presentation device
CN105426360A (en) * 2015-11-12 2016-03-23 中国建设银行股份有限公司 Keyword extracting method and device
CN108563703A (en) * 2018-03-26 2018-09-21 北京北大英华科技有限公司 A kind of determination method of charge, device and computer equipment, storage medium
CN109213864A (en) * 2018-08-30 2019-01-15 广州慧睿思通信息科技有限公司 Criminal case anticipation system and its building and pre-judging method based on deep learning
CN109325226A (en) * 2018-09-10 2019-02-12 广州杰赛科技股份有限公司 Term extraction method, apparatus and storage medium based on deep learning network
CN109376963A (en) * 2018-12-10 2019-02-22 杭州世平信息科技有限公司 A kind of criminal case charge law article unified prediction neural network based
CN109376230A (en) * 2018-12-18 2019-02-22 广东博维创远科技有限公司 Crime is determined a crime prediction technique, system, storage medium and server

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434158A (en) * 2020-11-13 2021-03-02 北京创业光荣信息科技有限责任公司 Enterprise label acquisition method and device, storage medium and computer equipment
CN112434158B (en) * 2020-11-13 2024-05-28 海创汇科技创业发展股份有限公司 Enterprise tag acquisition method, enterprise tag acquisition device, storage medium and computer equipment
CN114021563A (en) * 2021-11-19 2022-02-08 浙江太美医疗科技股份有限公司 Method, device, equipment and storage medium for extracting data in medical information

Similar Documents

Publication Publication Date Title
CN109858010B (en) Method and device for recognizing new words in field, computer equipment and storage medium
US20180181544A1 (en) Systems for Automatically Extracting Job Skills from an Electronic Document
Lloret et al. Analyzing the capabilities of crowdsourcing services for text summarization
CN111046142A (en) Text examination method and device, electronic equipment and computer storage medium
US10496751B2 (en) Avoiding sentiment model overfitting in a machine language model
Flekova et al. What makes a good biography? Multidimensional quality analysis based on Wikipedia article feedback data
CN112818093A (en) Evidence document retrieval method, system and storage medium based on semantic matching
Grant The idea of progress in forensic authorship analysis
CN110929520A (en) Non-named entity object extraction method and device, electronic equipment and storage medium
CN111611340A (en) Information extraction method and device, computer equipment and storage medium
Panchenko et al. Detection of child sexual abuse media on p2p networks: Normalization and classification of associated filenames
Derbas et al. Eventfully safapp: hybrid approach to event detection for social media mining
Thorleuchter et al. Quantitative cross impact analysis with latent semantic indexing
CN113254651B (en) Method and device for analyzing referee document, computer equipment and storage medium
Khairova et al. Estimating the quality of articles in Russian Wikipedia using the logical-linguistic model of fact extraction
CN109992651A (en) A kind of problem target signature automatic identification and abstracting method
Rahmi Dewi et al. Software Requirement-Related Information Extraction from Online News using Domain Specificity for Requirements Elicitation: How the system analyst can get software requirements without constrained by time and stakeholder availability
CN111191413B (en) Method, device and system for automatically marking event core content based on graph sequencing model
US20230186212A1 (en) System, method, electronic device, and storage medium for identifying risk event based on social information
CN110888940B (en) Text information extraction method and device, computer equipment and storage medium
CN111723870A (en) Data set acquisition method, device, equipment and medium based on artificial intelligence
CN111881695A (en) Audit knowledge retrieval method and device
da Rocha et al. A text as unique as a fingerprint: Text analysis and authorship recognition in a Virtual Learning Environment of the Unified Health System in Brazil
Shahbaz et al. Sentiment miner: A prototype for sentiment analysis of unstructured data and text
Suriyachay et al. Thai named entity tagged corpus annotation scheme and self verification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510000 no.2-8, North Street, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Applicant after: Guangzhou huiruisitong Technology Co.,Ltd.

Address before: 510000 no.2-8, North Street, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Applicant before: GUANGZHOU HUIRUI SITONG INFORMATION TECHNOLOGY Co.,Ltd.
