CN108427661A - A kind of new big data label manufacturing process and device - Google Patents

A new big data label production method and device

Info

Publication number
CN108427661A
Authority
CN
China
Prior art keywords
label
accused
target object
track
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810223467.0A
Other languages
Chinese (zh)
Inventor
邱晓贤
林国强
章武盛
周义豪
罗以攀
黄文杰
赵亨利
王松林
袁伟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGZHOU HUIZHI COMMUNICATION TECHNOLOGY CO LTD
Original Assignee
GUANGZHOU HUIZHI COMMUNICATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GUANGZHOU HUIZHI COMMUNICATION TECHNOLOGY CO LTD
Priority to CN201810223467.0A
Publication of CN108427661A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An embodiment of the invention discloses a new big data label production method and device. The method includes: obtaining data to be analyzed for a suspected target object; extracting truth labels from the data to be analyzed, the truth labels including the attributes, behaviors, relationships and track of the suspected target object; mining the truth labels to obtain model labels, the model labels including the statistics label, element label, language label and accompanying label of the suspected target object; and inputting the model labels into a preset model to obtain a prediction label for the suspected target object. The invention takes ontology theory (entity-relationship-label) and the process by which the human brain comes to know the world as its basis, is supported by massive data sources on the suspected target object, and uses state-of-the-art algorithms as its tools to mine and construct a label system, building a new label system for public security technical investigation.

Description

A new big data label production method and device
Technical field
The present invention relates to the field of technical investigation, and in particular to a new big data label production method and device.
Background
At present, the public security technical investigation industry has reclassified resource data through physical reorganization or logical mapping to form a value information library comprising eight major libraries: persons, articles, groups, behaviors, tracks, relationships, regions, and cases. The original data sources include operator optical-splitting data, electronic fences, operator electronic queries, consignment records, account-opening data, and public security network resources. The data sources for labels are the eight categories of resources in the technical investigation element resource library, which must be mapped according to the hierarchical category structure of the label system.
Existing label systems mostly focus on commercial internet systems for advertising, e-commerce, and content. In the public security technical investigation field, which concerns social safety, label systems are still characterized by single models, scattered mining, partial gaps, dispersion, and the lack of a unified system. The technical investigation field therefore still lacks a method for generating labels comprehensively and accurately.
Summary of the invention
Embodiments of the present invention provide a new big data label production method and device, which provide a new label system for public security technical investigation.
According to one aspect of the present invention, a new big data label production method is provided, including:
obtaining data to be analyzed for a suspected target object;
extracting truth labels from the data to be analyzed, the truth labels including the attributes, behaviors, relationships and track of the suspected target object;
mining the truth labels to obtain model labels, the model labels including the statistics label, element label, language label and accompanying label of the suspected target object;
inputting the model labels into a preset model to obtain a prediction label for the suspected target object.
Preferably, mining the truth labels specifically includes:
S1: matching the text content of the attributes, behaviors, relationships and track of the suspected target object against preset text in a language library to obtain the language label of the suspected target object;
S2: performing element extraction on the text content of the attributes, behaviors, relationships and track of the suspected target object to obtain the element label of the suspected target object;
S3: performing statistical calculation on the number of behaviors of the suspected target object to obtain the statistics label of the suspected target object;
S4: obtaining a companion person and/or accompanying article related to the suspected target object, taking the track of the companion person and/or accompanying article as a second track and the track of the suspected target object as a first track, and determining the accompanying label of the suspected target object according to the degree of match between the first track and the second track;
S5: combining the statistics label, the element label, the language label and the accompanying label into the model labels of the suspected target object.
Preferably, the new big data label production method provided by the present invention further includes:
obtaining custom labels for the suspected target object, the custom labels including custom attributes, custom behaviors, custom relationships and a custom track of the suspected target object;
selecting the labels in which the custom labels differ from the truth labels and updating them into the truth labels.
Preferably, the preset model is a naive Bayes model, a logistic regression model, a relevance vector machine model, or a random forest model.
According to another aspect of the present invention, a new big data label production device is provided, including:
a first acquisition module, configured to obtain data to be analyzed for a suspected target object;
an extraction module, configured to extract truth labels from the data to be analyzed, the truth labels including the attributes, behaviors, relationships and track of the suspected target object;
a mining module, configured to mine the truth labels to obtain model labels, the model labels including the statistics label, element label, language label and accompanying label of the suspected target object;
an evaluation module, configured to input the model labels into a preset model to obtain a prediction label for the suspected target object.
Preferably, the mining module specifically includes:
a matching unit, configured to match the text content of the attributes, behaviors, relationships and track of the suspected target object against preset text in a language library to obtain the language label of the suspected target object;
an extraction unit, configured to perform element extraction on the text content of the attributes, behaviors, relationships and track of the suspected target object to obtain the element label of the suspected target object;
a statistics unit, configured to perform statistical calculation on the number of behaviors of the suspected target object to obtain the statistics label of the suspected target object;
a determination unit, configured to obtain a companion person and/or accompanying article related to the suspected target object, take the track of the companion person and/or accompanying article as a second track and the track of the suspected target object as a first track, and determine the accompanying label of the suspected target object according to the degree of match between the first track and the second track;
a composition unit, configured to combine the statistics label, the element label, the language label and the accompanying label into the model labels of the suspected target object.
Preferably, the new big data label production device provided by the present invention further includes:
a second acquisition module, configured to obtain custom labels for the suspected target object, the custom labels including custom attributes, custom behaviors, custom relationships and a custom track of the suspected target object;
an update module, configured to select the labels in which the custom labels differ from the truth labels and update them into the truth labels.
Preferably, the preset model is a naive Bayes model, a logistic regression model, a relevance vector machine model, or a random forest model.
According to another aspect of the present invention, a new big data label production device is provided, including a memory and a processor coupled to the memory;
the processor is configured to execute the new big data label production method described above based on instructions stored in the memory.
According to another aspect of the present invention, a computer-readable medium is provided, on which a computer program is stored; when executed by a processor, the program implements the new big data label production method described above.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
Embodiments of the present invention provide a new big data label production method and device. The method includes: obtaining data to be analyzed for a suspected target object; extracting truth labels from the data to be analyzed, the truth labels including the attributes, behaviors, relationships and track of the suspected target object; mining the truth labels to obtain model labels, the model labels including the statistics label, element label, language label and accompanying label of the suspected target object; and inputting the model labels into a preset model to obtain a prediction label for the suspected target object. The invention takes ontology theory (entity-relationship-label) and the process by which the human brain comes to know the world as its basis, is supported by massive data sources on the suspected target object, and uses state-of-the-art algorithms as its tools to mine and construct a label system, building a new label system for public security technical investigation.
Description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of one embodiment of the new big data label production method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another embodiment of the new big data label production method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of one embodiment of the new big data label production device provided by an embodiment of the present invention.
Detailed description
Embodiments of the present invention provide a new big data label production method and device, which provide a new label system for public security technical investigation.
To make the purpose, features and advantages of the present invention more apparent and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The embodiments described below are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, one embodiment of the new big data label production method provided by the present invention includes:
101. Obtain data to be analyzed for a suspected target object;
102. Extract truth labels from the data to be analyzed, the truth labels including the attributes, behaviors, relationships and track of the suspected target object;
103. Mine the truth labels to obtain model labels, the model labels including the statistics label, element label, language label and accompanying label of the suspected target object;
104. Input the model labels into a preset model to obtain a prediction label for the suspected target object.
An embodiment of the present invention provides a new big data label production method, including: obtaining data to be analyzed for a suspected target object; extracting truth labels from the data to be analyzed, the truth labels including the attributes, behaviors, relationships and track of the suspected target object; mining the truth labels to obtain model labels, the model labels including the statistics label, element label, language label and accompanying label of the suspected target object; and inputting the model labels into a preset model to obtain a prediction label for the suspected target object. The invention takes ontology theory (entity-relationship-label) and the process by which the human brain comes to know the world as its basis, is supported by massive data sources on the suspected target object, and uses state-of-the-art algorithms as its tools to mine and construct a label system, building a new label system for public security technical investigation.
The above describes one embodiment of the new big data label production method. For a more specific description, another embodiment is provided below. Referring to Fig. 2, another embodiment of the new big data label production method provided by the present invention includes:
201. Obtain data to be analyzed for a suspected target object;
In this embodiment, the data to be analyzed for the suspected target object is taken from the element resource library. The original data sources in the resource library include operator optical-splitting data, electronic fences, operator electronic queries, consignment records, account-opening data, and public security network resources. The data in the element resource library has been reclassified through physical reorganization or logical mapping to form a value information library comprising eight major information libraries: persons, articles, groups, behaviors, tracks, relationships, regions, and cases. Therefore, once the target object to be analyzed is confirmed, the data related to the target object can be obtained from the resource library.
202. Extract truth labels from the data to be analyzed, the truth labels including the attributes, behaviors, relationships and track of the suspected target object;
Based on the data to be analyzed for the suspected target object, the data content is classified and given corresponding marks; that is, the data is extracted and summarized along factual dimensions. For example, the attributes of the suspected target object may refer to the object's ID card information; behaviors may refer to messages the object sent or calls it made within a certain period; relationships may refer to the object's relational network, such as friends and relatives; and the track may refer to the object appearing at a certain geographic location at a certain point in time, or the path it traveled within a certain period.
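The four truth-label dimensions described in step 202 can be sketched as a simple data structure. This is a minimal illustration only: the class name `TruthLabels`, the function `extract_truth_labels`, and the record fields (`dimension`, `key`, `value`, `time`, `location`) are assumptions made for the example and do not appear in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class TruthLabels:
    """Hypothetical container for the four truth-label dimensions."""
    attributes: dict = field(default_factory=dict)  # e.g. ID card information
    behaviors: list = field(default_factory=list)   # e.g. messages sent, calls made
    relations: list = field(default_factory=list)   # e.g. friends, relatives
    track: list = field(default_factory=list)       # (time, location) points


def extract_truth_labels(records):
    """Sort raw records into dimensions using an assumed 'dimension' field."""
    labels = TruthLabels()
    for rec in records:
        dim = rec["dimension"]
        if dim == "attribute":
            labels.attributes[rec["key"]] = rec["value"]
        elif dim == "behavior":
            labels.behaviors.append(rec["value"])
        elif dim == "relation":
            labels.relations.append(rec["value"])
        elif dim == "track":
            labels.track.append((rec["time"], rec["location"]))
    return labels
```

Records with an unrecognized dimension are simply ignored in this sketch; a real system would need its own policy for them.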
203. Obtain custom labels for the suspected target object, the custom labels including custom attributes, custom behaviors, custom relationships and a custom track of the suspected target object;
204. Select the labels in which the custom labels differ from the truth labels and update them into the truth labels;
In steps 203 and 204, because the truth labels determined in step 202 are not necessarily entirely accurate, the information may contain omissions and errors. They can therefore be updated using the custom labels of the target object. It will be understood that the custom labels are label information entered by the target object or by other users.
For example, user Zhang San (zhangs) imports a batch of numbers from an Excel file through the page and applies a "habitual burglary" label to the behavior of the users of those numbers (the suspected target objects). After that label has been applied, if the truth labels determined by the present invention mark the batch of numbers as "normal behavior", the "habitual burglary" label replaces the "normal behavior" label for that batch of numbers.
As another example, if the address in Li Si's element label has changed and the custom label Li Si enters supplies the new address, the address in the element label is modified accordingly. Similarly, if information is missing from one of Li Si's truth labels, it can also be filled in from Li Si's custom label.
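The update in steps 203 and 204 can be sketched as a dictionary merge in which any custom label that differs from, or is missing in, the truth labels overrides them. The function name `update_truth_labels` and the flat key/value layout are assumptions for illustration; the patent does not specify a data format.

```python
def update_truth_labels(truth, custom):
    """Return a copy of `truth` in which differing or missing entries
    are taken from `custom` (hypothetical flat key/value layout)."""
    updated = dict(truth)
    for key, value in custom.items():
        if updated.get(key) != value:
            updated[key] = value  # replace a wrong label or fill a gap
    return updated


truth = {"behavior": "normal behavior", "name": "Li Si"}
custom = {"behavior": "habitual burglary", "address": "new address"}
merged = update_truth_labels(truth, custom)
```

Here `merged` carries the replaced behavior label and the newly supplied address, while `truth` itself is left unchanged so the original extraction result can still be audited.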
It should be noted that a user's custom labels are visible only to that user. Custom labels support the same operations as common labels, such as filtering, union, and difference.
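One way to picture per-user visibility and the set operations mentioned above is to store each label as a set of target identifiers. The index layout, the function name `visible_labels`, and the identifiers below are all hypothetical; this is a sketch under those assumptions, not the patent's storage design.

```python
def visible_labels(common, custom_by_user, user):
    """Merge the common labels with the custom labels of one user only."""
    merged = {name: set(ids) for name, ids in common.items()}
    for name, ids in custom_by_user.get(user, {}).items():
        merged.setdefault(name, set()).update(ids)
    return merged


# Hypothetical index: label name -> set of target identifiers.
common = {"frequent SMS": {"id-1", "id-2"}, "active at night": {"id-2", "id-3"}}
custom_by_user = {"zhangs": {"habitual burglary": {"id-2"}}}

view = visible_labels(common, custom_by_user, "zhangs")
both = view["frequent SMS"] & view["active at night"]        # filtering
either = view["frequent SMS"] | view["active at night"]      # union
not_flagged = view["frequent SMS"] - view["habitual burglary"]  # difference
```

A user without custom labels sees only the common labels, which matches the statement that custom labels are visible only to their owner.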
After the final truth labels of the target object are determined, they can be further summarized by various algorithm models to obtain the model labels of the suspected target object.
It should be noted that mining the truth labels to obtain the model labels is divided into four processes, namely steps 205 to 208. These four processes may be executed in any order or simultaneously.
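Because the four mining processes are independent of one another, they could, for instance, be run concurrently. The sketch below uses Python's standard `concurrent.futures` module; the function name `mine_model_labels` and the idea of passing the miners as a dictionary of callables are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def mine_model_labels(truth_labels, miners):
    """Run independent mining functions concurrently.

    `miners` maps a label kind (e.g. "language") to a function that
    takes the truth labels and returns that kind of label.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(miners))) as pool:
        futures = {kind: pool.submit(fn, truth_labels)
                   for kind, fn in miners.items()}
        # result() blocks until each miner has finished.
        return {kind: future.result() for kind, future in futures.items()}
```

Running the miners sequentially in any order would give the same result; concurrency is only an optimization the text permits.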
205. Match the text content of the attributes, behaviors, relationships and track of the suspected target object against preset text in a language library to obtain the language label of the suspected target object;
To obtain the languages the target object commonly uses, the text content of its attributes, behaviors, relationships and track can be analyzed; that is, the text content is matched against preset languages, and each language that matches is included among the target object's commonly used languages. For example, if matching the content of Zhang San's many mobile phone messages shows that the content repeatedly involves the Uyghur language, his commonly-used-language label can be set to Uyghur.
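The matching in step 205 can be sketched as a lookup of message words in a preset vocabulary per language. The function name `language_labels`, the `min_hits` threshold, and the toy language library are assumptions; the patent does not say how the language library is structured or how many matches are required.

```python
def language_labels(texts, language_library, min_hits=2):
    """Return languages whose preset words occur in at least `min_hits` texts.

    `language_library` maps a language name to a set of preset words
    (a hypothetical stand-in for the patent's language library).
    """
    hits = {}
    for text in texts:
        words = set(text.lower().split())
        for language, vocabulary in language_library.items():
            if words & vocabulary:  # at least one preset word matched
                hits[language] = hits.get(language, 0) + 1
    return sorted(lang for lang, n in hits.items() if n >= min_hits)


library = {"english": {"hello", "thanks"}, "spanish": {"hola", "gracias"}}
messages = ["hola amigo", "gracias por todo", "hello there"]
labels = language_labels(messages, library)
```

Whitespace splitting is a simplification; languages without word spacing would need a proper tokenizer.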
206. Perform element extraction on the text content of the attributes, behaviors, relationships and track of the suspected target object to obtain the element label of the suspected target object;
The element label in this step is used to determine the basic information of the target object and can be extracted from the text content of the target object's attributes, behaviors, relationships and track. For example, the object's name and gender can be extracted from its attributes; the subject or object associated with a behavior can be extracted from its behaviors; and the locations where the target object frequently or unexpectedly appears can be extracted from its track.
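A minimal sketch of element extraction, assuming attributes are stored as a dictionary and the track as a list of `(time, location)` pairs. The function name `element_labels` and the chosen fields (`name`, `gender`, `frequent_location`) are illustrative assumptions, not the patent's schema.

```python
from collections import Counter


def element_labels(attributes, track):
    """Extract basic elements: selected attributes and the most visited location."""
    labels = {key: attributes[key] for key in ("name", "gender") if key in attributes}
    if track:
        visits = Counter(location for _, location in track)
        labels["frequent_location"] = visits.most_common(1)[0][0]
    return labels


attributes = {"name": "Li Si", "gender": "male", "id_number": "(elided)"}
track = [(1, "A"), (2, "B"), (3, "A")]
elements = element_labels(attributes, track)
```

Extracting the subject or object of a behavior from free text would require natural language processing and is outside the scope of this sketch.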
207. Perform statistical calculation on the number of behaviors of the suspected target object to obtain the statistics label of the suspected target object;
The statistics label is used to mark whether the suspected target object shows suspicious behavior. For example, if the object frequently sends messages from a mobile phone within a given month, the number of messages it sends (or, as another example, the number of calls it makes) can be counted.
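The counting in step 207 can be sketched with a per-month tally of behavior events. The function name `statistics_labels`, the `(month, behavior_type)` event format, and the threshold are assumptions; the patent does not state what count is considered frequent.

```python
from collections import Counter


def statistics_labels(events, threshold=100):
    """Count events per (month, behavior type) and keep those at or above
    an assumed threshold."""
    counts = Counter(events)
    return {key: n for key, n in counts.items() if n >= threshold}


events = [("2018-01", "sms")] * 3 + [("2018-01", "call")]
frequent = statistics_labels(events, threshold=3)
```

With the toy data above, only the SMS behavior in that month reaches the threshold, so it alone would be turned into a statistics label.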
208. Obtain a companion person and/or accompanying article related to the suspected target object, take the track of the companion person and/or accompanying article as a second track and the track of the suspected target object as a first track, and determine the accompanying label of the suspected target object according to the degree of match between the first track and the second track;
The accompanying label is mainly used to determine the persons or objects associated with the target object. For example, if the target object traveled from A to B at a certain time on a certain day and did so by car, the track of the vehicle it drove is consistent with the object's own track. The car can therefore be identified from the vehicle's track, added to the object's list of vehicles used, and taken as an accompanying label of the target object. Similarly, an accompanying label can also be determined from the target object's companion persons.
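The patent does not define how the degree of match between the two tracks is computed. One simple possibility, shown here purely as an assumption, is the fraction of points on the first track that also appear on the second track at the same time and location. The names `track_match_degree`, `accompanying_labels`, and `min_degree` are hypothetical.

```python
def track_match_degree(first_track, second_track):
    """Fraction of (time, location) points of the first track that also
    occur in the second track; an assumed definition of the match degree."""
    if not first_track:
        return 0.0
    second_points = set(second_track)
    matched = sum(1 for point in first_track if point in second_points)
    return matched / len(first_track)


def accompanying_labels(first_track, candidates, min_degree=0.8):
    """Names of candidate persons or articles whose track matches well enough."""
    return sorted(name for name, track in candidates.items()
                  if track_match_degree(first_track, track) >= min_degree)


target = [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")]
candidates = {
    "car": [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "X")],
    "stranger": [(1, "Z")],
}
companions = accompanying_labels(target, candidates)
```

Exact point equality is a strong simplification; real location data would usually need time windows and distance tolerances.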
209. Combine the statistics label, the element label, the language label and the accompanying label into the model labels of the suspected target object;
Once the statistics label, element label, language label and accompanying label have been determined, they together constitute the model labels of the target object, such as "frequently changes phones and SIM cards" or "inactive by day, active at night".
210. Input the model labels into a preset model to obtain a prediction label for the suspected target object.
After the model labels are obtained, they can be fed into the preset model as its input to obtain the prediction label of the suspected target object. It should be noted that the preset model is trained in advance on a large amount of training data, that is, on data involving criminal behavior together with normal data. The trained model can classify the model labels in detail (that is, make the final classification of the target object), for example first dividing targets into a terrorism-related group and a telecom fraud group, where the telecom fraud group can be further divided into categories such as insurance fraud, impersonating customer service, impersonating an acquaintance, and impersonating public security organs.
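Naive Bayes is one of the preset models the text names. The sketch below is a small multinomial naive Bayes classifier with Laplace smoothing written in plain Python, treating each model label as a feature. The class name, the toy training samples, and the class names are invented for illustration and do not come from the patent.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesLabelModel:
    """Toy multinomial naive Bayes over sets of model labels."""

    def fit(self, samples):
        """`samples` is a list of (model_labels, class_name) pairs."""
        self.class_counts = Counter(cls for _, cls in samples)
        self.label_counts = defaultdict(Counter)
        self.vocabulary = set()
        for labels, cls in samples:
            self.label_counts[cls].update(labels)
            self.vocabulary.update(labels)
        return self

    def predict(self, labels):
        total = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for cls, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.label_counts[cls].values()) + len(self.vocabulary)
            for label in labels:
                # Laplace smoothing keeps unseen labels from zeroing the score.
                score += math.log((self.label_counts[cls][label] + 1) / denom)
            if score > best_score:
                best_class, best_score = cls, score
        return best_class


# Invented toy training data, for illustration only.
samples = [
    ({"frequent card changes", "active at night"}, "telecom fraud"),
    ({"frequent card changes", "bulk SMS"}, "telecom fraud"),
    ({"daytime activity"}, "normal"),
    ({"daytime activity", "few calls"}, "normal"),
]
model = NaiveBayesLabelModel().fit(samples)
prediction = model.predict({"frequent card changes"})
```

A production system would more likely rely on an established machine learning library and a far larger labeled data set; the sketch only shows how model labels can serve as classifier input.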
After the prediction label of the target object is obtained, the prediction label can be output and indexed accordingly to make retrieval convenient for public security personnel.
In this embodiment, the preset model may be a naive Bayes model, a logistic regression model, a relevance vector machine model, or a random forest model.
The embodiment of the present invention divides labels into different levels and categories. First, this makes it easier to manage thousands of labels and turns scattered labels into a system; second, the dimensions are not isolated, and labels are related to one another; third, it can provide label subsets for label modeling.
The present invention is based on ontology and incorporates algorithms such as machine learning and natural language processing. Combined with the practical case-handling approaches accumulated by clients, it builds a label production system that unites theory, algorithms and experience. The present invention also supports effective division of labor and collaboration, and provides functions such as high-performance label retrieval, low-cost storage, custom labels, permission management, and flexible combination and collision.
Referring to Fig. 3, one embodiment of the new big data label production device provided by the present invention includes:
a first acquisition module 301, configured to obtain data to be analyzed for a suspected target object;
an extraction module 302, configured to extract truth labels from the data to be analyzed, the truth labels including the attributes, behaviors, relationships and track of the suspected target object;
a mining module 303, configured to mine the truth labels to obtain model labels, the model labels including the statistics label, element label, language label and accompanying label of the suspected target object;
an evaluation module 304, configured to input the model labels into a preset model to obtain a prediction label for the suspected target object.
Further, the mining module 303 specifically includes:
a matching unit 3031, configured to match the text content of the attributes, behaviors, relationships and track of the suspected target object against preset text in a language library to obtain the language label of the suspected target object;
an extraction unit 3032, configured to perform element extraction on the text content of the attributes, behaviors, relationships and track of the suspected target object to obtain the element label of the suspected target object;
a statistics unit 3033, configured to perform statistical calculation on the number of behaviors of the suspected target object to obtain the statistics label of the suspected target object;
a determination unit 3034, configured to obtain a companion person and/or accompanying article related to the suspected target object, take the track of the companion person and/or accompanying article as a second track and the track of the suspected target object as a first track, and determine the accompanying label of the suspected target object according to the degree of match between the first track and the second track;
a composition unit 3035, configured to combine the statistics label, the element label, the language label and the accompanying label into the model labels of the suspected target object.
Further, the new big data label production device provided by the present invention also includes:
a second acquisition module 305, configured to obtain custom labels for the suspected target object, the custom labels including custom attributes, custom behaviors, custom relationships and a custom track of the suspected target object;
an update module 306, configured to select the labels in which the custom labels differ from the truth labels and update them into the truth labels.
Further, the preset model is a naive Bayes model, a logistic regression model, a relevance vector machine model, or a random forest model.
Another embodiment of the new big data label production device provided by the present invention includes a memory and a processor coupled to the memory;
the processor is configured to execute the new big data label production method described above based on instructions stored in the memory.
The invention further relates to a computer-readable medium on which a computer program is stored; when executed by a processor, the program implements the new big data label production method described above.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and in actual implementation there may be other ways of dividing them: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be electrical, mechanical or of other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A new big data label production method, characterized by comprising:
obtaining data to be analyzed of a suspected target object;
extracting fact labels from the data to be analyzed, the fact labels comprising the attributes, behaviors, relationships and trajectory of the suspected target object;
mining the fact labels to obtain model labels, the model labels comprising a statistical label, an element label, a language label and an accompanying label of the suspected target object;
inputting the model labels into a preset model to obtain a prediction label of the suspected target object.
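The four steps of claim 1 can be read as a simple pipeline. The following is a minimal Python sketch under stated assumptions, not the applicant's implementation: the record layout, the names `FactLabels`, `extract_fact_labels`, `mine_model_labels` and `produce_prediction_label`, and the idea of passing the preset model in as a plain callable are all hypothetical. Only the statistical label is actually computed; the other three model labels are left as placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FactLabels:
    """Fact labels of one target object (field layout is an assumption)."""
    attributes: Dict[str, str] = field(default_factory=dict)
    behaviors: List[str] = field(default_factory=list)
    relationships: List[str] = field(default_factory=list)
    trajectory: List[str] = field(default_factory=list)


def extract_fact_labels(record: dict) -> FactLabels:
    """Step 2: copy attribute, behavior, relationship and trajectory fields."""
    return FactLabels(
        attributes=dict(record.get("attributes", {})),
        behaviors=list(record.get("behaviors", [])),
        relationships=list(record.get("relationships", [])),
        trajectory=list(record.get("trajectory", [])),
    )


def mine_model_labels(facts: FactLabels) -> Dict[str, object]:
    """Step 3: derive model labels; only the statistical label is real here."""
    return {
        "statistical": len(facts.behaviors),  # number of recorded behaviors
        "element": [],                        # placeholder
        "language": "unknown",                # placeholder
        "accompanying": [],                   # placeholder
    }


def produce_prediction_label(
    record: dict, preset_model: Callable[[Dict[str, object]], str]
) -> str:
    """Steps 1 to 4: record -> fact labels -> model labels -> prediction label."""
    facts = extract_fact_labels(record)
    model_labels = mine_model_labels(facts)
    return preset_model(model_labels)
```

Any callable that maps model labels to a label string can stand in for the preset model, for example a toy rule such as `lambda labels: "high" if labels["statistical"] >= 3 else "low"`.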
2. The new big data label production method according to claim 1, characterized in that mining the fact labels specifically comprises:
S1: matching the text content of the attributes, behaviors, relationships and trajectory of the suspected target object against preset text in a language library to obtain the language label of the suspected target object;
S2: performing element extraction on the text content of the attributes, behaviors, relationships and trajectory of the suspected target object to obtain the element label of the suspected target object;
S3: performing a statistical calculation on the number of behaviors of the suspected target object to obtain the statistical label of the suspected target object;
S4: obtaining a companion and/or an accompanying article related to the suspected target object, taking the trajectory of the companion and/or accompanying article as a second trajectory and the trajectory of the suspected target object as a first trajectory, and determining the accompanying label of the suspected target object according to the degree of coincidence between the first trajectory and the second trajectory;
S5: composing the statistical label, the element label, the language label and the accompanying label into the model labels of the suspected target object.
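Steps S1 and S4 can be illustrated with a short Python sketch. The claim does not define how the language match or the degree of coincidence is computed, so both definitions below are assumptions: the language is chosen by word overlap with a preset word set, and the degree of coincidence is the fraction of the target's trajectory points that also occur in the companion's trajectory. The `(place, hour)` point representation, the threshold of 0.5, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A trajectory point as (place identifier, hour bucket); assumed representation.
Point = Tuple[str, int]


def match_language(text: str, language_library: Dict[str, Set[str]]) -> str:
    """S1 (assumed rule): pick the language whose preset words overlap most."""
    words = set(text.lower().split())
    best, best_hits = "unknown", 0
    for language, preset_words in language_library.items():
        hits = len(words & preset_words)
        if hits > best_hits:
            best, best_hits = language, hits
    return best


def coincidence_degree(first: List[Point], second: List[Point]) -> float:
    """S4 (assumed rule): share of first-trajectory points found in the second."""
    if not first:
        return 0.0
    second_points = set(second)
    shared = sum(1 for point in first if point in second_points)
    return shared / len(first)


def accompanying_label(
    first: List[Point],
    candidates: Dict[str, List[Point]],
    threshold: float = 0.5,
) -> List[str]:
    """Names of companions or articles whose coincidence reaches the threshold."""
    return sorted(
        name
        for name, second in candidates.items()
        if coincidence_degree(first, second) >= threshold
    )
```

With a four-point target trajectory, a candidate sharing three of those points scores 0.75 and is kept, while a candidate sharing one point scores 0.25 and is dropped at the default threshold.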
3. The new big data label production method according to claim 1, characterized by further comprising:
obtaining custom labels of the suspected target object, the custom labels comprising custom attributes, custom behaviors, custom relationships and a custom trajectory of the suspected target object;
selecting the part of the custom labels that differs from the fact labels and updating it into the fact labels.
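The update in claim 3 amounts to merging only the custom label values that the fact labels do not already contain. A minimal Python sketch follows, assuming (hypothetically) that both label sets are dictionaries mapping a category name to a list of values; the function name `update_fact_labels` is illustrative only.

```python
from typing import Dict, List


def update_fact_labels(
    fact_labels: Dict[str, List[str]],
    custom_labels: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return a copy of the fact labels extended with the differing custom values."""
    updated = {category: list(values) for category, values in fact_labels.items()}
    for category, values in custom_labels.items():
        existing = updated.setdefault(category, [])
        for value in values:
            if value not in existing:  # skip values the fact labels already hold
                existing.append(value)
    return updated
```

Returning a new dictionary instead of mutating the input keeps the original fact labels available for comparison or audit; an in-place update would be an equally valid reading of the claim.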
4. The new big data label production method according to any one of claims 1 to 3, characterized in that the preset model is a naive Bayes model, a logistic regression model, a relevance vector machine model, or a random forest model.
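Of the model families listed in claim 4, naive Bayes is the simplest to show end to end. The sketch below is a generic categorical naive Bayes classifier with add-one (Laplace) smoothing written in plain Python; it is not taken from the patent. It assumes, hypothetically, that each sample is a dictionary of categorical model-label values and that every training sample carries every feature. The class name and the example prediction labels are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

Labels = Dict[str, str]  # model-label name -> categorical value (assumed encoding)


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples: List[Labels], targets: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(targets)
        # counts[cls][feature][value] = number of training samples of that class
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # feature -> values seen in training
        for sample, target in zip(samples, targets):
            for feature, value in sample.items():
                self.counts[target][feature][value] += 1
                self.values[feature].add(value)
        return self

    def predict(self, sample: Labels) -> str:
        total = sum(self.class_counts.values())
        scores = {}
        for cls, n_cls in self.class_counts.items():
            score = math.log(n_cls / total)  # log prior
            for feature, value in sample.items():
                seen = self.counts[cls][feature][value]
                n_values = len(self.values[feature]) or 1
                # smoothed log likelihood of this value given the class
                score += math.log((seen + 1) / (n_cls + n_values))
            scores[cls] = score
        return max(scores, key=scores.get)


# Toy training data with made-up prediction labels.
train_samples = [
    {"statistical": "high", "language": "en"},
    {"statistical": "high", "language": "zh"},
    {"statistical": "low", "language": "zh"},
    {"statistical": "low", "language": "en"},
]
train_targets = ["flagged", "flagged", "normal", "normal"]
model = NaiveBayes().fit(train_samples, train_targets)
```

In this toy data the language value is evenly split between the two classes, so the prediction is driven entirely by the statistical label.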
5. A new big data label production device, characterized by comprising:
a first acquisition module, configured to obtain data to be analyzed of a suspected target object;
an extraction module, configured to extract fact labels from the data to be analyzed, the fact labels comprising the attributes, behaviors, relationships and trajectory of the suspected target object;
a mining module, configured to mine the fact labels to obtain model labels, the model labels comprising a statistical label, an element label, a language label and an accompanying label of the suspected target object;
an evaluation module, configured to input the model labels into a preset model to obtain a prediction label of the suspected target object.
6. The new big data label production device according to claim 5, characterized in that the mining module specifically comprises:
a matching unit, configured to match the text content of the attributes, behaviors, relationships and trajectory of the suspected target object against preset text in a language library to obtain the language label of the suspected target object;
an extraction unit, configured to perform element extraction on the text content of the attributes, behaviors, relationships and trajectory of the suspected target object to obtain the element label of the suspected target object;
a statistics unit, configured to perform a statistical calculation on the number of behaviors of the suspected target object to obtain the statistical label of the suspected target object;
a determination unit, configured to obtain a companion and/or an accompanying article related to the suspected target object, take the trajectory of the companion and/or accompanying article as a second trajectory and the trajectory of the suspected target object as a first trajectory, and determine the accompanying label of the suspected target object according to the degree of coincidence between the first trajectory and the second trajectory;
a composition unit, configured to compose the statistical label, the element label, the language label and the accompanying label into the model labels of the suspected target object.
7. The new big data label production device according to claim 5, characterized by further comprising:
a second acquisition module, configured to obtain custom labels of the suspected target object, the custom labels comprising custom attributes, custom behaviors, custom relationships and a custom trajectory of the suspected target object;
an update module, configured to select the part of the custom labels that differs from the fact labels and update it into the fact labels.
8. The new big data label production device according to any one of claims 5 to 7, characterized in that the preset model is a naive Bayes model, a logistic regression model, a relevance vector machine model, or a random forest model.
9. A new big data label production device, characterized by comprising: a memory, and a processor coupled to the memory;
the processor being configured to execute, based on instructions stored in the memory, the new big data label production method according to any one of claims 1 to 4.
10. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the new big data label production method according to any one of claims 1 to 4.
CN201810223467.0A 2018-03-19 2018-03-19 A kind of new big data label manufacturing process and device Pending CN108427661A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810223467.0A CN108427661A (en) 2018-03-19 2018-03-19 A kind of new big data label manufacturing process and device

Publications (1)

Publication Number Publication Date
CN108427661A true CN108427661A (en) 2018-08-21

Family

ID=63158947

Country Status (1)

Country Link
CN (1) CN108427661A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150095333A1 (en) * 2013-09-27 2015-04-02 International Business Machines Corporation Activity Based Analytics
CN106096623A (en) * 2016-05-25 2016-11-09 中山大学 A kind of crime identifies and Forecasting Methodology
CN106127231A (en) * 2016-06-16 2016-11-16 中国人民解放军国防科学技术大学 A kind of crime individual discrimination method based on the information Internet
CN106649824A (en) * 2016-12-29 2017-05-10 东方网力科技股份有限公司 Behavior prediction method and device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111506776A (en) * 2019-11-08 2020-08-07 马上消费金融股份有限公司 Data labeling method and related device
CN111260526A (en) * 2020-01-20 2020-06-09 北京明略软件系统有限公司 Figure track behavior analysis and estimation method and device
CN111581300A (en) * 2020-05-09 2020-08-25 山东健康医疗大数据有限公司 Label matrix construction and updating method based on health medical data
CN113361979A (en) * 2021-08-10 2021-09-07 湖南高至科技有限公司 Profile-oriented ontology modeling method and device, computer equipment and storage medium
CN115002200A (en) * 2022-05-31 2022-09-02 平安银行股份有限公司 User portrait based message pushing method, device, equipment and storage medium
CN115002200B (en) * 2022-05-31 2023-08-22 平安银行股份有限公司 Message pushing method, device, equipment and storage medium based on user portrait

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180821