CN110532331A - Method and related apparatus for determining an object type - Google Patents
Method and related apparatus for determining an object type
- Publication number
- CN110532331A (application number CN201910841009.8A)
- Authority
- CN
- China
- Prior art keywords
- sorted
- information
- type
- quality
- text information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of this application provide a method and related apparatus for determining an object type, relating to the field of artificial intelligence. The method screens and classifies an object to be classified by means of a quality judgment rule and a first quality classification model, and thereby obtains the type to which the object belongs. The embodiments can automatically screen and identify the type of a media outlet and so identify low-quality media, which solves the technical problem that large numbers of low-quality media currently cannot be identified manually.
Description
Technical field
This application relates to the field of artificial intelligence, and in particular to a method and related apparatus for determining an object type.
Background
With the development of modern technology, it has become increasingly convenient for media to publish information. Media can register accounts on a network platform and then publish information through those accounts, such as text, audio, and video. These media also include self-media, which refers to the way ordinary members of the public share facts and news about themselves through channels such as the Internet. Content creation has boomed in recent years: major Internet companies have actively entered the content market, self-media of all kinds have sprung up like bamboo shoots after spring rain, and anyone can run their own self-media by writing. The huge number of self-media produce a massive volume of articles every day, but the quality is uneven. Driven by profit, self-media may publish articles that carry advertisements or are shoddily produced, so grading self-media is very important.
Existing evaluation metrics for self-media are fairly limited: most approaches maintain a whitelist of media through manual review, or shut down violating media in response to user reports.
At present, many self-media publish large numbers of low-quality articles, images, and videos. Screening such self-media usually requires substantial labor and resources, and because human resources are limited, it is difficult to screen so many low-quality self-media. An automatic screening method is therefore needed.
Summary of the invention
The embodiments of this application provide a method and related apparatus for determining an object type, used to determine the type of an object to be classified so that low-quality media can be screened out automatically.
In a first aspect, an embodiment of this application provides a method for determining an object type, comprising:
Obtaining an information set of an object to be classified, wherein the information set is generated from a first information set and a second information set;
If the information set of the object to be classified satisfies a quality judgment rule, determining that the object to be classified is a first-type object;
If the information set of the object to be classified does not satisfy the quality judgment rule, obtaining feature information of the object to be classified from its information set;
Obtaining, through a first quality classification model, a classification result corresponding to the feature information, wherein the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified is a first-type object, and the second classification result indicates that the object to be classified is a second-type object or a third-type object.
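The two-stage flow of the first aspect can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the names `Cascade`, `rules`, `extract_features`, and `model` are hypothetical, the model is represented by a plain callable, and treating a hit on any single rule as "satisfying the quality judgment rule" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

InfoSet = Dict[str, float]  # hypothetical representation of an information set


@dataclass
class Cascade:
    # Hypothetical quality judgment rules; each returns True when it is met.
    rules: List[Callable[[InfoSet], bool]]
    # Hypothetical feature extractor applied only when no rule is met.
    extract_features: Callable[[InfoSet], List[float]]
    # Stand-in for the first quality classification model.
    model: Callable[[List[float]], str]

    def classify(self, info: InfoSet) -> str:
        # Stage 1: a rule hit marks the object as first-type immediately.
        # (Assumption: meeting any one rule is sufficient.)
        if any(rule(info) for rule in self.rules):
            return "first_type"
        # Stage 2: otherwise derive features and ask the model, which answers
        # either "first_type" or "second_or_third_type".
        return self.model(self.extract_features(info))
```

Running the inexpensive rules first means the model is only consulted for objects that the rules cannot decide.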
In a second aspect, an embodiment of this application provides an apparatus for determining an object type, comprising:
An acquiring unit, configured to obtain an information set of an object to be classified, wherein the information set is generated from a first information set and a second information set;
A processing unit, configured to determine that the object to be classified is a first-type object if the information set of the object to be classified satisfies a quality judgment rule;
The processing unit is further configured to obtain feature information of the object to be classified from its information set if the information set does not satisfy the quality judgment rule;
The processing unit is further configured to obtain, through a first quality classification model, a classification result corresponding to the feature information, wherein the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified is a first-type object, and the second classification result indicates that the object to be classified is a second-type object or a third-type object.
In one implementation of the second aspect, the processing unit is further configured to:
If the classification result is the second classification result, obtain, through a second quality classification model, an object vector corresponding to the object to be classified;
Calculate the similarity between the object vector corresponding to the object to be classified and a second-type object vector, the second-type object vector being the vector corresponding to a second-type object;
If the similarity is greater than a set threshold, determine that the object to be classified is a second-type object;
If the similarity is less than or equal to the set threshold, determine that the object to be classified is a third-type object.
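The threshold comparison above can be sketched as follows. The source does not say which similarity measure is used, so cosine similarity is an assumption here, and the function names and label strings are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def second_or_third_type(object_vector: Sequence[float],
                         second_type_vector: Sequence[float],
                         threshold: float) -> str:
    # Strictly greater than the threshold means second-type;
    # less than or equal to the threshold means third-type.
    similarity = cosine_similarity(object_vector, second_type_vector)
    return "second_type" if similarity > threshold else "third_type"
```

Note that a similarity exactly equal to the threshold falls into the third type, matching the "less than or equal to" wording above.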
In one implementation of the second aspect, the acquiring unit is further configured to:
Obtain text information, the text information corresponding to the object to be classified;
Compile statistics on the text information to obtain the first information set of the object to be classified;
Obtain historical text information, the historical text information corresponding to the object to be classified;
Compile statistics on the historical text information to obtain the second information set of the object to be classified.
In one implementation of the second aspect, the processing unit is further configured to:
If the low-quality text ratio in the information set of the object to be classified is greater than a set first ratio threshold, determine that the object to be classified is a first-type object, wherein the low-quality text ratio is the proportion of the low-quality text information corresponding to the object to be classified among all text information corresponding to the object.
In one implementation of the second aspect, the processing unit is further configured to:
If the repeated text ratio in the information set of the object to be classified is greater than a set second ratio threshold, determine that the object to be classified is a first-type object, wherein the repeated text ratio is the proportion of the text information with a high duplicate-check rate corresponding to the object to be classified among all text information corresponding to the object.
In one implementation of the second aspect, the processing unit is further configured to:
If the posting frequency of the object to be classified in its information set is greater than a set frequency threshold, determine that the object to be classified is a first-type object, wherein the posting frequency of the object to be classified is the frequency with which it publishes text information.
In one implementation of the second aspect, the processing unit is further configured to:
Obtain a click sequence of the object to be classified from its information set, the click sequence including the identifier of the object to be classified and the user identifier corresponding to the object to be classified;
Obtain, from the click sequence and through the second quality classification model, the object vector corresponding to the object to be classified.
In one implementation of the second aspect, the processing unit is further configured to:
If the similarity is greater than the set threshold, obtain second-type annotation information;
Determine an annotation result for the object to be classified according to the second-type annotation information, wherein the annotation result includes a first annotation result and a second annotation result, the first annotation result indicates that the object to be classified is a second-type object, and the second annotation result indicates that the object to be classified is a third-type object.
In a third aspect, an embodiment of this application provides a server, comprising:
One or more central processing units, a memory, an input/output interface, a wired or wireless network interface, and a power supply;
The memory is transient storage or persistent storage;
The central processing unit is configured to communicate with the memory and to execute, on the server, the instructions in the memory so as to perform the method of the first aspect.
As can be seen from the above technical solutions, the embodiments of this application have the following advantages:
The embodiments of this application provide a method and related apparatus for determining an object type, relating to the field of artificial intelligence. The method screens and classifies an object to be classified by means of a quality judgment rule and a first quality classification model, and thereby obtains the type to which the object belongs. The embodiments can automatically screen and identify the type of a media outlet and so identify low-quality media, which solves the technical problem that large numbers of low-quality media currently cannot be identified manually.
Brief description of the drawings
Fig. 1 is an example architecture diagram of an information publishing platform in an embodiment of this application;
Fig. 2 is a schematic diagram of a method for determining an object type provided by an embodiment of this application;
Fig. 3 is an example interface in which the server displays the type determination for an object to be classified in an embodiment of this application;
Fig. 4 is an example interface for a type change in an embodiment of this application;
Fig. 5 is an example interface in which a user edits text information on a terminal device in an embodiment of this application;
Fig. 6 is an example interface in which a user edits text information on a mobile phone in an embodiment of this application;
Fig. 7 is an example topology diagram of an application example of the method for determining an object type provided by an embodiment of this application;
Fig. 8 is an example interface displaying high-quality media candidates in an embodiment of this application;
Fig. 9 is an example diagram of an apparatus for determining an object type provided by an embodiment of this application;
Fig. 10 is a schematic structural diagram of a server provided by an embodiment of this application.
Detailed description
The embodiments of this application provide a method and related apparatus for determining an object type, used to determine the type of an object to be classified so that low-quality media can be screened out automatically.
The terms "first", "second", "third", "fourth", and the like (if any) in the description, the claims, and the above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "correspond to" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments should not be construed as preferred over, or more advantageous than, other embodiments or designs. Rather, words such as "exemplary" or "for example" are intended to present a related concept in a concrete way.
To keep the description of the following embodiments clear and concise, a brief introduction to the related technologies is given first:
Artificial intelligence (AI) is the theory, methods, technology, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, AI is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines so that machines can perceive, reason, and make decisions.
AI technology is a comprehensive discipline that covers a wide range of fields and includes both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning. The embodiments of this application briefly introduce natural language processing and machine learning:
Natural language processing (NLP) is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. NLP is a science that integrates linguistics, computer science, and mathematics. Research in this field therefore involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. NLP technologies generally include text processing, semantic understanding, machine translation, question-answering robots, and knowledge graphs.
Machine learning (ML) is a multidisciplinary field that involves probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills, and how they can reorganize existing knowledge structures to keep improving their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in every area of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
As AI technology has been researched and has advanced, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service. It is believed that as the technology develops, AI will be applied in more fields and deliver increasingly important value. In information processing in particular, the embodiments of this application can apply AI technology, as illustrated by the following embodiments:
It should be understood that the embodiments of this application apply to an information publishing platform, which a user can log in to in order to publish information. An information publishing platform is a platform that allows users to register accounts and publish information, and it can be hosted on a server. A user can communicate with the platform on the server through a client on a terminal device.
Fig. 1 is an example architecture diagram of an information publishing platform in an embodiment of this application. The architecture includes a terminal device and a server. The terminal device can perform its steps or implement its functions through the client on the terminal device, and the server can perform its steps or implement its functions through the platform on the server. When publishing information, the terminal device may first obtain the information entered by the user and then send that information to the server. In some embodiments, the server may store the information from the terminal device in a database; when a terminal device requests information from the server, the server reads the information from the database and transmits it to the terminal device. In some embodiments, the server may push the information from the terminal device directly to the terminal devices connected to the server. The embodiments of this application do not specifically limit the method by which the server publishes information.
It can be understood that a user who publishes information may be an individual user, also called a self-media user, or an enterprise user, also called a general media user. For example, if a media enterprise wishes to publish information on the major information publishing platforms, it needs to register accounts on those platforms and publish information through them; the accounts registered by such media enterprises may be called media accounts. As another example, if ordinary members of the public wish to publish articles or poems they have written, or photos or videos they have taken, on an information publishing platform, they can also register accounts on the major information publishing platforms and publish information through them; the accounts registered by such members of the public may be called self-media accounts.
In the embodiments of this application, the information published by a user may be text information, video information, picture information, and so on, which is not specifically limited. For example, when the information published by the user includes text information, the embodiments may process it with natural language processing techniques from the field of artificial intelligence; when the information published by the user includes video or picture information, the embodiments may process it with computer vision techniques from the field of artificial intelligence.
The quality of the information users publish is uneven, and, driven by profit, users may publish articles that carry advertisements or are shoddily produced, so grading users is very important. At present, the information users publish is reviewed manually. This approach is inefficient, makes it difficult to screen large volumes of information, and makes it difficult to classify large numbers of users.
To solve the above problems, the embodiments of this application provide a method for determining an object type. The method screens and classifies an object to be classified by means of a quality judgment rule and a first quality classification model, and thereby obtains the type to which the object belongs. The embodiments can automatically screen and identify the type of a media outlet and so identify low-quality media, which solves the technical problem that large numbers of low-quality media currently cannot be identified manually.
Fig. 2 is a schematic diagram of a method for determining an object type provided by an embodiment of this application. The method comprises:
201. Obtain an information set of an object to be classified, wherein the information set is generated from a first information set and a second information set;
For ease of description, the embodiments of this application are described with a server as the executing entity. Other executing entities can of course also perform the method for determining an object type provided by the embodiments, which is not specifically limited here.
In the embodiments of this application, the object to be classified may be the aforementioned media account or self-media account, an identifier corresponding to a self-media user, an identifier corresponding to a general media user, or an identifier corresponding to a user, which the embodiments do not specifically limit.
In some embodiments, the information set may include the average posting volume, paragraph count, picture count, and similar information for the object to be classified. The average posting volume may refer to how many articles the object to be classified publishes on average per day, or how many it publishes on average per month. In other embodiments, the information set may include information about the articles of the object to be classified, for example, how many articles the object has published in total and how many of them are low-quality articles. Whether an article is a low-quality article may be judged by a neural network algorithm or by manual review, which the embodiments do not limit. In still other embodiments, the information set of the object to be classified further includes the number of paragraphs in the articles it publishes, the number of words in each paragraph, the total word count of each article, and similar information. In practical applications, other information may also be included, which is not specifically limited here.
In the embodiments of this application, the server can obtain the information set of the object to be classified. Specifically, the server can generate the information set of the object to be classified from the first information set and the second information set. In some embodiments, the first information set may be a historical information set, and the second information set may be the information set currently obtained, for example, the information set obtained on the current day.
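One way to picture how an information set could be generated from a historical set and a current set is sketched below. The source does not specify the combination method, so summing raw counts and then recomputing averages and ratios is an assumption, and `PostStats`, `merge_info_sets`, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PostStats:
    """Hypothetical raw counts for one period (historical or current)."""
    days: int               # number of days covered
    posts: int              # text items published in the period
    low_quality_posts: int  # items already labeled low quality upstream


def merge_info_sets(historical: PostStats, current: PostStats) -> Dict[str, float]:
    # Assumption: the combined information set is derived by adding the
    # raw counts of both periods and recomputing the derived statistics.
    days = historical.days + current.days
    posts = historical.posts + current.posts
    low = historical.low_quality_posts + current.low_quality_posts
    return {
        "avg_posts_per_day": posts / days if days else 0.0,
        "low_quality_ratio": low / posts if posts else 0.0,
    }
```

Keeping raw counts per period and deriving ratios only at merge time avoids averaging two ratios that were computed over different numbers of posts.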
202. If the information set of the object to be classified satisfies the quality judgment rule, determine that the object to be classified is a first-type object;
In this embodiment and subsequent embodiments, a first-type object may be a low-quality object, which may also be called low-quality media.
In some embodiments, the information set of the object to be classified satisfies the quality judgment rule when, among the information published by the object, the proportion of low-quality text information is greater than a set first ratio threshold. This indicates that the object to be classified frequently publishes low-quality text information, so the server can assign it to the first type. Text information may be articles, short comments, news, and the like, which is not limited here. Low-quality text information may be, for example, articles containing malicious advertising, horror, pornography, or pieced-together fabrications. In the embodiments of this application, the information of the object to be classified that the server obtains has already been labeled as low-quality or non-low-quality text information by some algorithm; the specific algorithm is not limited by the embodiments. The low-quality text ratio may be the proportion of the low-quality text information published by the object to be classified among all text information it has published.
In other embodiments, the information set of the object to be classified satisfies the quality judgment rule when, among the information published by the object, the proportion of repeated text information is greater than a set second ratio threshold. This indicates that the object to be classified frequently publishes repeated text information, so the server can assign it to the first type. In some embodiments, repeated text information may be text information in which text or pictures are repeated within a single article, or in which text or pictures are repeated across a series of articles published by the media. In other embodiments, repeated text information may be text information in which the title of a single article is repeated in the body, the first or last sentences of paragraphs are repeated, or whole paragraphs are repeated. For example, the server can compute the duplicate-check rate of a piece of text information by running a duplicate check on it; if the rate exceeds a certain threshold, the text information can be determined to be repeated text information. The repeated text ratio may be the proportion of the repeated text information published by the object to be classified among all text information it has published.
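A simplified sketch of the two-level check follows. The source does not define how the duplicate-check rate is computed, so exact-match paragraph repetition within one text is used here purely as a stand-in; `duplicate_check_rate`, `repeated_text_ratio`, and `rate_threshold` are hypothetical names.

```python
from typing import List, Sequence


def duplicate_check_rate(paragraphs: Sequence[str]) -> float:
    """Fraction of paragraphs that exactly repeat an earlier paragraph.

    Assumption: a stand-in for the unspecified duplicate-check algorithm.
    """
    if not paragraphs:
        return 0.0
    seen = set()
    duplicates = 0
    for paragraph in paragraphs:
        key = paragraph.strip()
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(paragraphs)


def repeated_text_ratio(texts: Sequence[List[str]], rate_threshold: float) -> float:
    """Share of texts whose duplicate-check rate exceeds rate_threshold."""
    if not texts:
        return 0.0
    repeated = sum(1 for t in texts if duplicate_check_rate(t) > rate_threshold)
    return repeated / len(texts)
```

The resulting ratio would then be compared against the second ratio threshold in the same way as the low-quality ratio.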
In other embodiments, the information set of the object to be classified satisfies the quality judgment rule when, among the information published by the object, the proportion of incomplete text information is greater than a set third ratio threshold. This indicates that the object to be classified frequently publishes incomplete text information, so the server can assign it to the first type. In some embodiments, incomplete text information may be text information that lacks some of the five elements, for example, news text that lacks one or more of the time, place, and people. The information of the object to be classified that the server obtains has already been labeled as incomplete or non-incomplete text information by some algorithm; the specific algorithm is not limited by the embodiments. The incomplete text ratio may be the proportion of the incomplete text information published by the object to be classified among all text information it has published.
In further embodiments, the information aggregate of the object to be sorted meets the quality judging rule when, among the information published by the object to be sorted, the proportion of thin text information is greater than a set fourth proportion threshold. This indicates that the object to be sorted often publishes thin text information, so the server can classify the object to be sorted as a first type object. In some embodiments, thin text information can be an article whose content is thin and unsubstantial, for example a few arbitrarily captured pictures plus two paragraphs of words; it can be an article whose body text is too short and carries little information; it can also be an article published as an image-text article whose body is mostly video with very little text, which raises the suspicion of evading duplicate checking. In practical applications, it can also be other similar text information, which is not limited here. In some embodiments, the server can count the number of words in each paragraph of a piece of text information; if the average number of words per paragraph is less than a certain threshold, the content is thin and the server determines that the text information is thin text information. The thin text information proportion can be the proportion of the thin text information published by the object to be sorted among all the text information published by the object to be sorted.
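Illustratively, the thin text check described above can be sketched as follows. This is only a minimal sketch under stated assumptions: the function names, the whitespace-based word count and the default thresholds are hypothetical and are not specified by the embodiment of the present application.

```python
def is_thin(paragraphs, min_avg_words=30):
    """Return True if the average number of words per paragraph is below the threshold.

    Assumption: words are separated by whitespace; a character count would be
    the natural substitute for Chinese text. An article with no paragraphs is
    treated as thin.
    """
    if not paragraphs:
        return True
    avg_words = sum(len(p.split()) for p in paragraphs) / len(paragraphs)
    return avg_words < min_avg_words


def thin_ratio(articles, min_avg_words=30):
    """Proportion of thin articles among all articles published by one object."""
    if not articles:
        return 0.0
    thin_count = sum(1 for paragraphs in articles if is_thin(paragraphs, min_avg_words))
    return thin_count / len(articles)


def meets_thin_rule(articles, ratio_threshold=0.5, min_avg_words=30):
    """Rule: the thin text proportion is greater than the set proportion threshold."""
    return thin_ratio(articles, min_avg_words) > ratio_threshold
```

Here each article is assumed to be a list of paragraph strings, so the same input can be reused for other structure features such as the paragraph count.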
In further embodiments, the information aggregate of the object to be sorted meets the quality judging rule when the name of the object to be sorted is largely similar to the names of other objects to be sorted. For example, the names "the tussilago in the Yalu River", "Australia nightshade in the Yalu River", "the grasswort flower in the Yalu River", "the hydrangea in the Yalu River" and "the sansevieria trifasciata flower in the Yalu River" are similar, which indicates that the name of the object to be sorted is one of a batch of registered names, so the server can determine the object to be sorted as a first type object. In some embodiments, the server examines the names of multiple objects to be sorted and extracts the text that these names have in common, and then computes the proportion of this common text in the total text of the name of the object to be sorted. If the proportion is greater than a set threshold, the server can determine the object to be sorted as a first type object.
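Illustratively, the name comparison above can be sketched as follows. This is a minimal sketch that assumes the common text is a shared prefix; the function names, the example names and the threshold are hypothetical, and the embodiment does not limit how the common text is extracted.

```python
import os


def shared_prefix_ratio(name, other_names):
    """Proportion of `name` covered by the prefix it shares with all other names."""
    if not name or not other_names:
        return 0.0
    # os.path.commonprefix compares strings character by character.
    prefix = os.path.commonprefix([name, *other_names])
    return len(prefix) / len(name)


def looks_batch_registered(name, other_names, ratio_threshold=0.5):
    """Rule: the common text accounts for more than the set proportion of the name."""
    return shared_prefix_ratio(name, other_names) > ratio_threshold
```

A suffix or longest-common-substring comparison could be substituted where batch-registered names vary at the front rather than at the end.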
In further embodiments, the information aggregate of the object to be sorted meets the quality judging rule when the dispatch frequency of the object to be sorted is greater than a set frequency threshold. This indicates that the frequency with which the object to be sorted publishes information is abnormal: the object may be plagiarizing other people's information or publishing a large amount of shoddily produced text information, so the server can determine that the object to be sorted belongs to the first type object. In some embodiments, the dispatch frequency of the object to be sorted is the frequency with which the object to be sorted publishes text information. For example, if the object to be sorted publishes 500 pieces of text information per month on average, its dispatch frequency is 500 per month. For another example, if the object to be sorted publishes 10 pieces of text information per day on average, its dispatch frequency is 10 per day. If the dispatch frequency of the object to be sorted exceeds the preset frequency threshold, the server can determine that the object to be sorted is a first type object.
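Illustratively, the dispatch frequency rule can be sketched as follows. This is a minimal sketch assuming that the publication date of each piece of text information is available; the function names and the way the observation window is measured are hypothetical.

```python
from datetime import date


def daily_dispatch_frequency(publish_dates):
    """Average number of pieces of text information published per day.

    Assumption: the window runs from the first to the last publication date,
    both inclusive.
    """
    if not publish_dates:
        return 0.0
    span_days = (max(publish_dates) - min(publish_dates)).days + 1
    return len(publish_dates) / span_days


def exceeds_frequency_threshold(publish_dates, threshold_per_day):
    """Rule: the dispatch frequency is greater than the set frequency threshold."""
    return daily_dispatch_frequency(publish_dates) > threshold_per_day
```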
In practical applications, other quality judging rules can also be set, which are not described again in the embodiment of the present application.
203. If the information aggregate of the object to be sorted does not meet the quality judging rule, obtain the characteristic information of the object to be sorted according to the information aggregate of the object to be sorted;
In the embodiment of the present application, if the information aggregate of the object to be sorted does not meet the quality judging rule, the server can make the determination by means of a model. Specifically, the server first obtains the characteristic information of the object to be sorted according to the information aggregate of the object to be sorted.
In some embodiments, the characteristic information of the object to be sorted may include the essential information of the object to be sorted, its dispatch distribution situation, the text information published by the object to be sorted, the structure features of the text information, the low-quality text information proportion, etc. In some embodiments, the essential information of the object to be sorted can be information such as its name, grade and type. In some embodiments, the dispatch distribution situation of the object to be sorted can be the main dispatch channel, the channel number variance, the channel number cross entropy, etc. In some embodiments, the text information published by the object to be sorted can be articles, paragraphs, news, etc. In some embodiments, the structure features of the text information published by the object to be sorted can be the paragraph count, the picture count, the punctuation pattern of the text information, etc. In practical applications, the characteristic information of the object to be sorted can also take other forms, which is not limited in the embodiment of the present application.
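Illustratively, the dispatch distribution features can be sketched as follows. This is a minimal sketch under stated assumptions: the embodiment does not define how the channel number variance or cross entropy is computed, so the sketch uses the population variance of the per-channel article counts and, as a stand-in for the cross entropy, the Shannon entropy of the channel distribution. The function name and dictionary keys are hypothetical.

```python
from collections import Counter
from math import log2
from statistics import pvariance


def channel_features(channels):
    """Summarize how one object's articles are spread over channels.

    `channels` holds one channel label per published article.
    """
    if not channels:
        raise ValueError("at least one article is required")
    counts = Counter(channels)
    total = sum(counts.values())
    main_channel, _ = counts.most_common(1)[0]
    probabilities = [count / total for count in counts.values()]
    # Shannon entropy is an assumed substitute for the unspecified cross entropy.
    entropy = -sum(p * log2(p) for p in probabilities)
    return {
        "main_channel": main_channel,
        "count_variance": pvariance(list(counts.values())),
        "entropy": entropy,
    }
```

An object that publishes in a single vertical field yields an entropy of zero, while an object spread evenly over many channels yields a higher value.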
In some embodiments, the server can select the more important characteristic information from the candidate characteristic information according to a feature selection algorithm. The feature selection algorithm is not specifically limited in the embodiment of the present application. Illustratively, the server may find that features such as the regional article proportion, the media type, other serious low-quality articles and the proportion of long articles among the published articles are more important.
204. Obtain, through the first quality classification model, the classification result corresponding to the characteristic information, where the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be sorted belongs to the first type object, and the second classification result indicates that the object to be sorted belongs to the second type object or the third type object.
In the embodiment of the present application, the first quality classification model can be any of various models in machine learning. For example, the first quality classification model can be a boosted tree model (xgboost model), or another classification model. The embodiment of the present application does not specifically limit the type of model used, which can be replaced by various other effective novel model structures. The embodiment of the present application also does not limit the quantity and type of characteristic information.
In some embodiments, the server can input the characteristic information of the object to be sorted into the first quality classification model to obtain the classification result corresponding to the characteristic information, which may also be called the classification result of the object to be sorted. If the classification result is the first classification result, the object to be sorted belongs to the first type object; if the classification result is the second classification result, the object to be sorted belongs to the second type object or the third type object, and which of the two it belongs to can be determined by subsequent embodiments.
In some embodiments, the second type object may also be called high-quality media, and the third type object may also be called middle-quality media (media of medium quality).
In some embodiments, the manager of the information publishing platform can log in to the server through a terminal device to check how the types of the objects to be sorted have been determined, and the server can display the type determination results of the objects to be sorted through the terminal device. Fig. 3 is an example interface diagram in which the server displays the type determination results of the objects to be sorted in the embodiment of the present application. As can be seen, the interface displayed by the server may include a title bar, a function panel and a main interface, where the title bar displays the name of the interface, the function panel lets the user select a function, and the main interface includes the objects to be sorted, the names of the objects to be sorted and the types of the objects to be sorted. The title bars and function panels in the other figures of the present application are similar and will not be described again. Illustratively, object 1 to be sorted is XX top news, and the server determines that its type is the third type object; object 2 to be sorted is XX news, and the server determines that its type is the second type object; object 3 to be sorted is XX people number, and the server determines that its type is the first type object. In practical applications, other objects to be sorted and their determined types can also be displayed, which is not limited here. In addition, more content can be displayed on the interface, which is not specifically limited here.
In some embodiments, a staff member can audit, through the interface shown in Fig. 3, whether the type of an object to be sorted is correct. If the staff member finds that the server has determined the type of an object to be sorted incorrectly, the staff member can click the "change" virtual button on the interface to modify the type of the object to be sorted. Fig. 4 is an example interface diagram of a type change in the embodiment of the present application. As can be seen, after the staff member clicks the "change" virtual button on the interface shown in Fig. 3, a type selection box pops up on the interface. Illustratively, if the staff member clicks the first type object as shown in Fig. 4, the type of the object to be sorted can be changed to the first type object.
Optionally, on the basis of the embodiments corresponding to Fig. 2 above, in an optional embodiment of the object type determination method provided by the embodiments of the present application, the server can further determine whether the object to be sorted belongs to the second type object or the third type object through the following steps:
If the classification result is the second classification result, obtain the object vector corresponding to the object to be sorted through the second quality classification model;
Calculate the similarity between the object vector corresponding to the object to be sorted and the second type object vector, where the second type object vector is the vector corresponding to the second type object;
If the similarity is greater than a set threshold, determine that the object to be sorted belongs to the second type object;
If the similarity is less than or equal to the set threshold, determine that the object to be sorted belongs to the third type object.
In some embodiments, the second type object may also be called high-quality media or a high-quality object. Therefore, the server can obtain the second type object vectors in advance through the second quality classification model from selected high-quality objects. These second type object vectors serve as a standard against which the object vector corresponding to the object to be sorted is compared. If the two vectors are similar, the object to be sorted is similar to the high-quality objects, and to a certain extent such a similar object to be sorted can be regarded as a high-quality object.
In the embodiment of the present application, the server can obtain the object vector corresponding to the object to be sorted through the second quality classification model, which can be a model from natural language processing technology. In some embodiments, the second quality classification model can be a graph vector model, such as a DeepWalk algorithm model or a GraphSage algorithm model, which is not specifically limited in the embodiment of the present application. The server can obtain the object vector corresponding to the object to be sorted through the second quality classification model according to the information aggregate of the object to be sorted.
Then, in some embodiments, the server can calculate the similarity between the object vector corresponding to the object to be sorted and the second type object vector; specifically, it can calculate the cosine similarity between the two vectors. If the cosine similarity is greater than the set threshold, the server can determine that the object to be sorted belongs to the second type object; if the cosine similarity is less than or equal to the set threshold, the server can determine that the object to be sorted belongs to the third type object.
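Illustratively, the cosine similarity comparison can be sketched as follows. This is a minimal sketch; the function names, the type labels and the default threshold of 0.8 are hypothetical, and the embodiment does not specify the threshold value.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_by_similarity(object_vector, second_type_vector, threshold=0.8):
    """Greater than the threshold gives the second type; otherwise the third type."""
    if cosine_similarity(object_vector, second_type_vector) > threshold:
        return "second_type"
    return "third_type"
```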
Optionally, on the basis of the embodiments corresponding to Fig. 2 above, in an optional embodiment of the object type determination method provided by the embodiments of the present application, the server can obtain the click sequence of the object to be sorted according to the information aggregate of the object to be sorted, where the click sequence includes the identifier of the object to be sorted and the user identifiers corresponding to the object to be sorted; the server then obtains, according to the click sequence, the object vector corresponding to the object to be sorted through the second quality classification model.
In the embodiment of the present application, illustratively, taking the DeepWalk algorithm model as an example, DeepWalk is an unsupervised graph vector algorithm whose training process is similar to that of word vectors. The server first obtains the click sequence of the object to be sorted according to the information aggregate of the object to be sorted. In some embodiments, the server can record the IDs of the users who click on the text information corresponding to the object to be sorted, and on this basis compile the click sequence of each object to be sorted. For example, if the text information published by the object to be sorted is clicked by user 1, user 2 and user 3, the click sequence compiled by the server is the identification number (identity document, ID) of the object to be sorted, user 1, user 2, user 3. In the embodiment of the present application, the click sequence can be represented as:
Click sequence = [ID of object to be sorted, user ID, user ID, user ID, user ID, ...]
In some embodiments, the server inputs the click sequence into the DeepWalk algorithm model. Illustratively, the DeepWalk algorithm model includes two steps:
a. Perform random walks on the nodes in the graph to generate node sequences;
b. Run the word vector model (skip-gram model) to learn the embedding of each node from the node sequences generated in step a.
The server can obtain the output of the DeepWalk algorithm model, that is, the object vector corresponding to the object to be sorted.
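Illustratively, step a can be sketched as follows. This is a minimal sketch under stated assumptions: the click sequences are turned into an undirected graph linking each object ID to the IDs of the users who clicked it, and uniform random walks are generated on that graph. The function names and parameters are hypothetical, and step b (skip-gram training) is not shown; the generated walks would be passed as sentences to any skip-gram implementation.

```python
import random
from collections import defaultdict


def build_click_graph(click_sequences):
    """Build an undirected graph from sequences of the form [object_id, user_id, ...]."""
    graph = defaultdict(set)
    for object_id, *user_ids in click_sequences:
        for user_id in user_ids:
            graph[object_id].add(user_id)
            graph[user_id].add(object_id)
    return graph


def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Generate uniform random walks starting from every node (DeepWalk step a)."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = sorted(graph[walk[-1]])
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks
```

Two objects clicked by overlapping groups of users appear in the same walks, which is why their learned vectors end up close to each other.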
Optionally, on the basis of the embodiments corresponding to Fig. 2 above, in an optional embodiment of the object type determination method provided by the embodiments of the present application, before the information aggregate of the object to be sorted is obtained, the method further includes:
Obtain text information, where the text information has a corresponding relationship with the object to be sorted;
Obtain the first information aggregate of the object to be sorted by compiling statistics on the text information;
Obtain history text information, where the history text information has a corresponding relationship with the object to be sorted;
Obtain the second information aggregate of the object to be sorted by compiling statistics on the history text information.
Illustratively, if the server determines object types once a day, the server can determine the object types of the objects to be sorted in the information publishing platform according to the text information obtained that day and the history text information. After the server completes the object type determination for the day, the text information obtained that day can be stored in the history database as history text information. Illustratively, the server can also determine object types once a month, in which case the server determines the object types of the objects to be sorted in the information publishing platform according to the text information obtained that month and the history text information. After the server completes the object type determination for the month, the text information obtained that month can be stored in the history database as history text information.
In the embodiment of the present application, the server can first obtain the text information that has a corresponding relationship with the object to be sorted. Illustratively, the server can obtain a large amount of text information sent by terminal devices, for example, article 1 published by object 1 to be sorted, article 2 published by object 2 to be sorted, article 3 published by object 3 to be sorted, etc. Then, the server can compile statistics on this text information to obtain the text information published by the object to be sorted; for example, the object to be sorted has currently published 10 articles in total, namely article 1, article 4, article 5, etc. The text information published by the object to be sorted may be called the first information aggregate.
Fig. 5 is an example interface diagram in which a user edits text information through a terminal device in the embodiment of the present application. As can be seen, on the main interface the user can input a title and body text. In some embodiments, the user can also insert pictures or videos into the body text to enrich the content of the text information. After the user has finished editing the text information, the user can click publish, and the terminal device, in response to the click operation, can send the edited text information to the server. The server can then receive the edited text information, which may include the title, the body text, etc.
Fig. 6 is an example interface diagram in which a user edits text information through a mobile phone in the embodiment of the present application. As can be seen, on the mobile phone interface the user can input a title and body text. In some embodiments, the user can also insert pictures or videos into the body text to enrich the content of the text information. After the user has finished editing the text information, the user can click publish, and the mobile phone, in response to the click operation, can send the edited text information to the server. The server can then receive the edited text information, which may include the title, the body text, etc.
In the embodiment of the present application, the server can obtain from the history database the history text information that has a corresponding relationship with the object to be sorted. Illustratively, if the history database stores the history text information of the previous 3 months, the server can obtain from the history database the history text information of the previous 3 months that has a corresponding relationship with the object to be sorted. Then, the server can compile statistics to obtain all the text information that the object to be sorted published in the previous 3 months. For example, the object to be sorted published article 11, article 12, article 13, etc. in the previous 3 months. The text information obtained from these statistics can serve as the second information aggregate.
The server can generate the information aggregate of the object to be sorted from the first information aggregate and the second information aggregate. In some embodiments, the server can compile statistics to obtain all the text information published by the object to be sorted, and from these derive information such as the average quantity of text information published per day by the object to be sorted (which may also be called the daily dispatch frequency), the average quantity of text information published per month (which may also be called the monthly dispatch frequency), the average paragraph count of the text information published by the object to be sorted, the average picture count of that text information, how much of that text information is low-quality text information, and the average number of words of that text information.
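Illustratively, merging the two information aggregates into summary statistics can be sketched as follows. This is a minimal sketch; the `Article` record, its field names and the dictionary keys are hypothetical and only stand in for whatever per-article information the server actually stores.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    published: date
    paragraphs: int
    pictures: int
    words: int
    low_quality: bool


def build_information_aggregate(new_articles, history_articles):
    """Combine new and history articles of one object into summary statistics."""
    articles = list(new_articles) + list(history_articles)
    if not articles:
        return {}
    count = len(articles)
    first_day = min(a.published for a in articles)
    last_day = max(a.published for a in articles)
    days = (last_day - first_day).days + 1  # inclusive window, an assumption
    return {
        "total_articles": count,
        "daily_dispatch_frequency": count / days,
        "avg_paragraphs": sum(a.paragraphs for a in articles) / count,
        "avg_pictures": sum(a.pictures for a in articles) / count,
        "avg_words": sum(a.words for a in articles) / count,
        "low_quality_ratio": sum(a.low_quality for a in articles) / count,
    }
```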
Optionally, on the basis of the embodiments corresponding to Fig. 2 above, in an optional embodiment of the object type determination method provided by the embodiments of the present application, determining that the object to be sorted belongs to the second type object if the similarity is greater than the set threshold includes:
If the similarity is greater than the set threshold, obtain second type annotation information;
Determine the annotation result of the object to be sorted according to the second type annotation information, where the annotation result includes a first annotation result and a second annotation result, the first annotation result indicates that the object to be sorted belongs to the second type object, and the second annotation result indicates that the object to be sorted belongs to the third type object.
In some embodiments, if the similarity between the object vector of the object to be sorted and the second type object vector is greater than the set threshold, the server can compare the object to be sorted with the second type annotation information input by the user. If the second type annotation information includes the object to be sorted, the server can obtain the first annotation result, indicating that the object to be sorted belongs to the second type object; if the second type annotation information does not include the object to be sorted, the server can obtain the second annotation result, indicating that the object to be sorted belongs to the third type object.
In some embodiments, if the similarity between the object vector of the object to be sorted and the second type object vector is greater than the set threshold, the server can display the corresponding object to be sorted, and the user can select and annotate those objects to be sorted that the user considers to be of higher quality. In response to the user's selection and annotation operation, the server can obtain the second type annotation information, which includes the objects to be sorted corresponding to the user's selection and annotation operation. Then, the server can determine that the objects to be sorted corresponding to the user's selection and annotation operation belong to the second type object.
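Illustratively, the annotation step can be sketched as follows. This is a minimal sketch that assumes the second type annotation information is available as a collection of object IDs selected by the user; the function name and the type labels are hypothetical.

```python
def annotation_results(candidate_ids, annotated_ids):
    """Map each candidate whose similarity exceeded the threshold to its final type.

    Candidates included in the annotation information receive the first
    annotation result (second type); the others receive the second annotation
    result (third type).
    """
    annotated = set(annotated_ids)
    return {
        candidate: "second_type" if candidate in annotated else "third_type"
        for candidate in candidate_ids
    }
```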
Optionally, on the basis of the embodiments corresponding to Fig. 2 above, in an optional embodiment of the object type determination method provided by the embodiments of the present application, after the server has determined the type of the object to be sorted (first type object, second type object or third type object), it can update the type of the object to be sorted in the database.
Illustratively, assume that object 1 to be sorted has always performed well and the information it published was essentially high-quality information, so its type is the second type object (high-quality object). However, starting from a certain month, the object begins to publish a large amount of low-quality text information, and the server determines, according to the methods of the embodiments corresponding to Fig. 2, that the type of object 1 to be sorted is the first type object (low-quality object). After this determination, the server can update the type of object 1 to be sorted in the database to the first type object (low-quality object).
Based on the above embodiments, the embodiment of the present application also provides an application example of the object type determination method. In this application example, the object to be sorted may also be called media, the text information may also be called an article, the information aggregate of the object to be sorted may also be called media information, the first type object may also be called low-quality media, the second type object may also be called high-quality media, and the third type object may also be called middle-quality media.
Fig. 7 is an example topology diagram of the application example of the object type determination method provided by the embodiments of the present application. In Fig. 7, Y (Yes) indicates that the logical judgment is yes, and N (No) indicates that the logical judgment is no.
In the application example, the server can first collect new article information, for example by compiling daily statistics on newly published articles, such as the essential information of the media's newly published articles, the low-quality types recognized by machine, and the low-quality types found by manual review.
In the application example, the server can convert the collected new article information into new media information, obtain history media information from the history database, and integrate the new media information and the history media information into media information. The types and quantity of media information are the same as for the information aggregate of the object to be sorted in the foregoing embodiments, for example the media's latest average number of published articles, paragraph count, picture count and so on, and are not described again here.
In some application examples, the server can perform a logical judgment on the media information through a low-quality rule module to judge whether the corresponding media are low-quality media. Illustratively, low-quality media can be divided into serious low-quality media, shoddily produced media and malicious cheating media. When a media is judged to have any of the following situations, it can be judged to be low-quality media.
1) Serious low-quality media:
Serious low-quality media publish articles of poor quality with obvious quality problems. This is also the type that is easiest to start with when grading media, and it is mainly divided into the following situations.
a) The media publish a large number of low-quality articles, such as malicious advertisements, horror, pornography, and pieced-together or fabricated articles;
b) The articles are incomplete: news articles lack the five elements (time, place, person, etc.) or fabricate stories.
2) Shoddily produced media:
Shoddily produced media are media whose production cost is low: most of the content can be generated by machine and then simply modified by hand. They are characterized by repeating or crawling text from the Internet and adding simple pictures, so that most of their articles have no substance.
a) Repetition media:
Repetition media appear in two forms: text or pictures are repeated within a single article, or text or pictures are repeated across a series of articles published by the media;
Within a single article, the title is repeated many times in the body text, the first or last sentences of paragraphs are repeated, or whole paragraphs are repeated.
The articles published by repetition media share common features: overly long meaningless passages appear at the beginning and end of the media's series of articles, with much padding and over-long lead-ins (such as editor's remarks and introductions that pad the word count), or the series of articles uses identical pictures.
b) Most of the media's articles are thin and unsubstantial, for example a few arbitrarily captured pictures plus two paragraphs of words.
c) The body text of most of the media's articles is too short and carries little information.
d) Most of the media's articles are published as image-text articles, but the body is mostly video with very little text, which raises the suspicion of evading duplicate checking.
3) Malicious cheating media:
a) Similar media names are registered in batches at similar times;
If media names use an uncommon, relatively long attributive as a common prefix followed by different suffixes, most of them are batch registered. For example: the tussilago in the Yalu River, Australia nightshade in the Yalu River, the grasswort flower in the Yalu River, the hydrangea in the Yalu River, the sansevieria trifasciata flower in the Yalu River.
b) Different media use identical lead-ins and courtesy phrases;
c) Media dispatch frequency: individual media that are not media organizations publish frequently, with too many articles published in a single day, which raises the suspicion of batch publishing or plagiarism;
d) The name of a competing product's media appears in the article (text or pictures), or a media name appears that does not match the registered media name.
Corresponding rules can be set to determine media of the above types. For details, refer to the description of step 202 in the embodiments corresponding to Fig. 2, which is not repeated here.
Since the low-quality rule module in the application example cannot necessarily detect all low-quality media, in some application examples the server can also make the determination through a low-quality media model. The server first obtains the characteristic information of the media, which mainly falls into the following categories:
1) Media content features:
a) Essential information: mainly some essential features of the media, such as the media name, the original media grade, the media type, etc.;
b) Article dispatch verticality:
Measures the dispatch distribution of a media within specific vertical fields, such as the main dispatch channel, the channel number variance, the channel number cross entropy, etc.;
c) Text structure features:
The structural features of the media's published articles, such as the paragraph count, the picture count and the punctuation pattern;
d) Number and proportion of the media's low-quality types:
The media's low-quality types include machine-identified low quality and manually reviewed low quality, where machine-identified low quality includes the number and proportion of title party (clickbait titles), story party, thin content, etc., and manually reviewed low quality includes the number and proportion of malicious advertisements, ordinary advertisements, horror, etc.;
2) Media behavior features:
a) Common features of articles:
The number of repetitions of titles, first sentences, pictures, etc. within a single article and across the media's articles in one month;
b) Dispatch features:
The total number of low-quality articles in one month, the total number of articles, the maximum number of articles published in a single day, etc.;
c) Cheating features:
Whether the media name is similar to other media names, whether the media shares identical paragraphs with other media, etc.;
3) User behavior features:
Dwell time, monthly exposure, click volume, etc.;
4) Other product behavior features:
Data such as the exposure rate and click-through rate of the product.
The server can extract features of more than 100 dimensions in total and then input them into a classification model such as xgboost to perform binary classification. In some application examples, the server can find through a feature selection algorithm that features such as the regional article proportion, the media type, other serious low-quality articles and the proportion of long articles among the published articles are more important. The server can therefore remove redundant features and retain the important ones, which improves the execution speed of the low-quality media model.
In the application example, after the server has screened out the low-quality media, the remaining non-low-quality media need to be screened further so that the high-quality media can be selected from them. Illustratively, the server can perform this screening through a high-quality media model.
First, according to the media information of the non-low-quality media, the server can determine which media the articles clicked by users belong to, and on this basis compile the user click sequence of each media. The input form is: media ID, user ID_1, user ID_2, ....
Then, the server can use an algorithm such as DeepWalk to train and obtain the vectors of the non-low-quality media. DeepWalk is an unsupervised graph vector algorithm whose training process is similar to that of word vectors.
The DeepWalk algorithm includes two steps:
a. Perform random walks on the nodes in the graph to generate node sequences;
b. Run skip-gram to learn the embedding of each node from the node sequences generated in step a.
Finally, the server can take the historical high-quality media in the database as seed media, calculate the similarity between the vectors of the non-low-quality media and the vectors of the seed media, and take the media with higher similarity as high-quality media candidates.
The server can display the high-quality media candidates on an interface, as shown in Fig. 8. Fig. 8 is an example interface diagram for displaying high-quality media candidates in the embodiment of the present application. The main interface displays media accounts, names and selection boxes, and the media accounts displayed on the interface in Fig. 8 are the high-quality media candidates. When a staff member audits and confirms that certain high-quality media candidates are indeed high-quality media, the staff member can click the "Yes" virtual button in the selection box of the corresponding media account. The terminal device can then, in response to the click operation, send the staff member's annotation information to the server, where the annotation information indicates on which high-quality media candidates the staff member clicked the "Yes" virtual button.
The remaining media are neither low-quality media nor high-quality media, so the server can classify them as medium-quality media.
Finally, the server can write the determined media types back to the database.
The above application example uses the XGBoost algorithm as the classification algorithm. In practice, the embodiments of the present application do not restrict the type of model used, which can be replaced with various other effective model structures, nor do they restrict the number or type of features: features suited to a specific business can be designed for that business, and suitable features can be chosen with a feature selection algorithm.
The mining algorithm in the above application example is not limited to mining high-quality media; it is equally applicable to mining other media such as low-quality media. It also need not be the DeepWalk algorithm, which can be replaced with other graph embedding algorithms of the same kind, such as GraphSAGE.
The main innovation of the embodiments of the present application is that they greatly alleviate the inefficiency of manually operated media grading, for example: a. low-quality media: manual operation is inefficient and consumes substantial manpower and material resources; b. high-quality media: high-quality media are few in number and the review conditions for the historical whitelist are strict, whereas the algorithm provided by the invention can mine a large number of high-quality media so that more high-quality articles gain exposure.
The embodiments of the present application mitigate problems such as severe misjudgment and insufficient recall that arise when rule-based methods are used alone;
the low memory footprint and low latency of the embodiments of the present application make them suitable for application scenarios with demanding resource-usage requirements;
the embodiments of the present application are flexible and versatile and support defining different features for specific businesses, so they generalize well and are applicable to a variety of application scenarios.
Fig. 9 shows an apparatus 900 for determining an object type provided by the embodiments of the present application, comprising:
an acquiring unit 901, configured to obtain an information set of an object to be classified, where the information set is generated from a first information set and a second information set;
a processing unit 902, configured to determine that the object to be classified is a first-type object if the information set of the object to be classified satisfies a quality determination rule;
the processing unit 902 is further configured to obtain feature information of the object to be classified from the information set of the object to be classified if that information set does not satisfy the quality determination rule;
the processing unit 902 is further configured to obtain, through a first quality classification model, a classification result corresponding to the feature information, where the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified is a first-type object, and the second classification result indicates that the object to be classified is a second-type object or a third-type object.
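The order of operations performed by the processing unit, rule first and model only as a fallback, can be sketched as follows. The rule, the feature extractor, and the model below are hypothetical stand-ins passed in as plain callables; they are not the trained model or the rules of the embodiments, and the field names are assumptions.

```python
from typing import Callable, Mapping, Sequence

FIRST_TYPE = "first_type"
SECOND_OR_THIRD_TYPE = "second_or_third_type"

InfoSet = Mapping[str, float]


def classify_first_stage(
    info_set: InfoSet,
    meets_quality_rule: Callable[[InfoSet], bool],
    extract_features: Callable[[InfoSet], Sequence[float]],
    first_model_predicts_first_type: Callable[[Sequence[float]], bool],
) -> str:
    """Apply the rule first; fall back to the first quality classification model."""
    if meets_quality_rule(info_set):
        return FIRST_TYPE
    features = extract_features(info_set)
    if first_model_predicts_first_type(features):
        return FIRST_TYPE
    return SECOND_OR_THIRD_TYPE


# Hypothetical stand-ins for the rule, the feature extractor, and the trained model.
def rule(info: InfoSet) -> bool:
    return info["low_quality_ratio"] > 0.5


def features(info: InfoSet) -> Sequence[float]:
    return [info["low_quality_ratio"], info["long_article_ratio"]]


def model(x: Sequence[float]) -> bool:
    return x[0] > 0.3 and x[1] < 0.1


results = [
    classify_first_stage(info, rule, features, model)
    for info in (
        {"low_quality_ratio": 0.6, "long_article_ratio": 0.5},   # caught by the rule
        {"low_quality_ratio": 0.4, "long_article_ratio": 0.05},  # caught by the model
        {"low_quality_ratio": 0.1, "long_article_ratio": 0.5},   # passed on
    )
]
```

Passing the three stages in as callables keeps the control flow separate from any particular rule set or model type, which matches the statement that the model type is not restricted.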
Optionally, on the basis of the embodiments corresponding to Fig. 9, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
if the classification result is the second classification result, obtain an object vector corresponding to the object to be classified through a second quality classification model;
calculate the similarity between the object vector corresponding to the object to be classified and a second-type object vector, where the second-type object vector is the vector corresponding to a second-type object;
if the similarity is greater than a set threshold, determine that the object to be classified is a second-type object;
if the similarity is less than or equal to the set threshold, determine that the object to be classified is a third-type object.
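The threshold comparison above is strict: a similarity exactly equal to the set threshold yields the third type. A minimal sketch, assuming the similarity has already been computed and using a hypothetical threshold of 0.8:

```python
SECOND_TYPE = "second_type"
THIRD_TYPE = "third_type"


def assign_second_or_third(similarity: float, threshold: float) -> str:
    """Strictly greater than the threshold means second type; otherwise third type."""
    return SECOND_TYPE if similarity > threshold else THIRD_TYPE


# The middle value sits exactly on the (hypothetical) threshold.
labels = [assign_second_or_third(s, threshold=0.8) for s in (0.95, 0.8, 0.42)]
```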
Optionally, on the basis of the embodiments corresponding to Fig. 9, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the acquiring unit 901 is further configured to:
obtain text information, where the text information corresponds to the object to be classified;
compute the first information set of the object to be classified from statistics of the text information;
obtain historical text information, where the historical text information corresponds to the object to be classified;
compute the second information set of the object to be classified from statistics of the historical text information.
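One way to picture how the two information sets are produced and merged is sketched below. The statistics chosen (article count, low-quality proportion, duplicate proportion), the per-article flags, and the `current_`/`history_` key prefixes are all assumptions made for illustration; the source does not list the exact fields.

```python
def summarize(articles):
    """Compute simple per-object statistics from a list of article records."""
    total = len(articles)
    if total == 0:
        return {"article_count": 0, "low_quality_ratio": 0.0, "duplicate_ratio": 0.0}
    low = sum(1 for a in articles if a["low_quality"])
    dup = sum(1 for a in articles if a["duplicate"])
    return {
        "article_count": total,
        "low_quality_ratio": low / total,
        "duplicate_ratio": dup / total,
    }


def build_information_set(current_articles, historical_articles):
    """Merge the first (current) and second (historical) information sets."""
    first_info = summarize(current_articles)
    second_info = summarize(historical_articles)
    info_set = {f"current_{key}": value for key, value in first_info.items()}
    info_set.update({f"history_{key}": value for key, value in second_info.items()})
    return info_set


# Hypothetical article records for one object.
current = [
    {"low_quality": True, "duplicate": True},
    {"low_quality": False, "duplicate": True},
    {"low_quality": False, "duplicate": False},
    {"low_quality": False, "duplicate": False},
]
history = [
    {"low_quality": True, "duplicate": False},
    {"low_quality": False, "duplicate": False},
]
info_set = build_information_set(current, history)
```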
Optionally, on the basis of the embodiments corresponding to Fig. 9, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
if the proportion of low-quality text information in the information set of the object to be classified is greater than a set first proportion threshold, determine that the object to be classified is a first-type object, where the proportion of low-quality text information is the share of the text information corresponding to the object to be classified that is low-quality text information.
Optionally, on the basis of the embodiments corresponding to Fig. 9, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
if the proportion of repeated text information in the information set of the object to be classified is greater than a set second proportion threshold, determine that the object to be classified is a first-type object, where the proportion of repeated text information is the share of the text information corresponding to the object to be classified that has a high duplicate-check rate.
Optionally, on the basis of the embodiments corresponding to Fig. 9, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
if the publishing frequency of the object to be classified in the information set of the object to be classified is greater than a set frequency threshold, determine that the object to be classified is a first-type object, where the publishing frequency of the object to be classified is the number of times the object to be classified publishes text information.
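The three optional rules above can be sketched together as follows. Combining them with a logical OR is an assumption, since the source presents each rule as a separate optional embodiment; the field names and threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class RuleThresholds:
    low_quality_ratio: float  # first proportion threshold
    duplicate_ratio: float    # second proportion threshold
    publish_count: int        # frequency threshold


def triggered_rules(info: Mapping[str, float], t: RuleThresholds) -> List[str]:
    """Return the names of the rules whose strict 'greater than' test fires."""
    fired = []
    if info["low_quality_ratio"] > t.low_quality_ratio:
        fired.append("low_quality_ratio")
    if info["duplicate_ratio"] > t.duplicate_ratio:
        fired.append("duplicate_ratio")
    if info["publish_count"] > t.publish_count:
        fired.append("publish_count")
    return fired


def is_first_type(info: Mapping[str, float], t: RuleThresholds) -> bool:
    """Assumption: firing any single rule is enough to mark the object first type."""
    return bool(triggered_rules(info, t))


# Hypothetical thresholds and information sets.
thresholds = RuleThresholds(low_quality_ratio=0.5, duplicate_ratio=0.3, publish_count=100)
spammy = {"low_quality_ratio": 0.2, "duplicate_ratio": 0.6, "publish_count": 250}
normal = {"low_quality_ratio": 0.1, "duplicate_ratio": 0.05, "publish_count": 12}
```

Returning the names of the fired rules, rather than a bare boolean, makes it easier to audit why an object was placed in the first type.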
Optionally, on the basis of the embodiments corresponding to Fig. 9, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
obtain a click sequence of the object to be classified from the information set of the object to be classified, where the click sequence includes the identifier of the object to be classified and the user identifiers corresponding to the object to be classified;
obtain, from the click sequence, the object vector corresponding to the object to be classified through the second quality classification model.
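A click sequence of this shape could be assembled from a raw click log roughly as follows. The log format of `(user_id, article_id)` pairs, the `article_to_media` mapping, and the choice to keep each user once per object in first-click order are assumptions for illustration.

```python
def build_click_sequences(click_log, article_to_media):
    """Group clicking users by media: each row is [media_id, user_id_1, user_id_2, ...]."""
    users_by_media = {}
    for user_id, article_id in click_log:
        media_id = article_to_media.get(article_id)
        if media_id is None:  # article not owned by a known media: skip it
            continue
        users = users_by_media.setdefault(media_id, [])
        if user_id not in users:  # keep each user once, in first-click order
            users.append(user_id)
    return [[media_id, *users] for media_id, users in users_by_media.items()]


# Hypothetical click log and article ownership table.
click_log = [
    ("u1", "a1"),
    ("u2", "a1"),
    ("u1", "a2"),
    ("u3", "a3"),
    ("u1", "a1"),
    ("u4", "a9"),
]
article_to_media = {"a1": "m1", "a2": "m1", "a3": "m2"}
sequences = build_click_sequences(click_log, article_to_media)
```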
Optionally, on the basis of the embodiments corresponding to Fig. 9, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
if the similarity is greater than the set threshold, obtain second-type annotation information;
determine an annotation result of the object to be classified from the second-type annotation information, where the annotation result is a first annotation result or a second annotation result, the first annotation result indicates that the object to be classified is a second-type object, and the second annotation result indicates that the object to be classified is a third-type object.
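The mapping from annotation information to annotation results can be sketched as below. It assumes the annotation information arrives as the set of candidate IDs a reviewer confirmed, and that any unconfirmed candidate receives the second annotation result; the source does not spell out either point.

```python
SECOND_TYPE = "second_type"
THIRD_TYPE = "third_type"


def resolve_annotations(candidate_ids, confirmed_ids):
    """Map each candidate to second type if confirmed, otherwise to third type."""
    confirmed = set(confirmed_ids)
    return {
        candidate_id: SECOND_TYPE if candidate_id in confirmed else THIRD_TYPE
        for candidate_id in candidate_ids
    }


# Hypothetical candidates (similarity above threshold) and reviewer confirmations.
# "m9" was never a candidate, so it is ignored.
annotation_results = resolve_annotations(["m1", "m3", "m7"], {"m3", "m9"})
```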
Fig. 10 is a schematic structural diagram of a server provided by the embodiments of the present application. The server may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1022 (for example, one or more processors), a memory 1032, and one or more storage media 1030 (for example, one or more mass storage devices) that store application programs 1042 or data 1044. The memory 1032 and the storage medium 1030 may provide transient storage or persistent storage. A program stored in the storage medium 1030 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 1022 may be configured to communicate with the storage medium 1030 and to execute, on the server 1000, the series of instruction operations in the storage medium 1030.
The server 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiments may be based on the server structure shown in Fig. 10.
In the embodiments of the present application, the CPU 1022 is specifically configured to:
obtain an information set of an object to be classified, where the information set is generated from a first information set and a second information set;
if the information set of the object to be classified satisfies a quality determination rule, determine that the object to be classified is a first-type object;
if the information set of the object to be classified does not satisfy the quality determination rule, obtain feature information of the object to be classified from the information set of the object to be classified;
obtain, through a first quality classification model, a classification result corresponding to the feature information, where the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified is a first-type object, and the second classification result indicates that the object to be classified is a second-type object or a third-type object.
In the embodiments of the present application, the CPU 1022 is further configured to:
if the classification result is the second classification result, obtain an object vector corresponding to the object to be classified through a second quality classification model;
calculate the similarity between the object vector corresponding to the object to be classified and a second-type object vector, where the second-type object vector is the vector corresponding to a second-type object;
if the similarity is greater than a set threshold, determine that the object to be classified is a second-type object;
if the similarity is less than or equal to the set threshold, determine that the object to be classified is a third-type object.
In the embodiments of the present application, the CPU 1022 is further configured to:
obtain text information, where the text information corresponds to the object to be classified;
compute the first information set of the object to be classified from statistics of the text information;
obtain historical text information, where the historical text information corresponds to the object to be classified;
compute the second information set of the object to be classified from statistics of the historical text information.
In the embodiments of the present application, the CPU 1022 is further configured to:
if the proportion of low-quality text information in the information set of the object to be classified is greater than a set first proportion threshold, determine that the object to be classified is a first-type object, where the proportion of low-quality text information is the share of the text information corresponding to the object to be classified that is low-quality text information.
In the embodiments of the present application, the CPU 1022 is further configured to:
if the proportion of repeated text information in the information set of the object to be classified is greater than a set second proportion threshold, determine that the object to be classified is a first-type object, where the proportion of repeated text information is the share of the text information corresponding to the object to be classified that has a high duplicate-check rate.
In the embodiments of the present application, the CPU 1022 is further configured to:
if the publishing frequency of the object to be classified in the information set of the object to be classified is greater than a set frequency threshold, determine that the object to be classified is a first-type object, where the publishing frequency of the object to be classified is the number of times the object to be classified publishes text information.
In the embodiments of the present application, the CPU 1022 is further configured to:
obtain a click sequence of the object to be classified from the information set of the object to be classified, where the click sequence includes the identifier of the object to be classified and the user identifiers corresponding to the object to be classified;
obtain, from the click sequence, the object vector corresponding to the object to be classified through the second quality classification model.
In the embodiments of the present application, the CPU 1022 is further configured to:
if the similarity is greater than the set threshold, obtain second-type annotation information;
determine an annotation result of the object to be classified from the second-type annotation information, where the annotation result is a first annotation result or a second annotation result, the first annotation result indicates that the object to be classified is a second-type object, and the second annotation result indicates that the object to be classified is a third-type object.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (10)
1. A method for determining an object type, characterized by comprising:
obtaining an information set of an object to be classified, wherein the information set is generated from a first information set and a second information set;
if the information set of the object to be classified satisfies a quality determination rule, determining that the object to be classified is a first-type object;
if the information set of the object to be classified does not satisfy the quality determination rule, obtaining feature information of the object to be classified from the information set of the object to be classified;
obtaining, through a first quality classification model, a classification result corresponding to the feature information, wherein the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified is the first-type object, and the second classification result indicates that the object to be classified is a second-type object or a third-type object.
2. The method according to claim 1, characterized in that, after the obtaining, through the first quality classification model, of the classification result corresponding to the feature information, the method further comprises:
if the classification result is the second classification result, obtaining an object vector corresponding to the object to be classified through a second quality classification model;
calculating the similarity between the object vector corresponding to the object to be classified and a second-type object vector, wherein the second-type object vector is the vector corresponding to the second-type object;
if the similarity is greater than a set threshold, determining that the object to be classified is the second-type object;
if the similarity is less than or equal to the set threshold, determining that the object to be classified is the third-type object.
3. The method according to claim 1, characterized in that, before the obtaining of the information set of the object to be classified, the method further comprises:
obtaining text information, wherein the text information corresponds to the object to be classified;
computing the first information set of the object to be classified from statistics of the text information;
obtaining historical text information, wherein the historical text information corresponds to the object to be classified;
computing the second information set of the object to be classified from statistics of the historical text information.
4. The method according to claim 1, characterized in that the determining that the object to be classified is a first-type object if the information set of the object to be classified satisfies a quality determination rule comprises:
if the proportion of low-quality text information in the information set of the object to be classified is greater than a set first proportion threshold, determining that the object to be classified is a first-type object, wherein the proportion of low-quality text information is the share of the text information corresponding to the object to be classified that is low-quality text information.
5. The method according to claim 1, characterized in that the determining that the object to be classified is a first-type object if the information set of the object to be classified satisfies a quality determination rule comprises:
if the proportion of repeated text information in the information set of the object to be classified is greater than a set second proportion threshold, determining that the object to be classified is a first-type object, wherein the proportion of repeated text information is the share of the text information corresponding to the object to be classified that has a high duplicate-check rate.
6. The method according to claim 1, characterized in that the determining that the object to be classified is a first-type object if the information set of the object to be classified satisfies a quality determination rule comprises:
if the publishing frequency of the object to be classified in the information set of the object to be classified is greater than a set frequency threshold, determining that the object to be classified is a first-type object, wherein the publishing frequency of the object to be classified is the number of times the object to be classified publishes text information.
7. The method according to claim 2, characterized in that the obtaining of the object vector corresponding to the object to be classified through the second quality classification model comprises:
obtaining a click sequence of the object to be classified from the information set of the object to be classified, wherein the click sequence includes the identifier of the object to be classified and the user identifiers corresponding to the object to be classified;
obtaining, from the click sequence, the object vector corresponding to the object to be classified through the second quality classification model.
8. The method according to claim 2, characterized in that the determining that the object to be classified is the second-type object if the similarity is greater than the set threshold comprises:
if the similarity is greater than the set threshold, obtaining second-type annotation information;
determining an annotation result of the object to be classified from the second-type annotation information, wherein the annotation result is a first annotation result or a second annotation result, the first annotation result indicates that the object to be classified is the second-type object, and the second annotation result indicates that the object to be classified is the third-type object.
9. An apparatus for determining an object type, characterized by comprising:
an acquiring unit, configured to obtain an information set of an object to be classified, wherein the information set is generated from a first information set and a second information set;
a processing unit, configured to determine that the object to be classified is a first-type object if the information set of the object to be classified satisfies a quality determination rule;
the processing unit being further configured to obtain feature information of the object to be classified from the information set of the object to be classified if that information set does not satisfy the quality determination rule;
the processing unit being further configured to obtain, through a first quality classification model, a classification result corresponding to the feature information, wherein the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified is the first-type object, and the second classification result indicates that the object to be classified is a second-type object or a third-type object.
10. A server, characterized by comprising:
one or more central processing units, a memory, an input/output interface, a wired or wireless network interface, and a power supply;
the memory being a transient storage memory or a persistent storage memory;
the central processing unit being configured to communicate with the memory and to execute, on the server, the instruction operations in the memory so as to perform the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910841009.8A CN110532331A (en) | 2019-09-05 | 2019-09-05 | A kind of method and relevant apparatus that object type is determining |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110532331A (en) | 2019-12-03 |
Family
ID=68667384
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910841009.8A Pending CN110532331A (en) | 2019-09-05 | 2019-09-05 | A kind of method and relevant apparatus that object type is determining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110532331A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20191203 |