CN110532331A - Method and related apparatus for determining an object type - Google Patents

Method and related apparatus for determining an object type

Info

Publication number
CN110532331A
Authority
CN
China
Prior art keywords
sorted
information
type
quality
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910841009.8A
Other languages
Chinese (zh)
Inventor
郑洁琼
曹霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910841009.8A
Publication of CN110532331A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this application provide a method and related apparatus for determining an object type, relating to the field of artificial intelligence. The method screens and classifies an object to be classified using a quality judgment rule and a first quality classification model, and obtains the type to which the object belongs. The embodiments can automatically screen and identify the type of a media outlet and thereby identify low-quality media, which solves the technical problem that large numbers of low-quality media currently cannot be identified manually.

Description

Method and related apparatus for determining an object type
Technical field
This application relates to the field of artificial intelligence, and in particular to a method and related apparatus for determining an object type.
Background
With the development of modern technology, it has become increasingly convenient for media to publish information. These media can register accounts on network platforms and then publish information through those accounts, such as text, audio, and video information. These media also include self-media, which refers to the way ordinary members of the public disseminate facts and news about themselves through channels such as the internet. Content creation has boomed in recent years: major internet companies have actively entered the content market, self-media of all kinds have sprung up like bamboo shoots after spring rain, and anyone can run a self-media account by writing. The huge number of self-media accounts produce a massive number of articles every day, but their quality is uneven. Driven by profit, some self-media publish articles that carry advertising or are shoddily produced, so grading self-media is very important.
Existing evaluation criteria for self-media are rather limited: in most cases a whitelist of media is maintained through manual review, or offending media are shut down on the basis of user reports.
At present, many self-media produce large numbers of low-quality articles, images, and videos. Screening such self-media generally requires a great deal of manpower and material resources, and because human resources are limited it is difficult to screen so many low-quality self-media. An automatic screening method is therefore needed.
Summary
The embodiments of this application provide a method and related apparatus for determining an object type, used to determine the type of an object to be classified so that low-quality media can be screened out automatically.
In a first aspect, the embodiments of this application provide a method for determining an object type, comprising:
obtaining an information set of an object to be classified, wherein the information set is generated from a first information set and a second information set;
if the information set of the object to be classified satisfies a quality judgment rule, determining that the object to be classified is a first-type object;
if the information set of the object to be classified does not satisfy the quality judgment rule, obtaining feature information of the object to be classified from its information set;
obtaining, through a first quality classification model, a classification result corresponding to the feature information, wherein the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified is a first-type object, and the second classification result indicates that the object to be classified is a second-type object or a third-type object.
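The two-stage flow of the first aspect can be sketched as follows. This is a minimal illustration only: the function name `determine_type`, the dictionary-based information set, the label strings, and the callables passed in for the rules, feature extraction, and the first quality classification model are all assumptions made for the example, not details given in the application.

```python
from typing import Callable, Dict, List

InfoSet = Dict[str, float]  # hypothetical representation of an information set


def determine_type(
    info: InfoSet,
    rules: List[Callable[[InfoSet], bool]],
    extract_features: Callable[[InfoSet], List[float]],
    first_model: Callable[[List[float]], str],
) -> str:
    """Return "first_type" or "second_or_third_type" (hypothetical labels)."""
    # Stage 1: if any quality judgment rule is satisfied, the object is
    # a first-type (low-quality) object and the model is never consulted.
    if any(rule(info) for rule in rules):
        return "first_type"
    # Stage 2: otherwise derive feature information from the information set
    # and let the first quality classification model decide.
    return first_model(extract_features(info))
```

Running the cheap rules first means the model only sees objects that the rules could not settle.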
In a second aspect, the embodiments of this application provide an apparatus for determining an object type, comprising:
an obtaining unit, configured to obtain an information set of an object to be classified, wherein the information set is generated from a first information set and a second information set;
a processing unit, configured to determine that the object to be classified is a first-type object if the information set of the object to be classified satisfies a quality judgment rule;
the processing unit being further configured to obtain feature information of the object to be classified from its information set if the information set does not satisfy the quality judgment rule;
the processing unit being further configured to obtain, through a first quality classification model, a classification result corresponding to the feature information, wherein the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified is a first-type object, and the second classification result indicates that the object to be classified is a second-type object or a third-type object.
In one implementation of the second aspect, the processing unit is further configured to:
if the classification result is the second classification result, obtain an object vector corresponding to the object to be classified through a second quality classification model;
calculate the similarity between the object vector corresponding to the object to be classified and a second-type object vector, the second-type object vector being the vector corresponding to second-type objects;
if the similarity is greater than a set threshold, determine that the object to be classified is a second-type object;
if the similarity is less than or equal to the set threshold, determine that the object to be classified is a third-type object.
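The threshold comparison above can be sketched as follows. The application does not name a similarity measure in this passage, so cosine similarity is used here purely as an assumption; the function names, label strings, and the default threshold of 0.8 are likewise hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def second_or_third(
    object_vec: Sequence[float],
    second_type_vec: Sequence[float],
    threshold: float = 0.8,  # hypothetical value for the set threshold
) -> str:
    # Strictly greater than the threshold: second-type object.
    # Less than or equal to the threshold: third-type object.
    sim = cosine_similarity(object_vec, second_type_vec)
    return "second_type" if sim > threshold else "third_type"
```

Note that a similarity exactly equal to the threshold falls on the third-type side, matching the "less than or equal to" wording.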
In one implementation of the second aspect, the obtaining unit is further configured to:
obtain text information, the text information corresponding to the object to be classified;
compute statistics on the text information to obtain the first information set of the object to be classified;
obtain historical text information, the historical text information corresponding to the object to be classified;
compute statistics on the historical text information to obtain the second information set of the object to be classified.
In one implementation of the second aspect, the processing unit is further configured to:
if the proportion of low-quality text information in the information set of the object to be classified is greater than a set first proportion threshold, determine that the object to be classified is a first-type object, the proportion of low-quality text information being the share of the object's text information that is low-quality text information.
In one implementation of the second aspect, the processing unit is further configured to:
if the proportion of repeated text information in the information set of the object to be classified is greater than a set second proportion threshold, determine that the object to be classified is a first-type object, the proportion of repeated text information being the share of the object's text information that has a high duplication rate.
In one implementation of the second aspect, the processing unit is further configured to:
if the posting frequency of the object to be classified in its information set is greater than a set frequency threshold, determine that the object to be classified is a first-type object, the posting frequency being the frequency at which the object to be classified publishes text information.
In one implementation of the second aspect, the processing unit is further configured to:
obtain a click sequence of the object to be classified from its information set, the click sequence including the identifier of the object to be classified and the user identifiers corresponding to the object to be classified;
obtain, from the click sequence, the object vector corresponding to the object to be classified through the second quality classification model.
In one implementation of the second aspect, the processing unit is further configured to:
if the similarity is greater than the set threshold, obtain second-type annotation information;
determine an annotation result for the object to be classified from the second-type annotation information, the annotation result being a first annotation result or a second annotation result, the first annotation result indicating that the object to be classified is a second-type object and the second annotation result indicating that the object to be classified is a third-type object.
In a third aspect, the embodiments of this application provide a server, comprising:
one or more central processing units, a memory, an input/output interface, a wired or wireless network interface, and a power supply;
the memory being transient storage or persistent storage;
the central processing unit being configured to communicate with the memory and to execute, on the server, the instruction operations in the memory so as to perform the method of the first aspect.
As can be seen from the above technical solutions, the embodiments of this application have the following advantages:
The embodiments of this application provide a method and related apparatus for determining an object type, relating to the field of artificial intelligence. The method screens and classifies an object to be classified using a quality judgment rule and a first quality classification model, and obtains the type to which the object belongs. The embodiments can automatically screen and identify the type of a media outlet and thereby identify low-quality media, which solves the technical problem that large numbers of low-quality media currently cannot be identified manually.
Brief description of the drawings
Fig. 1 is an example architecture diagram of an information publishing platform in the embodiments of this application;
Fig. 2 is a schematic diagram of a method for determining an object type provided by the embodiments of this application;
Fig. 3 is an example interface in which the server displays the type determination for an object to be classified in the embodiments of this application;
Fig. 4 is an example interface for changing a type in the embodiments of this application;
Fig. 5 is an example interface in which a user edits text information on a terminal device in the embodiments of this application;
Fig. 6 is an example interface in which a user edits text information on a mobile phone in the embodiments of this application;
Fig. 7 is an example topology diagram of an application example of the method for determining an object type provided by the embodiments of this application;
Fig. 8 is an example interface showing high-quality media candidates in the embodiments of this application;
Fig. 9 is an example diagram of an apparatus for determining an object type provided by the embodiments of this application;
Fig. 10 is a schematic diagram of a server architecture provided by the embodiments of this application.
Detailed description
The embodiments of this application provide a method and related apparatus for determining an object type, used to determine the type of an object to be classified so that low-quality media can be screened out automatically.
The terms "first", "second", "third", "fourth", and so on (if present) in the description, claims, and drawings of this application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "correspond to" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
In the embodiments of this application, words such as "exemplary" or "for example" are used to present an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" should not be construed as preferred over, or more advantageous than, other embodiments or designs. Rather, such words are intended to present the related concepts in a concrete way.
To keep the description of the following embodiments clear and concise, a brief introduction to the related technology is given first:
Artificial intelligence (AI) is the theory, methods, technology, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technology within computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that machines can perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields and includes both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly cover computer vision, speech processing, natural language processing, and machine learning/deep learning. The embodiments of this application briefly introduce natural language processing and machine learning:
Natural language processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that combines linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to linguistics. Natural language processing technologies generally include text processing, semantic understanding, machine translation, question-answering robots, and knowledge graphs.
Machine learning (ML) is an interdisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other subjects. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to keep improving their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across every area of artificial intelligence. Machine learning and deep learning generally include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
As artificial intelligence technology is researched and advances, it is being studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service. It is believed that as the technology develops, artificial intelligence will be applied in more fields and deliver increasingly important value. The embodiments of this application apply artificial intelligence technology to information processing in particular, as illustrated by the following embodiments:
It should be understood that the embodiments of this application apply to an information publishing platform, which a user can log in to in order to publish information. An information publishing platform is a platform that lets users register accounts and publish information, and it can be hosted on a server. A user can communicate with the platform on the server through a client on a terminal device.
Fig. 1 is an example architecture diagram of an information publishing platform in the embodiments of this application. The architecture includes a terminal device and a server. The terminal device can perform its steps or implement its functions through the client on the terminal device, and the server can perform its steps or implement its functions through the platform on the server. When information is published, the terminal device first obtains the information entered by the user and then sends it to the server. In some embodiments, the server stores the information from the terminal device in a database; when a terminal device requests information from the server, the server reads the information from the database and transmits it to the terminal device. In some embodiments, the server pushes the information from the terminal device directly to the terminal devices connected to the server. The embodiments of this application do not specifically limit how the server publishes information.
It can be understood that a user who publishes information may be an individual user, also called a self-media user, or an enterprise user, also called an ordinary media user. For example, if a media company wishes to publish information on the major information publishing platforms, it needs to register accounts on those platforms and publish information through them; the accounts registered by such media companies can be called media accounts. As another example, if members of the public wish to publish articles or poems they have written, or photos or videos they have taken, on an information publishing platform, they can also register accounts on the major platforms and publish information through them; the accounts registered by members of the public can be called self-media accounts.
In the embodiments of this application, the information published by a user may be text information, video information, picture information, and so on, which is not specifically limited. For example, when the published information includes text information, the embodiments can process it with natural language processing technology from the field of artificial intelligence; when the published information includes video or picture information, the embodiments can process it with computer vision technology from the field of artificial intelligence.
The quality of the information users publish is uneven, and, driven by profit, some users publish articles that carry advertising or are shoddily produced, so grading users is very important. At present the information users publish is reviewed manually. This approach is inefficient, makes it difficult to screen large volumes of information, and makes it difficult to classify large numbers of users.
To solve the above problems, the embodiments of this application provide a method for determining an object type. The method screens and classifies an object to be classified using a quality judgment rule and a first quality classification model, and obtains the type to which the object belongs. The embodiments can automatically screen and identify the type of a media outlet and thereby identify low-quality media, which solves the technical problem that large numbers of low-quality media currently cannot be identified manually.
Fig. 2 is a schematic diagram of a method for determining an object type provided by the embodiments of this application, comprising:
201. Obtain the information set of the object to be classified, wherein the information set is generated from a first information set and a second information set;
For ease of description, the embodiments of this application are described with the server as the executing entity. Other executing entities can of course also perform the method for determining an object type provided by the embodiments, which is not specifically limited here.
In the embodiments of this application, the object to be classified may be the media account or self-media account described above, or the identifier corresponding to a self-media user or an ordinary media user, or the identifier corresponding to a user, which the embodiments do not specifically limit.
In some embodiments, the information set may include information such as the average posting volume, the number of paragraphs, and the number of pictures of the object to be classified. The average posting volume may refer to how many articles the object to be classified publishes per day on average, or how many it publishes per month on average. In other embodiments, the information set may include the article types of the object to be classified, for example how many articles the object has published in total and how many of them are low-quality articles. Whether an article is a low-quality article may be judged by a neural network algorithm or by manual review, which the embodiments do not limit. In other embodiments, the information set of the object to be classified further includes the number of paragraphs in the articles it publishes, the word count of each paragraph, the total word count of each article, and similar information. In practical applications the information set may also include other information, which is not specifically limited here.
In the embodiments of this application, the server can obtain the information set of the object to be classified. Specifically, the server can generate the information set of the object to be classified from the first information set and the second information set. In some embodiments, the first information set may be a historical information set and the second information set may be the information set currently obtained, for example the information set obtained on the same day.
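As a rough sketch of how two statistical information sets might be produced and combined, the example below counts a few of the quantities mentioned above and merges the two results under separate key prefixes. The application does not specify the merging step, so `build_info_set`, the choice of statistics, the newline-based paragraph split, and the key names are all assumptions made for illustration.

```python
from typing import Dict, List


def text_stats(texts: List[str], days: int) -> Dict[str, float]:
    """Count-based statistics over texts published during `days` days."""
    n = len(texts)
    # Treat each non-empty line as one paragraph (an assumption).
    paragraphs = [p for t in texts for p in t.split("\n") if p.strip()]
    return {
        "avg_posts_per_day": n / days if days else 0.0,
        "avg_paragraphs": len(paragraphs) / n if n else 0.0,
        "avg_chars": sum(len(t) for t in texts) / n if n else 0.0,
    }


def build_info_set(
    first_set: Dict[str, float], second_set: Dict[str, float]
) -> Dict[str, float]:
    """Merge the two information sets, keeping their statistics distinguishable."""
    merged = {f"first_{k}": v for k, v in first_set.items()}
    merged.update({f"second_{k}": v for k, v in second_set.items()})
    return merged
```

Keeping the two sets under separate prefixes lets later rules or models compare an account's two sets of statistics directly.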
202. If the information set of the object to be classified satisfies the quality judgment rule, determine that the object to be classified is a first-type object;
In this and subsequent embodiments, a first-type object may be a low-quality object, also called a low-quality media outlet.
In some embodiments, the information set of the object to be classified satisfies the quality judgment rule when the proportion of low-quality text information among the information the object has published is greater than a set first proportion threshold. This indicates that the object frequently publishes low-quality text information, so the server can classify it as a first-type object. Text information may be articles, short comments, news, and so on, which is not limited here. Low-quality text information may be, for example, articles containing malicious advertising, horror, pornography, or patched-together and fabricated content. In the embodiments of this application, the information the server obtains for the object to be classified has already been labeled by some algorithm as low-quality or non-low-quality text information; the specific algorithm is not limited by the embodiments. The proportion of low-quality text information is the share of all text information published by the object to be classified that is low-quality text information.
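The proportion check described above reduces to a short calculation. In this sketch the per-text labels are assumed to have been produced upstream (the application leaves that algorithm open), and the function names and the default threshold of 0.5 are hypothetical.

```python
from typing import List


def low_quality_ratio(is_low_quality: List[bool]) -> float:
    """Share of an account's texts that were labeled low-quality upstream."""
    if not is_low_quality:
        return 0.0
    return sum(is_low_quality) / len(is_low_quality)


def is_first_type_by_low_quality(
    is_low_quality: List[bool],
    first_ratio_threshold: float = 0.5,  # hypothetical first proportion threshold
) -> bool:
    # The rule fires only when the proportion is strictly greater than the threshold.
    return low_quality_ratio(is_low_quality) > first_ratio_threshold
```

An account with no published texts yields a ratio of 0.0 here and is therefore never flagged by this rule alone.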
In other embodiments, the information set of the object to be classified satisfies the quality judgment rule when the proportion of repeated text information among the information the object has published is greater than a set second proportion threshold. This indicates that the object frequently publishes repeated text information, so the server can classify it as a first-type object. In some embodiments, repeated text information may be text information in which text or pictures are repeated within a single article, or repeated across a series of articles published by the media outlet. In other embodiments, repeated text information may be text information in which the title of a single article is repeated many times in the body, or in which the first or last sentence of a paragraph, or a whole paragraph, is repeated. For example, the server can compute the duplication rate of a piece of text information through a duplicate check; if the duplication rate exceeds a certain threshold, the text information is determined to be repeated text information. The proportion of repeated text information is the share of all text information published by the object to be classified that is repeated text information.
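A minimal sketch of the repeated-text rule follows. The application does not say how the duplication rate is computed, so the example substitutes the standard library's `difflib.SequenceMatcher` ratio against the account's other texts; that choice, the function names, and both threshold values are assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def duplication_rate(text: str, others: List[str]) -> float:
    """Highest similarity ratio between `text` and any other text (0.0 if none)."""
    return max(
        (SequenceMatcher(None, text, other).ratio() for other in others),
        default=0.0,
    )


def repeated_text_ratio(texts: List[str], dup_threshold: float = 0.9) -> float:
    """Share of texts whose duplication rate exceeds `dup_threshold`."""
    if not texts:
        return 0.0
    repeated = sum(
        1
        for i, text in enumerate(texts)
        if duplication_rate(text, texts[:i] + texts[i + 1:]) > dup_threshold
    )
    return repeated / len(texts)


def is_first_type_by_repetition(
    texts: List[str],
    second_ratio_threshold: float = 0.5,  # hypothetical second proportion threshold
    dup_threshold: float = 0.9,           # hypothetical duplication-rate threshold
) -> bool:
    return repeated_text_ratio(texts, dup_threshold) > second_ratio_threshold
```

The pairwise comparison is quadratic in the number of texts, so a production system would more likely use hashing or shingling; the sketch only illustrates the two-threshold structure of the rule.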
In other embodiments, the information set of the object to be classified satisfies the quality judgment rule when the proportion of incomplete text information among the information the object has published is greater than a set third proportion threshold. This indicates that the object frequently publishes incomplete text information, so the server can classify it as a first-type object. In some embodiments, incomplete text information is text information that lacks one or more of the five basic elements; for example, a news item that lacks one or more of the time, the place, and the people involved. The information the server obtains for the object to be classified has already been labeled by some algorithm as incomplete or non-incomplete text information; the specific algorithm is not limited by the embodiments. The proportion of incomplete text information is the share of all text information published by the object to be classified that is incomplete text information.
In other embodiments, the information set of the object to be classified satisfies the quality judgment rule when the proportion of thin text information among the information the object has published is greater than a set fourth proportion threshold. This indicates that the object frequently publishes thin text information, so the server can classify it as a first-type object. In some embodiments, thin text information may be an article whose content is thin and has no substance, for example a few casually cropped pictures plus a couple of sentences; it may be an article whose body is too short and carries little information; or it may be an article published as an image-and-text article whose body is mostly video with very little text, suggesting an attempt to evade the duplicate check. In practical applications it may also be other similar text information, which is not limited here. In some embodiments, the server can count the words in each paragraph of the text information; if the average word count per paragraph is below a certain threshold, the content is thin and the text information is determined to be thin text information. The proportion of thin text information is the share of all text information published by the object to be classified that is thin text information.
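The average-length check for thin text can be sketched as follows. Splitting paragraphs on newlines, measuring length in characters rather than words, treating an empty text as thin, and the default threshold of 50 are all assumptions made for the example.

```python
from typing import List


def is_thin_text(text: str, min_avg_chars: float = 50.0) -> bool:
    """True if the average paragraph length is below `min_avg_chars` characters."""
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    if not paragraphs:
        return True  # assumption: a text with no paragraphs counts as thin
    avg_chars = sum(len(p) for p in paragraphs) / len(paragraphs)
    return avg_chars < min_avg_chars


def thin_text_ratio(texts: List[str], min_avg_chars: float = 50.0) -> float:
    """Share of an account's texts that are thin."""
    if not texts:
        return 0.0
    return sum(is_thin_text(t, min_avg_chars) for t in texts) / len(texts)
```

The resulting ratio would then be compared against the fourth proportion threshold in the same way as the other proportion rules.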
In further embodiments, the information aggregate of the object to be sorted meets the quality judging rule when the title (name) of the object to be sorted is largely similar to the titles of other objects to be sorted. For example, "Tussilago of the Yalu River", "Australian nightshade of the Yalu River", "Grasswort flower of the Yalu River", "Hydrangea of the Yalu River" and "Sansevieria flower of the Yalu River" are similar. This indicates that the title of the object to be sorted is one of a batch of registered titles, so the server can determine the object to be sorted as a first kind object. In some embodiments, the server examines the titles of multiple objects to be sorted and extracts the text that is identical across these titles, then computes, for the title of the object to be sorted, the proportion of the identical text in the total text of the title. If the proportion is greater than a set threshold, the server can determine the object to be sorted as a first kind object.
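One possible way to realize this check is sketched below. The embodiment does not say how the identical text is extracted; this sketch assumes it is the longest substring common to all titles, found by brute force, and the function names and the threshold value are hypothetical.

```python
def longest_common_substring(names):
    """Longest substring shared by every name (brute force, fine for short names)."""
    if not names:
        return ""
    shortest = min(names, key=len)
    # Try candidate substrings of the shortest name, longest first.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in name for name in names):
                return candidate
    return ""


def shared_text_ratio(name, shared):
    """Proportion of the shared text in the total text of one name."""
    return len(shared) / len(name) if name else 0.0


def looks_batch_registered(name, other_names, ratio_threshold=0.5):
    """True when the shared text makes up more than the threshold of the name."""
    shared = longest_common_substring([name, *other_names])
    return shared_text_ratio(name, shared) > ratio_threshold
```

For the example titles above, the shared text is the "of the Yalu River" part, which makes up well over half of each title.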
In further embodiments, the information aggregate of the object to be sorted meets the quality judging rule when the dispatch frequency of the object to be sorted is greater than a set frequency threshold. This indicates that the frequency with which the object to be sorted publishes information is abnormal; the object may be plagiarizing other people's information or publishing large amounts of shoddily produced text information, so the server can determine that the object to be sorted belongs to the first kind object. In some embodiments, the dispatch frequency of the object to be sorted is the frequency with which the object to be sorted publishes text information. For example, if the object to be sorted publishes 500 pieces of text information per month on average, its dispatch frequency is 500 per month. As another example, if the object to be sorted publishes 10 pieces of text information per day on average, its dispatch frequency is 10 per day. If the dispatch frequency of the object to be sorted exceeds the preset frequency threshold, the server can determine that the object to be sorted is a first kind object.
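The frequency rule amounts to a simple average and a comparison. A minimal sketch, assuming the server knows the publication timestamps and the length of the observation window in days (both parameter names and the function names are hypothetical):

```python
def daily_dispatch_frequency(publish_times, window_days):
    """Average number of published items per day over the observation window."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    return len(publish_times) / window_days


def exceeds_frequency_threshold(publish_times, window_days, threshold):
    """True when the average daily dispatch frequency is above the threshold."""
    return daily_dispatch_frequency(publish_times, window_days) > threshold
```

With 300 items over a 30-day window the daily dispatch frequency is 10, as in the per-day example above; a monthly frequency can be obtained the same way by measuring the window in months.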
In practical applications, other quality judging rules can also be set, which are not described further in the embodiment of the present application.
203. If the information aggregate of the object to be sorted does not meet the quality judging rule, obtain the characteristic information of the object to be sorted according to the information aggregate of the object to be sorted;
In the embodiment of the present application, if the information aggregate of the object to be sorted does not meet the quality judging rule, the server can make the determination through a model. Specifically, the server first obtains the characteristic information of the object to be sorted according to the information aggregate of the object to be sorted.
In some embodiments, the characteristic information of the object to be sorted may include the basic information of the object to be sorted, its dispatch distribution, the text information published by the object to be sorted, the structure features of the text information, the proportion of low-quality text information, and so on. In some embodiments, the basic information of the object to be sorted can be information such as its title, grade and type. In some embodiments, the dispatch distribution of the object to be sorted can be its main dispatch channel, channel number variance, channel number cross entropy, and so on. In some embodiments, the text information published by the object to be sorted can be articles, paragraphs, news, and so on. In some embodiments, the structure features of the text information published by the object to be sorted can be the number of paragraphs, the number of pictures, the punctuation patterns of the text information, and so on. In practical applications, the characteristic information of the object to be sorted can also take other forms, which is not limited in the embodiment of the present application.
In some embodiments, the server can select the more important characteristic information from the candidate characteristic information according to a feature selection algorithm. The feature selection algorithm is not specifically limited in the embodiment of the present application. Illustratively, the server may find that features such as the regional article proportion, the media type, other serious low-quality articles, and the proportion of long articles among dispatches are more important, and select them.
204. Obtain the classification result corresponding to the characteristic information through the first quality classification model, where the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be sorted belongs to the first kind object, and the second classification result indicates that the object to be sorted belongs to the Second Type object or the third type object.
In the embodiment of the present application, the first quality classification model can be any of various machine learning models. For example, the first quality classification model can be a boosted tree model (xgboost model) or another classification model. The embodiment of the present application does not specifically limit the type of model used, which can be replaced by various other effective model structures. The embodiment of the present application also does not limit the number and types of characteristic information.
In some embodiments, the server can input the characteristic information of the object to be sorted into the first quality classification model and obtain the classification result corresponding to the characteristic information, which may also be referred to as the classification result of the object to be sorted. If the classification result is the first classification result, the object to be sorted belongs to the first kind object. If the classification result is the second classification result, the object to be sorted belongs to the Second Type object or the third type object; which of the two it belongs to can be determined by the subsequent embodiments.
In some embodiments, the Second Type object may also be referred to as high-quality media, and the third type object may also be referred to as medium-quality media.
In some embodiments, an administrator of the information publishing platform can log in to the server through a terminal device to check the type determination of objects to be sorted, and the server can show the type determination of objects to be sorted through the terminal device. Fig. 3 is an example diagram of the interface on which the server shows the type determination of objects to be sorted in the embodiment of the present application. As can be seen, the interface shown by the server may include a title bar, a function panel and a main interface. The title bar displays the title of the interface, and the function panel lets the user select functions. The main interface includes the objects to be sorted, the titles of the objects to be sorted and the types of the objects to be sorted. The title bars and function panels in the other figures of the present application are similar and will not be described again. Illustratively, object 1 to be sorted is XX top news, and the server determines that its type is the third type object; object 2 to be sorted is XX news, and the server determines that its type is the Second Type object; object 3 to be sorted is XX people number, and the server determines that its type is the first kind object. In practical applications, other objects to be sorted and their determined types can also be shown, which is not limited here. In addition, more content can be shown on the interface, which is not specifically limited here.
In some embodiments, a staff member can check on the interface shown in Fig. 3 whether the type of an object to be sorted is correct. If the staff member finds that the server has determined the type of an object to be sorted incorrectly, the staff member can click the "change" virtual button on the interface to modify the type of the object to be sorted. Fig. 4 is an example diagram of the type change interface in the embodiment of the present application. As can be seen, after the staff member clicks the "change" virtual button on the interface shown in Fig. 3, a type selection box pops up on the interface. Illustratively, if the staff member clicks the first kind object as shown in Fig. 4, the type of the object to be sorted can be changed to the first kind object.
Optionally, on the basis of the embodiments corresponding to Fig. 2 above, in an alternative embodiment of the object type determination method provided by the embodiment of the present application, the server can further determine whether the object to be sorted belongs to the Second Type object or the third type object through the following steps:
If the classification result is the second classification result, obtain the object vector corresponding to the object to be sorted through the second quality classification model;
Calculate the similarity between the object vector corresponding to the object to be sorted and the Second Type object vector, where the Second Type object vector is the vector corresponding to the Second Type object;
If the similarity is greater than the set threshold, determine that the object to be sorted belongs to the Second Type object;
If the similarity is less than or equal to the set threshold, determine that the object to be sorted belongs to the third type object.
In some embodiments, the Second Type object may also be referred to as high-quality media or a high-quality object. The server can therefore obtain Second Type object vectors in advance through the second quality classification model from selected high-quality objects. These Second Type object vectors serve as the standard against which the object vector corresponding to the object to be sorted is compared. If the two vectors are similar, the object to be sorted is similar to the high-quality objects, and to a certain extent such an object to be sorted can be considered a high-quality object.
In the embodiment of the present application, the server can obtain the object vector corresponding to the object to be sorted through the second quality classification model, which can be a model from natural language processing technology. In some embodiments, the second quality classification model can be a graph vector model, such as a DeepWalk algorithm model or a GraphSAGE algorithm model, which is not specifically limited in the embodiment of the present application. The server can obtain the object vector corresponding to the object to be sorted through the second quality classification model according to the information aggregate of the object to be sorted.
Then, in some embodiments, the server can calculate the similarity between the object vector corresponding to the object to be sorted and the Second Type object vector; specifically, it can calculate the cosine similarity between the two vectors. If the cosine similarity is greater than the set threshold, the server can determine that the object to be sorted belongs to the Second Type object; if the cosine similarity is less than or equal to the set threshold, the server can determine that the object to be sorted belongs to the third type object.
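The cosine-similarity decision can be written out as a short sketch. The function names, the return labels, the handling of zero vectors, and the threshold value of 0.8 are assumptions for illustration; the embodiment only specifies the comparison against a set threshold.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def second_or_third_type(object_vector, second_type_vector, threshold=0.8):
    """Return "second" above the threshold, otherwise "third"."""
    similarity = cosine_similarity(object_vector, second_type_vector)
    return "second" if similarity > threshold else "third"
```

A similarity exactly equal to the threshold yields the third type, in line with the "less than or equal to" branch of the steps.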
Optionally, on the basis of the embodiments corresponding to Fig. 2 above, in an alternative embodiment of the object type determination method provided by the embodiment of the present application, the server can obtain the click sequence of the object to be sorted according to the information aggregate of the object to be sorted, where the click sequence includes the identifier of the object to be sorted and the user identifiers corresponding to the object to be sorted; the server then obtains the object vector corresponding to the object to be sorted through the second quality classification model according to the click sequence.
In the embodiment of the present application, the DeepWalk algorithm model is taken as an example. DeepWalk is a graph vector algorithm based on unsupervised learning, and its training process is similar to that of word vectors. The server first obtains the click sequence of the object to be sorted according to the information aggregate of the object to be sorted. In some embodiments, the server can record the IDs of the users who click the text information corresponding to the object to be sorted, and on this basis obtain the click sequence of each object to be sorted. For example, if the text information published by the object to be sorted is clicked by user 1, user 2 and user 3, the click sequence obtained by the server is: identifier (identity document, ID) of the object to be sorted, user 1, user 2, user 3. In the embodiment of the present application, the click sequence can be represented as:
Click sequence = [ID of object to be sorted, user ID, user ID, user ID, user ID, ...]
In some embodiments, the server inputs the click sequence into the DeepWalk algorithm model. Illustratively, the DeepWalk algorithm model includes two steps:
a. Execute random walks on the nodes in the graph to generate node sequences;
b. Run the word vector model (skip-gram model) to learn the embedding of each node from the node sequences generated in step a.
The server can then obtain the output of the DeepWalk algorithm model, namely the object vector corresponding to the object to be sorted.
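Step a of the two steps above can be sketched with the standard library alone. The embodiment does not state how the graph is built from click sequences; this sketch assumes an undirected graph with an edge between the object ID and each user ID in its click sequence. All function names and parameters are hypothetical, and step b (skip-gram training on the generated walks) is left to an existing word-vector implementation and is not shown.

```python
import random
from collections import defaultdict


def build_click_graph(click_sequences):
    """Build an undirected object-user graph (assumed construction).

    Each click sequence has the form [object_id, user_id, user_id, ...].
    """
    graph = defaultdict(set)
    for object_id, *user_ids in click_sequences:
        for user_id in user_ids:
            graph[object_id].add(user_id)
            graph[user_id].add(object_id)
    return graph


def random_walk(graph, start, walk_length, rng):
    """One uniform random walk of at most `walk_length` nodes."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = sorted(graph.get(walk[-1], ()))
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(graph, walks_per_node, walk_length, seed=0):
    """Step a: start `walks_per_node` walks from every node of the graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = sorted(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(graph, node, walk_length, rng))
    return walks
```

The resulting walks play the role of sentences for the skip-gram model, with node IDs in place of words; the embedding learned for an object ID node would then serve as the object vector.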
Optionally, on the basis of the embodiments corresponding to Fig. 2 above, in an alternative embodiment of the object type determination method provided by the embodiment of the present application, before obtaining the information aggregate of the object to be sorted, the method further includes:
Obtain text information, where the text information has a corresponding relationship with the object to be sorted;
Obtain the first information aggregate of the object to be sorted by collecting statistics on the text information;
Obtain history text information, where the history text information has a corresponding relationship with the object to be sorted;
Obtain the second information aggregate of the object to be sorted by collecting statistics on the history text information.
Illustratively, if the server performs object type determination once a day, the server can determine the object types of the objects to be sorted on the information publishing platform according to the text information obtained that day and the history text information. After the server completes the object type determination for the day, the text information obtained that day can be stored in the historical database as history text information. Illustratively, the server can also perform object type determination once a month; in that case the server determines the object types of the objects to be sorted on the information publishing platform according to the text information obtained that month and the history text information, and after completing the determination for the month, stores the text information obtained that month in the historical database as history text information.
In the embodiment of the present application, the server can first obtain the text information that has a corresponding relationship with the object to be sorted. Illustratively, the server can obtain a large amount of text information sent by terminal devices, for example article 1 published by object 1 to be sorted, article 2 published by object 2 to be sorted, article 3 published by object 3 to be sorted, and so on. The server can then collect statistics on this text information to obtain the text information published by a given object to be sorted; for example, the object to be sorted has currently published 10 articles in total, namely article 1, article 4, article 5, and so on. The text information published by the object to be sorted can be referred to as the first information aggregate.
Fig. 5 is an example diagram of the interface on which a user edits text information through a terminal device in the embodiment of the present application. As can be seen, the user can enter a title and body text on the main interface. In some embodiments, the user can also insert pictures or videos into the body text to enrich the content of the text information. After editing the text information, the user can click publish; in response to the click operation, the terminal device sends the edited text information to the server. The server then receives the edited text information, which may include the title, the body text, and so on.
Fig. 6 is an example diagram of the interface on which a user edits text information through a mobile phone in the embodiment of the present application. As can be seen, the user can enter a title and body text on the mobile phone interface. In some embodiments, the user can also insert pictures or videos into the body text to enrich the content of the text information. After editing the text information, the user can click publish; in response to the click operation, the mobile phone sends the edited text information to the server. The server then receives the edited text information, which may include the title, the body text, and so on.
In the embodiment of the present application, the server can obtain from the historical database the history text information that has a corresponding relationship with the object to be sorted. Illustratively, if the historical database stores the history text information of the previous 3 months, the server can obtain from it the history text information of the previous 3 months that corresponds to the object to be sorted. The server can then collect statistics to obtain all text information published by the object to be sorted in the previous 3 months; for example, the object to be sorted published article 11, article 12, article 13 and so on in the previous 3 months. The text information obtained from these statistics can serve as the second information aggregate.
The server can generate the information aggregate of the object to be sorted from the first information aggregate and the second information aggregate. In some embodiments, the server can count all the text information published by the object to be sorted, and from it derive information such as the average number of pieces of text information published per day (also called the daily dispatch frequency), the average number published per month (also called the monthly dispatch frequency), the average number of paragraphs and the average number of pictures in the published text information, how much of the published text information is low-quality text information, and the average number of words in the published text information.
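A minimal sketch of how the first and second information aggregates might be merged into per-object statistics is shown below. The dictionary keys (`paragraphs`, `pictures`, `words`, `low_quality`), the function name and the `days_observed` parameter are all hypothetical; the embodiment lists the kinds of statistics but not their data format.

```python
def build_information_aggregate(new_articles, history_articles, days_observed):
    """Merge newly obtained and history articles into summary statistics.

    Each article is a dict with the (assumed) keys "paragraphs", "pictures",
    "words" and "low_quality".
    """
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    articles = list(history_articles) + list(new_articles)
    count = len(articles)
    if count == 0:
        return {"article_count": 0, "daily_dispatch_frequency": 0.0,
                "avg_paragraphs": 0.0, "avg_pictures": 0.0,
                "avg_words": 0.0, "low_quality_ratio": 0.0}
    return {
        "article_count": count,
        "daily_dispatch_frequency": count / days_observed,
        "avg_paragraphs": sum(a["paragraphs"] for a in articles) / count,
        "avg_pictures": sum(a["pictures"] for a in articles) / count,
        "avg_words": sum(a["words"] for a in articles) / count,
        "low_quality_ratio": sum(1 for a in articles if a["low_quality"]) / count,
    }
```

Statistics of this kind can feed both the quality judging rules and the feature extraction for the first quality classification model.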
Optionally, on the basis of the embodiments corresponding to Fig. 2 above, in an alternative embodiment of the object type determination method provided by the embodiment of the present application, determining that the object to be sorted belongs to the Second Type object if the similarity is greater than the set threshold includes:
If the similarity is greater than the set threshold, obtain Second Type markup information;
Determine the annotation result of the object to be sorted according to the Second Type markup information, where the annotation result includes a first annotation result and a second annotation result, the first annotation result indicates that the object to be sorted belongs to the Second Type object, and the second annotation result indicates that the object to be sorted belongs to the third type object.
In some embodiments, if the similarity between the object vector of the object to be sorted and the Second Type object vector is greater than the set threshold, the server can compare the object to be sorted with the Second Type markup information entered by the user. If the Second Type markup information includes the object to be sorted, the server obtains the first annotation result, indicating that the object to be sorted belongs to the Second Type object; if the Second Type markup information does not include the object to be sorted, the server obtains the second annotation result, indicating that the object to be sorted belongs to the third type object.
In some embodiments, if the similarity between the object vector of the object to be sorted and the Second Type object vector is greater than the set threshold, the server can display the corresponding objects to be sorted, and the user can select and annotate those considered to be of better quality. In response to the user's selection and annotation operation, the server obtains the Second Type markup information, which includes the objects to be sorted that the user selected and annotated. The server can then determine that the objects to be sorted selected and annotated by the user belong to the Second Type object.
Optionally, on the basis of the embodiments corresponding to Fig. 2 above, in an alternative embodiment of the object type determination method provided by the embodiment of the present application, after the server has determined the type of the object to be sorted (first kind object, Second Type object or third type object), it can update the type of the object to be sorted in the database.
Illustratively, suppose object 1 to be sorted has always performed well and the information it published was essentially high-quality information, so its type was the Second Type object (high-quality object). Starting from a certain month, however, object 1 to be sorted begins to publish large amounts of low-quality text information. The server then determines, according to the method of the embodiments corresponding to Fig. 2, that the type of object 1 to be sorted is the first kind object (low-quality object), and after the determination the server can update the type of object 1 to be sorted in the database to the first kind object (low-quality object).
Based on the above embodiments, the embodiment of the present application also provides an application example of the object type determination method. In the application example, the object to be sorted may be referred to as media, text information as an article, the information aggregate of the object to be sorted as media information, the first kind object as low-quality media, the Second Type object as high-quality media, and the third type object as medium-quality media.
Fig. 7 is an example topology diagram of the application example of the object type determination method provided by the embodiment of the present application. In Fig. 7, Y (Yes) indicates that the logic judgment is yes, and N (No) indicates that the logic judgment is no.
In the application example, the server can first collect new article information, which can be statistics on newly published articles collected each day, such as the basic information of articles newly published by media, the low-quality types identified by machine, and the low-quality types identified by manual review.
In the application example, the server can convert the collected new article information into new media information, obtain history media information from the historical database, and integrate the new media information and the history media information to obtain the media information. The types and quantities of media information are the same as for the information aggregate of the object to be sorted in the foregoing embodiments, for example the latest average dispatch volume, number of paragraphs and number of pictures of the media, and are not described again here.
In some application examples, the server can use a low-quality rule module to make a logic judgment on the media information and judge whether the corresponding media is low-quality media. Illustratively, low-quality media can be divided into seriously low-quality media, shoddily produced media and maliciously cheating media. When the media is judged to fall into any of the following situations, the media can be judged to be low-quality media.
1) Seriously low-quality media:
Seriously low-quality media concern the quality level of dispatches: the dispatches of the media have obvious quality problems. This is also the type that is easiest to start with when classifying media, and it mainly covers the following situations.
a) The media publishes large numbers of low-quality articles, such as malicious advertisements, horror, pornography, or pieced-together and fabricated articles;
b) Articles are incomplete: news articles lack the five elements (time, place, persons, etc.), as in fabricated stories.
2) Shoddily produced media:
Shoddily produced media are media whose production cost is low: most of the content can be generated by machine and then lightly modified by hand. Typical signs are repeating or crawling text from the Internet and adding simple pictures, with most articles lacking substance.
a) Repetitive media:
Repetitive media come in two kinds: text or pictures repeat within a single article, or text or pictures repeat across the series of articles published by the media;
Within a single article, the title is repeated many times in the body text, the first or last sentence of paragraphs is repeated, or paragraphs are repeated.
Articles published by repetitive media share common features: overly long meaningless passages appear at the beginning and end of the articles in a series, with much filler and overly long lead-ins (such as "editor's words", introductions and the like, used to pad the word count), or the articles in a series use identical pictures.
b) Most articles of the media have thin content without substance, for example a few arbitrarily cropped pictures plus two paragraphs of words.
c) The body text of most articles of the media is too short and carries little information.
d) Most articles of the media are published as image-text articles, but the body is mostly video with very little text, which is suspected of evading duplicate checking.
3) Maliciously cheating media:
a) Similar media names are registered in batches at similar times;
If media names share an uncommon, relatively long attributive as a common prefix or suffix, most of them are batch registered. For example: "Tussilago of the Yalu River", "Australian nightshade of the Yalu River", "Grasswort flower of the Yalu River", "Hydrangea of the Yalu River", "Sansevieria flower of the Yalu River".
b) Different media have identical lead-ins or polite phrases;
c) Dispatch frequency: an individual media account that is not a media organization publishes frequently, with an excessive number of dispatches in a single day, which suggests batch dispatching or plagiarism;
d) Competing media names appear in the article (text or pictures), or a media name appears that does not match the registered media name.
Corresponding rules can be set to determine the above types of media; for details, refer to the description of step 202 in the embodiments corresponding to Fig. 2 above, which is not repeated here.
Since the low-quality rule module in the application example cannot necessarily detect all low-quality media, in some application examples the server can also make the determination through a low-quality media model. The server can first obtain the characteristic information of the media, which mainly falls into the following categories:
1) Media content features:
a) Basic information: mainly some basic features of the media, such as the media name, the original media grade and the media type;
b) Article dispatch verticality:
Measures how the dispatches of a media account are distributed across specific vertical fields, for example the main dispatch channel, the channel number variance and the channel number cross entropy;
c) Text structure features:
Features of the structure of the media's dispatches, such as the number of paragraphs, the number of pictures and punctuation patterns;
d) Number and proportion of media low-quality types:
Media low-quality types include machine-identified low quality and manually reviewed low quality. Machine-identified low quality includes the number and proportion of clickbait ("title party") articles, "story party" articles, thin articles and so on; manually reviewed low quality includes the number and proportion of malicious advertisements, ordinary advertisements, horror content and so on;
2) Media behavior features:
a) Article common features:
The number of repetitions of titles, first sentences, pictures and so on within a single article and across the media's articles in one month;
b) Dispatch features:
The total number of low-quality articles in one month, the total number of articles, the maximum number of dispatches in a single day, and so on;
c) Cheating features:
Whether the media name is similar to other media names, whether the media shares identical paragraphs with other media, and so on;
3) User behavior features:
Dwell time, monthly exposure, click volume, and so on;
4) Other product behavior features:
Data such as the exposure rate and click-through rate of the product.
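To make the dispatch verticality feature in item 1) b) more concrete, the sketch below derives three numbers from the list of channels in which a media account has published. The source names "channel number variance" and "channel number cross entropy" without defining them; this sketch assumes the variance of per-channel article counts and, as a stand-in for the undefined cross entropy, the Shannon entropy of the channel distribution. The function name and output keys are hypothetical.

```python
import math
from collections import Counter


def channel_distribution_features(channels):
    """Summarize how concentrated a media account's dispatches are.

    `channels` holds one channel label per published article.
    """
    counts = Counter(channels)
    total = sum(counts.values())
    if total == 0:
        return {"main_channel": None, "count_variance": 0.0, "entropy": 0.0}
    values = list(counts.values())
    mean = total / len(values)
    # Population variance of the per-channel article counts.
    variance = sum((c - mean) ** 2 for c in values) / len(values)
    # Shannon entropy (bits); low entropy means the account is more vertical.
    entropy = -sum((c / total) * math.log2(c / total) for c in values)
    return {
        "main_channel": counts.most_common(1)[0][0],
        "count_variance": variance,
        "entropy": entropy,
    }
```

An account that publishes in a single channel has zero entropy under this definition, while an account spread evenly over many channels has high entropy.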
The server can extract more than 100 dimensions of features in total and then input the features into a classification model such as xgboost to perform binary classification. In some application examples, the server can find through a feature selection algorithm that features such as the regional article proportion, the media type, other serious low-quality articles, and the proportion of long articles among dispatches are more important. The server can therefore remove redundant features and retain the important ones, which improves the execution speed of the high-quality media model.
In the application example, after the server has screened out the low-quality media, the remaining non-low-quality media need to be screened further, which can mean picking the high-quality media out of them. Illustratively, the server can perform this screening through a high-quality media model.
First, according to the media information of the non-low-quality media, the server can count which media the articles clicked by users belong to, and on this basis obtain the user click sequence of each media. The input form is: media ID, user ID_1, user ID_2, ....
Then, the server can obtain the vectors of the non-low-quality media by training with an algorithm such as DeepWalk. DeepWalk is a graph vector algorithm based on unsupervised learning, and its training process is similar to that of word vectors.
The DeepWalk algorithm includes two steps:
a. Execute random walks on the nodes in the graph to generate node sequences;
b. Run skip-gram to learn the embedding of each node from the node sequences generated in step a.
Finally, the server can take the historical high-quality media in the database as seed media, calculate the similarity between the vectors of the non-low-quality media and the vectors of the seed media, and take the media with higher similarity as high-quality media candidates.
The server can show the high-quality media candidates on an interface, as shown in Fig. 8. Fig. 8 is an example diagram of the interface showing high-quality media candidates in the embodiment of the present application. The main interface shows the media account, the title and a selection box, and the media accounts shown on the interface in Fig. 8 are high-quality media candidates. When a staff member reviews the candidates and confirms that certain high-quality media candidates can be high-quality media, the staff member can click the "Yes" virtual button in the selection box of the corresponding media account. In response to the click operation, the terminal device sends the staff member's markup information to the server; the markup information indicates for which high-quality media candidates the staff member clicked the "Yes" virtual button.
The remaining media are neither low-quality media nor high-quality media, so the server can determine these media to be medium-quality media.
Finally, the server can update the determined media types in the database.
The above application example uses the XGBoost algorithm as the classification algorithm. In practice, the embodiment of the present application does not specifically limit the type of model used, which can be replaced by various other effective model structures; nor does it limit the number or type of features. Features suited to a specific business can be designed according to that business, and suitable features can be selected by a feature selection algorithm.
The mining algorithm in the above application example is not limited to mining high-quality media; it is equally applicable to mining other media such as low-quality media. Nor must it be the DeepWalk algorithm: other graph embedding algorithms of the same kind, such as GraphSAGE, can be used instead.
The main innovation of the embodiment of the present application is that it greatly alleviates the inefficiency of manually operating media grades. For example: a. Low-quality media: manual operation is inefficient and consumes a great deal of manpower and material resources. b. High-quality media: because high-quality media are few in number and the review conditions of the historical whitelist are strict, the algorithm provided by the present invention can mine a large number of high-quality media, so that more high-quality articles are exposed.
The embodiment of the present application mitigates problems such as severe misjudgment and insufficient recall that arise when rule-based methods alone are used;
The low memory footprint and low latency of the embodiment of the present application make it suitable for resource-constrained application scenarios;
The embodiment of the present application is flexible and general: it supports formulating different features for specific businesses, so it generalizes well and is applicable in a variety of application scenarios.
Fig. 9 shows an apparatus 900 for determining an object type provided by an embodiment of the present application, comprising:
An acquiring unit 901, configured to acquire an information set of an object to be classified, where the information set is generated from a first information set and a second information set;
A processing unit 902, configured to determine that the object to be classified belongs to a first-type object if the information set of the object to be classified satisfies a quality judging rule;
The processing unit 902 is further configured to acquire feature information of the object to be classified according to its information set if the information set of the object to be classified does not satisfy the quality judging rule;
The processing unit 902 is further configured to acquire, through a first quality classification model, a classification result corresponding to the feature information, where the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified belongs to the first-type object, and the second classification result indicates that the object to be classified belongs to a second-type object or a third-type object.
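The control flow of the processing unit (rule check first, model only when no rule fires) can be sketched as follows. All names are hypothetical: the rule check, the feature extractor, and the first quality classification model are passed in as plain callables, and the string labels are placeholders for the classification results.

```python
def classify_object(info_set, meets_rule, extract_features, first_model):
    """Return "first_type" or "second_or_third_type" for one object.

    meets_rule, extract_features and first_model are hypothetical
    stand-ins supplied by the caller; first_model must return one of
    the two labels above.
    """
    if meets_rule(info_set):
        # Rule hit: the object is assigned the first type directly,
        # without running the model.
        return "first_type"
    features = extract_features(info_set)
    return first_model(features)
```

Running the rules before the model keeps cheap, clear-cut decisions out of the classifier, which matches the order of operations described above.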
Optionally, on the basis of the embodiments corresponding to Fig. 9 above, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
If the classification result is the second classification result, acquire, through a second quality classification model, an object vector corresponding to the object to be classified;
Calculate the similarity between the object vector corresponding to the object to be classified and a second-type object vector, where the second-type object vector is the vector corresponding to the second-type object;
If the similarity is greater than a set threshold, determine that the object to be classified belongs to the second-type object;
If the similarity is less than or equal to the set threshold, determine that the object to be classified belongs to the third-type object.
Optionally, on the basis of the embodiments corresponding to Fig. 9 above, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the acquiring unit 901 is further configured to:
Acquire text information, where the text information corresponds to the object to be classified;
Compute the first information set of the object to be classified from statistics on the text information;
Acquire historical text information, where the historical text information corresponds to the object to be classified;
Compute the second information set of the object to be classified from statistics on the historical text information.
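One way to picture the two information sets is the sketch below. The statistics chosen (article count, low-quality count, duplicate count), the per-article flags `low_quality` and `duplicate`, and the function names are all assumptions for illustration; the text does not fix which statistics are computed.

```python
def summarize(articles):
    """Count simple per-object statistics over a list of article records.

    Each record is assumed to be a dict with boolean "low_quality" and
    "duplicate" flags (hypothetical fields).
    """
    return {
        "article_count": len(articles),
        "low_quality_count": sum(1 for a in articles if a["low_quality"]),
        "duplicate_count": sum(1 for a in articles if a["duplicate"]),
    }


def build_information_set(text_info, history_text_info):
    """Combine statistics on current and historical text information."""
    return {
        "first": summarize(text_info),           # first information set
        "second": summarize(history_text_info),  # second information set
    }
```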
Optionally, on the basis of the embodiments corresponding to Fig. 9 above, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
If the proportion of low-quality text information in the information set of the object to be classified is greater than a set first proportion threshold, determine that the object to be classified belongs to the first-type object, where the proportion of low-quality text information is the share of the text information corresponding to the object to be classified that is low-quality text information.
Optionally, on the basis of the embodiments corresponding to Fig. 9 above, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
If the proportion of repeated text information in the information set of the object to be classified is greater than a set second proportion threshold, determine that the object to be classified belongs to the first-type object, where the proportion of repeated text information is the share of the text information corresponding to the object to be classified that has a high duplicate-check rate.
Optionally, on the basis of the embodiments corresponding to Fig. 9 above, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
If the posting frequency of the object to be classified in the information set of the object to be classified is greater than a set frequency threshold, determine that the object to be classified belongs to the first-type object, where the posting frequency of the object to be classified is the frequency with which the object to be classified publishes text information.
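The three rules above can be sketched together as follows. Combining them with a logical OR is an assumption (the text presents each rule as a separate optional embodiment), and the field names, the per-day unit of posting frequency, and the function name `meets_quality_rule` are hypothetical.

```python
def meets_quality_rule(info_set, first_ratio_threshold,
                       second_ratio_threshold, frequency_threshold):
    """Return True if any rule assigns the object to the first type.

    info_set is assumed to be a dict with the hypothetical keys
    "article_count", "low_quality_count", "duplicate_count" and
    "posts_per_day".
    """
    total = info_set["article_count"]
    if total > 0:
        # Rule 1: share of low-quality text information.
        if info_set["low_quality_count"] / total > first_ratio_threshold:
            return True
        # Rule 2: share of text information with a high duplicate-check rate.
        if info_set["duplicate_count"] / total > second_ratio_threshold:
            return True
    # Rule 3: posting frequency.
    return info_set["posts_per_day"] > frequency_threshold
```

All comparisons are strict, so an object whose ratio equals the threshold is not caught by the rule and goes on to the model stage.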
Optionally, on the basis of the embodiments corresponding to Fig. 9 above, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
Acquire a click sequence of the object to be classified according to the information set of the object to be classified, where the click sequence includes the identifier of the object to be classified and the user identifiers corresponding to the object to be classified;
According to the click sequence, acquire, through the second quality classification model, the object vector corresponding to the object to be classified.
Optionally, on the basis of the embodiments corresponding to Fig. 9 above, in an optional embodiment of the apparatus for determining an object type provided by the embodiments of the present application, the processing unit 902 is further configured to:
If the similarity is greater than the set threshold, acquire second-type annotation information;
Determine an annotation result of the object to be classified according to the second-type annotation information, where the annotation result includes a first annotation result and a second annotation result, the first annotation result indicates that the object to be classified belongs to the second-type object, and the second annotation result indicates that the object to be classified belongs to the third-type object.
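The annotation step can be illustrated with the sketch below. It assumes the annotation information arrives as the list of candidate IDs that a reviewer confirmed; the function name `apply_annotations` and the string labels are hypothetical.

```python
def apply_annotations(candidate_ids, confirmed_ids):
    """Map each candidate to its annotation result.

    candidate_ids: objects whose similarity exceeded the set threshold.
    confirmed_ids: candidates a reviewer confirmed (assumed format).
    Confirmed candidates get the first annotation result (second type);
    all other candidates get the second annotation result (third type).
    """
    confirmed = set(confirmed_ids)
    return {
        object_id: "second_type" if object_id in confirmed else "third_type"
        for object_id in candidate_ids
    }
```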
Fig. 10 is a schematic structural diagram of a server provided by an embodiment of the present application. The server 1000 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1022 (for example, one or more processors), a memory 1032, and one or more storage media 1030 (for example, one or more mass storage devices) storing application programs 1042 or data 1044. The memory 1032 and the storage medium 1030 may provide transient or persistent storage. The program stored in the storage medium 1030 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 1022 may be configured to communicate with the storage medium 1030 and to execute, on the server 1000, the series of instruction operations in the storage medium 1030.
The server 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiments may be based on the server structure shown in Fig. 10.
In the embodiment of the present application, the CPU 1022 is specifically configured to:
Acquire an information set of an object to be classified, where the information set is generated from a first information set and a second information set;
If the information set of the object to be classified satisfies a quality judging rule, determine that the object to be classified belongs to a first-type object;
If the information set of the object to be classified does not satisfy the quality judging rule, acquire feature information of the object to be classified according to the information set of the object to be classified;
Acquire, through a first quality classification model, a classification result corresponding to the feature information, where the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified belongs to the first-type object, and the second classification result indicates that the object to be classified belongs to a second-type object or a third-type object.
In the embodiment of the present application, the CPU 1022 is further configured to:
If the classification result is the second classification result, acquire, through a second quality classification model, an object vector corresponding to the object to be classified;
Calculate the similarity between the object vector corresponding to the object to be classified and a second-type object vector, where the second-type object vector is the vector corresponding to the second-type object;
If the similarity is greater than a set threshold, determine that the object to be classified belongs to the second-type object;
If the similarity is less than or equal to the set threshold, determine that the object to be classified belongs to the third-type object.
In the embodiment of the present application, the CPU 1022 is further configured to:
Acquire text information, where the text information corresponds to the object to be classified;
Compute the first information set of the object to be classified from statistics on the text information;
Acquire historical text information, where the historical text information corresponds to the object to be classified;
Compute the second information set of the object to be classified from statistics on the historical text information.
In the embodiment of the present application, the CPU 1022 is further configured to:
If the proportion of low-quality text information in the information set of the object to be classified is greater than a set first proportion threshold, determine that the object to be classified belongs to the first-type object, where the proportion of low-quality text information is the share of the text information corresponding to the object to be classified that is low-quality text information.
In the embodiment of the present application, the CPU 1022 is further configured to:
If the proportion of repeated text information in the information set of the object to be classified is greater than a set second proportion threshold, determine that the object to be classified belongs to the first-type object, where the proportion of repeated text information is the share of the text information corresponding to the object to be classified that has a high duplicate-check rate.
In the embodiment of the present application, the CPU 1022 is further configured to:
If the posting frequency of the object to be classified in the information set of the object to be classified is greater than a set frequency threshold, determine that the object to be classified belongs to the first-type object, where the posting frequency of the object to be classified is the frequency with which the object to be classified publishes text information.
In the embodiment of the present application, the CPU 1022 is further configured to:
Acquire a click sequence of the object to be classified according to the information set of the object to be classified, where the click sequence includes the identifier of the object to be classified and the user identifiers corresponding to the object to be classified;
According to the click sequence, acquire, through the second quality classification model, the object vector corresponding to the object to be classified.
In the embodiment of the present application, the CPU 1022 is further configured to:
If the similarity is greater than the set threshold, acquire second-type annotation information;
Determine an annotation result of the object to be classified according to the second-type annotation information, where the annotation result includes a first annotation result and a second annotation result, the first annotation result indicates that the object to be classified belongs to the second-type object, and the second annotation result indicates that the object to be classified belongs to the third-type object.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in actual implementation there may be other ways of dividing, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Claims (10)

1. A method for determining an object type, characterized by comprising:
Acquiring an information set of an object to be classified, wherein the information set is generated from a first information set and a second information set;
If the information set of the object to be classified satisfies a quality judging rule, determining that the object to be classified belongs to a first-type object;
If the information set of the object to be classified does not satisfy the quality judging rule, acquiring feature information of the object to be classified according to the information set of the object to be classified;
Acquiring, through a first quality classification model, a classification result corresponding to the feature information, wherein the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified belongs to the first-type object, and the second classification result indicates that the object to be classified belongs to a second-type object or a third-type object.
2. The method according to claim 1, characterized in that, after the acquiring, through the first quality classification model, of the classification result corresponding to the feature information, the method further comprises:
If the classification result is the second classification result, acquiring, through a second quality classification model, an object vector corresponding to the object to be classified;
Calculating the similarity between the object vector corresponding to the object to be classified and a second-type object vector, wherein the second-type object vector is the vector corresponding to the second-type object;
If the similarity is greater than a set threshold, determining that the object to be classified belongs to the second-type object;
If the similarity is less than or equal to the set threshold, determining that the object to be classified belongs to the third-type object.
3. The method according to claim 1, characterized in that, before the acquiring of the information set of the object to be classified, the method further comprises:
Acquiring text information, wherein the text information corresponds to the object to be classified;
Computing the first information set of the object to be classified from statistics on the text information;
Acquiring historical text information, wherein the historical text information corresponds to the object to be classified;
Computing the second information set of the object to be classified from statistics on the historical text information.
4. The method according to claim 1, characterized in that the determining that the object to be classified belongs to the first-type object if the information set of the object to be classified satisfies the quality judging rule comprises:
If the proportion of low-quality text information in the information set of the object to be classified is greater than a set first proportion threshold, determining that the object to be classified belongs to the first-type object, wherein the proportion of low-quality text information is the share of the text information corresponding to the object to be classified that is low-quality text information.
5. The method according to claim 1, characterized in that the determining that the object to be classified belongs to the first-type object if the information set of the object to be classified satisfies the quality judging rule comprises:
If the proportion of repeated text information in the information set of the object to be classified is greater than a set second proportion threshold, determining that the object to be classified belongs to the first-type object, wherein the proportion of repeated text information is the share of the text information corresponding to the object to be classified that has a high duplicate-check rate.
6. The method according to claim 1, characterized in that the determining that the object to be classified belongs to the first-type object if the information set of the object to be classified satisfies the quality judging rule comprises:
If the posting frequency of the object to be classified in the information set of the object to be classified is greater than a set frequency threshold, determining that the object to be classified belongs to the first-type object, wherein the posting frequency of the object to be classified is the frequency with which the object to be classified publishes text information.
7. The method according to claim 2, characterized in that the acquiring, through the second quality classification model, of the object vector corresponding to the object to be classified comprises:
Acquiring a click sequence of the object to be classified according to the information set of the object to be classified, wherein the click sequence includes the identifier of the object to be classified and the user identifiers corresponding to the object to be classified;
According to the click sequence, acquiring, through the second quality classification model, the object vector corresponding to the object to be classified.
8. The method according to claim 2, characterized in that the determining that the object to be classified belongs to the second-type object if the similarity is greater than the set threshold comprises:
If the similarity is greater than the set threshold, acquiring second-type annotation information;
Determining an annotation result of the object to be classified according to the second-type annotation information, wherein the annotation result includes a first annotation result and a second annotation result, the first annotation result indicates that the object to be classified belongs to the second-type object, and the second annotation result indicates that the object to be classified belongs to the third-type object.
9. An apparatus for determining an object type, characterized by comprising:
An acquiring unit, configured to acquire an information set of an object to be classified, wherein the information set is generated from a first information set and a second information set;
A processing unit, configured to determine that the object to be classified belongs to a first-type object if the information set of the object to be classified satisfies a quality judging rule;
The processing unit is further configured to acquire feature information of the object to be classified according to the information set of the object to be classified if the information set of the object to be classified does not satisfy the quality judging rule;
The processing unit is further configured to acquire, through a first quality classification model, a classification result corresponding to the feature information, wherein the classification result is a first classification result or a second classification result, the first classification result indicates that the object to be classified belongs to the first-type object, and the second classification result indicates that the object to be classified belongs to a second-type object or a third-type object.
10. A server, characterized by comprising:
One or more central processing units, a memory, an input/output interface, a wired or wireless network interface, and a power supply;
The memory is a transient storage memory or a persistent storage memory;
The central processing unit is configured to communicate with the memory and to execute, on the server, the instruction operations in the memory so as to perform the method according to any one of claims 1 to 8.
CN201910841009.8A 2019-09-05 2019-09-05 A kind of method and relevant apparatus that object type is determining Pending CN110532331A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910841009.8A CN110532331A (en) 2019-09-05 2019-09-05 A kind of method and relevant apparatus that object type is determining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910841009.8A CN110532331A (en) 2019-09-05 2019-09-05 A kind of method and relevant apparatus that object type is determining

Publications (1)

Publication Number Publication Date
CN110532331A true CN110532331A (en) 2019-12-03

Family

ID=68667384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910841009.8A Pending CN110532331A (en) 2019-09-05 2019-09-05 A kind of method and relevant apparatus that object type is determining

Country Status (1)

Country Link
CN (1) CN110532331A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191203
