CN108021579B - Information output method and device - Google Patents

Info

Publication number
CN108021579B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610960206.8A
Other languages
Chinese (zh)
Other versions
CN108021579A (en)
Inventor
费浩峻
钱旻奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing duxiaoman Youyang Technology Co.,Ltd.
Original Assignee
Shanghai Youyang New Media Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Youyang New Media Information Technology Co ltd
Priority to CN201610960206.8A
Publication of CN108021579A
Application granted
Publication of CN108021579B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an information output method and device. One embodiment of the method comprises: acquiring a type name of at least one article type and reference article information of a current reference article under each article type, and constructing an article type set according to the type name; calculating the confidence coefficient between at least one specified article and the article type corresponding to the type name in the article type set, wherein the confidence coefficient is used for representing the probability that the specified article is used as a reference article of the article type; calculating the correlation between at least one specified item and the item type corresponding to the type name in the item type set, wherein the correlation is used for representing the correlation degree between the specified item and the item type; and determining and outputting the reference article information of the reference article belonging to the article type through the confidence coefficient and the correlation. The embodiment can accurately determine the reference article information under the article type by calculating the confidence and the correlation between the specified article and the article type.

Description

Information output method and device
Technical Field
The present application relates to the field of information processing technologies, in particular to the field of information classification technologies, and more particularly to an information output method and apparatus.
Background
As production develops, many types of items appear on the market, and each type is further subdivided into a number of specific items (here, items may be physical, such as air conditioners, or virtual, such as stocks). For example, air conditioners may be classified into wall-mounted and floor-standing air conditioners, each of which may include air conditioners of various powers, colors, sizes, and structures. Users can choose an air conditioner according to their preferences, which widens their choice and meets their personalized requirements. Marketplaces likewise divide items into different categories for users to choose from.
However, existing approaches to classifying items have shortcomings. An item type is usually determined by several representative items (reference items), so the reference items are important to how items are classified. Yet the reference items that currently determine an item type are typically chosen manually. As a result, the current selection of reference items is neither objective nor accurate.
Disclosure of Invention
The application provides an information output method and an information output device, which are used for solving the technical problems mentioned in the background technology.
In a first aspect, the present application provides an information output method, including: acquiring a type name of at least one article type and reference article information of a current reference article under each article type, and constructing an article type set according to the type name, wherein the reference article information comprises the number of the reference articles and the name of the reference articles; calculating the confidence coefficient between at least one designated article and the article type corresponding to the type name in the article type set, wherein the confidence coefficient is used for representing the probability that the designated article is used as a reference article of the article type; calculating the correlation between at least one specified item and the item type corresponding to the type name in the item type set, wherein the correlation is used for representing the correlation degree between the specified item and the item type; and determining and outputting the reference article information of the reference article belonging to the article type according to the confidence coefficient and the correlation.
In some embodiments, calculating the confidence between the at least one specified item and the item type corresponding to the type name in the item type set includes: querying the number of times each specified item has been used as a reference item of the item type; and determining the confidence between the item type and the specified item according to the number of times, the number of item types, and the number of reference items currently contained in the item type.
In some embodiments, calculating the correlation between the at least one specified item and the item type corresponding to the type name in the item type set includes: constructing an item type vector from the current reference items of the item type; constructing a specified item vector from the specified item; taking the number of times that the name of the specified item appears, within a set time, in item messages whose subject is the name of the item type as the specified item occurrence frequency; taking the number of times that the name of the item type appears, within the set time, in item messages whose subject is the name of the specified item as the item type occurrence frequency; and calculating the correlation between the specified item and the item type from the item type vector, the specified item occurrence frequency and the item type occurrence frequency.
In some embodiments, determining and outputting the reference item information of the reference items belonging to the item type according to the confidence and the correlation includes: calculating the probability that the specified item becomes a reference item of the item type according to the confidence and the correlation; and selecting, in descending order of probability, a set number of specified items as reference items of the item type, and outputting the reference item information of the reference items.
In some embodiments, constructing the item type set from the type names further includes a step of aggregating item types, which comprises: constructing an item type set from the type names; performing the following aggregation step: aggregating two item types in the item type set that meet the following aggregation condition: the sum of the type similarity, the semantic similarity and the text similarity between the two item types is greater than a set threshold; forming a new item type set from the item types formed by aggregation and the item types in the item type set that were not aggregated; judging whether two item types meeting the aggregation condition exist in the new item type set, and if not, outputting the new item type set as the item type set; and if so, continuing to perform the aggregation step with the new item type set as the item type set.
In a second aspect, the present application provides an information output apparatus comprising: the information acquisition unit is used for acquiring the type name of at least one article type and the reference article information of the current reference article under each article type, and constructing an article type set according to the type name, wherein the reference article information comprises the number of the reference articles and the name of the reference articles; the confidence coefficient calculation unit is used for calculating the confidence coefficient between at least one specified article and the article type corresponding to the type name in the article type set, wherein the confidence coefficient is used for representing the probability that the specified article is used as a reference article of the article type; a correlation calculation unit, configured to calculate a correlation between at least one specified item and an item type corresponding to a type name in the item type set, where the correlation is used to characterize a degree of correlation between the specified item and the item type; and a reference article determining unit for determining and outputting reference article information of the reference article belonging to the article type by the confidence and the correlation.
In some embodiments, the confidence calculating unit includes: a number-of-times inquiry subunit operable to inquire the number of times that each of the specified items is a reference item of the item type; and the confidence operator unit is used for determining the confidence between the item type and the specified item according to the times, the number of the item types and the number of the reference items currently contained in the item types.
In some embodiments, the correlation calculation unit includes: the item name type vector construction subunit is used for constructing an item type vector through a current reference item of the item type; a specified item vector construction subunit for constructing a specified item vector by specifying an item; a specified item occurrence frequency determining subunit, configured to use, as a specified item occurrence frequency, a frequency of occurrence of a name of a specified item in an item message that has a name of an item type as a subject within a set time; an article name type occurrence frequency determining subunit, configured to use, as an article type occurrence frequency, a frequency of occurrence of a name of an article type in an article message that has a name of a specified article as a subject within the set time; and the correlation calculation subunit is used for calculating the correlation between the specified item and the item type through the item type vector, the specified item occurrence frequency and the item type occurrence frequency.
In some embodiments, the reference article determination unit includes: a probability calculation subunit configured to calculate, according to the confidence and the correlation, a probability that the designated item becomes a reference item of the item type; and a reference article determining subunit, configured to select, in descending order, a set of designated articles corresponding to the probabilities as reference articles of the article types, and output reference article information of the reference articles.
In some embodiments, the information acquiring unit further includes an item type aggregation subunit for aggregating item types, the item type aggregation subunit comprising: an item type set building module, configured to construct an item type set from the type names; an aggregation module, configured to perform the following aggregation step: aggregating two item types in the item type set that meet the following aggregation condition: the sum of the type similarity, the semantic similarity and the text similarity between the two item types is greater than a set threshold; forming a new item type set from the item types formed by aggregation and the item types in the item type set that were not aggregated; judging whether two item types meeting the aggregation condition exist in the new item type set, and if not, outputting the new item type set as the item type set; and a repeated execution module, configured to continue to execute the aggregation step with the new item type set as the item type set when two item types meeting the aggregation condition exist.
The information output method and device provided by the present application calculate the confidence and the correlation between the specified item and the item type, so that the reference item information under the item type can be accurately determined.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2a is a flow diagram of one embodiment of an information output method according to the present application;
FIG. 2b is a flow chart of a process of calculating a type similarity between two item types in the information output method of FIG. 2a;
FIG. 2c is a flow chart of a process for calculating semantic similarity between two item types in the information output method of FIG. 2a;
FIG. 3 is a schematic diagram of an application scenario of an information output method according to the present application;
FIG. 4 is a schematic block diagram of an embodiment of an information output device according to the present application;
FIG. 5 is a schematic block diagram of one embodiment of a server according to the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the information output method or information output apparatus of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or transmit information or the like. The terminal apparatuses 101, 102, 103 may have various information processing applications installed thereon, such as a web search application, a shopping-type application, and the like.
The terminal devices 101, 102, 103 may be various devices having data processing applications including, but not limited to, desktop computers, data servers, and the like.
The server 105 may be a server that determines reference item information from the information sent by the terminal devices 101, 102, and 103; for example, it calculates the confidence and the correlation for the information sent by the terminal devices 101, 102, and 103, and then determines the reference item information. The server 105 may obtain an item type set from the received item name set, then calculate the confidence and the correlation between a specified item and an item type, and finally determine the reference item information under the item type.
It should be noted that the information output method provided in the embodiment of the present application is generally executed by the server 105, and accordingly, the information output apparatus is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2a illustrates a flow chart 200 of one embodiment of an information output method comprising:
step 201, obtaining a type name of at least one item type and reference item information of a current reference item under each item type, and constructing an item type set according to the type name.
In this embodiment, the electronic device (for example, the server 105 shown in FIG. 1) may receive the information sent by the terminal devices 101, 102, and 103 in a wired or wireless manner, and determine the reference item information for that information.
The server 105 first collects the item names sent by the terminal devices 101, 102, 103 to obtain an item name set, where the item names in the item name set may be: purifiers, filters, descalers, dehumidifiers, air conditioners, fans, radiators, heaters, and the like. Among them, the purifier is generally used for purifying liquid or air; filters are commonly used to remove other impurities from liquids; descalers are commonly used to remove solid or liquid scale; dehumidifiers are commonly used to remove moisture from air or objects; the air conditioner is generally used for heating or cooling air and has a certain dehumidification function; fans are generally used to accelerate air flow, and can be divided into a fan for heating and a fan for heat dissipation; heat sinks are commonly used to reduce the temperature of an object; heaters are commonly used to heat objects. The foregoing is a functional description of the various articles, which may also be described in terms of materials, sizes, colors, powers, and the like. Different descriptions may be divided into different item types. Thus, the set of item names includes item names of items under at least two item types.
The same item may be described from multiple angles, and different angles may classify the item into different types. For example, the above-mentioned purifiers can be classified into the hygiene type; filters into the screening type; descalers into the decontamination type; dehumidifiers into the damp clearing type; air conditioners into the temperature control type; fans into the cooling type; heat sinks into the heat dissipation type; and heaters into the heating type. In this case, the item type set obtained for the item name set includes: hygiene type, screening type, decontamination type, damp clearing type, temperature control type, cooling type, heat dissipation type and heating type. The above-mentioned items can also be classified into other types in terms of materials and the like, and the details are not repeated here. The item type set is then constructed from the type names.
Each item type has its own reference items, and once the item type is determined, the current reference item information of the item type can be obtained. The reference item information includes the number of reference items and the name of each reference item. The reference items are used to determine the item type, and they may differ at different times.
In some optional implementations of this embodiment, the building the item type set by the type name may further include: a step of aggregating the article types, said step of aggregating the article types comprising the steps of:
in the first step, an item type set is constructed by the type names.
After the item type corresponding to each item name is determined, the names of the item types are combined to form an item type set.
A second step of performing the following aggregation step: aggregating two item types in the item type set that meet the following aggregation condition: the sum of the type similarity, the semantic similarity and the text similarity between the two item types is greater than a set threshold; forming a new item type set from the item types formed by aggregation and the item types in the item type set that were not aggregated; judging whether two item types meeting the aggregation condition exist in the new item type set, and if not, outputting the new item type set as the item type set;
To divide items into types accurately, item types can be compared by the type similarity, the semantic similarity and the text similarity between them. The type similarity is calculated from the item type vectors, the semantic similarity is calculated from the number of times the item types appear in item messages, and the text similarity is calculated from the same characters and the different characters in the type names. If the sum of the type similarity, the semantic similarity and the text similarity between two item types is greater than a set threshold, the two item types can be aggregated into one type; otherwise they cannot. When two types are aggregated, the name of the aggregated type may be one of the type names before aggregation, or a different type name. The aggregated item type is put back into the item type set to form a new item type set; if the new item type set then contains no two item types that can be aggregated, the aggregation is finished and the new item type set can be output as the item type set.
In some optional implementations of this embodiment, a flowchart of a process of calculating a type similarity between two article types is shown in fig. 2b, and includes the following steps:
in step 20111, a corresponding reference item vector is set for each reference item included in the item type, and an item type vector of the item type is constructed according to the reference item vector.
Wherein the reference item is used to determine the type of item. For example, reference items of hygiene type may be soap, toothbrush, shampoo and detergent, etc. And respectively setting a reference article vector according to the attribute of each reference article. For example, the attributes of a soap may include sterilization, decontamination, oil removal, water solubility, etc., and the reference item vector for a soap includes: sterilizing, removing dirt, removing oil, and water-soluble. Thus, combining the soap reference item vector, the toothbrush reference item vector, the shampoo reference item vector, and the detergent reference item vector constitutes a hygiene-type item type vector. It should be noted that each reference item vector should contain the same number of attributes. Each attribute is assigned a vector, and the reference item vector is the vector sum of the attributes.
In step 20112, the cosine similarity between the two above-mentioned object type vectors is calculated.
The cosine similarity judges how similar two item type vectors are by the cosine of the angle between them. All reference item vectors should contain the same number of attributes, while the reference item vectors included in the two item type vectors may be the same or different. The more reference item vectors an item type vector contains, the more its direction is affected, and so is the angle between the two item type vectors.
In step 20113, the type similarity is determined according to the cosine similarity.
The greater the cosine similarity between two item type vectors, the greater the similarity of the two item types. Here, a threshold may be set for the cosine similarity, and when the cosine similarity is greater than the threshold, the type similarity is 1, which indicates that the two article types are similar, otherwise, the type similarity is 0, which indicates that the two article types are not similar. It is also possible to directly output the value of the cosine similarity as the type similarity.
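Steps 20111 to 20113 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, attributes are assumed to be encoded as equal-length numeric vectors, and "combining" reference item vectors into an item type vector is assumed to mean element-wise summation, which the text does not specify.

```python
import math


def add_vectors(vectors):
    """Element-wise sum of equal-length numeric vectors."""
    return [sum(components) for components in zip(*vectors)]


def item_type_vector(reference_item_vectors):
    # Assumption: the item type vector is the sum of its reference item vectors.
    return add_vectors(reference_item_vectors)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def type_similarity(type_vec_a, type_vec_b, threshold=None):
    """Return the raw cosine similarity, or 1/0 when a threshold is given."""
    cos = cosine_similarity(type_vec_a, type_vec_b)
    if threshold is None:
        return cos
    return 1 if cos > threshold else 0
```

Passing a threshold gives the binary variant described above; omitting it outputs the cosine value directly as the type similarity.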
In some optional implementation manners of this embodiment, clustering two item types that meet the following aggregation condition in the item type set may further include: determining an inclusion relationship between the two article types, wherein the inclusion relationship is used for representing whether a reference article under one article type is completely contained in the other article type, and determining the type similarity according to the cosine similarity and the inclusion relationship.
When the reference items under one of the two item types are completely contained in the other item type, the two item types can be considered certainly similar, and the inclusion relationship takes the value 1; otherwise it takes the value 0.
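The inclusion relationship can be sketched as a subset test on the reference item names. The text does not say how the cosine similarity and the inclusion relationship are combined, so taking the larger of the two values below is purely an assumption for illustration, and the function names are hypothetical.

```python
def inclusion_relationship(refs_a, refs_b):
    """Return 1 if one type's reference items are all contained in the other's, else 0."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        # Assumption: a type with no reference items is not treated as contained.
        return 0
    return 1 if a <= b or b <= a else 0


def type_similarity_with_inclusion(cosine, refs_a, refs_b):
    # Assumption: full inclusion means "certainly similar", so it overrides
    # a lower cosine similarity by taking the maximum of the two values.
    return max(cosine, inclusion_relationship(refs_a, refs_b))
```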
In some optional implementations of this embodiment, a flowchart of a calculation process of semantic similarity between two item types is shown in fig. 2c, and may include the following steps:
step 20121, at least one item message in a set time period is obtained.
An item message here means information related to the item, such as newspaper reports and articles, and reflects the latest state of the item. Items can be divided into different types according to different standards; when several item types appear in the same item message, this indicates that the item types are related to some extent.
Step 20122, determining the number of item messages that have both item types as subjects, to obtain the number of simultaneous occurrences.
A period of time usually yields many item messages; the number of simultaneous occurrences is determined by finding, among them, the item messages in which both item types appear together as subjects.
Step 20123, determining, among the item messages, the number of item messages that have each of the two item types as a subject, to obtain a first number of occurrences and a second number of occurrences.
Item messages that have only one of the two item types as a subject are found among the item messages, and the first number of occurrences and the second number of occurrences are determined from them.
Step 20124, the ratio of the number of simultaneous occurrences to the product of the first number of occurrences and the second number of occurrences is used as the semantic similarity.
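Steps 20121 to 20124 can be sketched as below. Each item message is assumed to be represented as the set of item type names it has as subjects, which is a simplification introduced here. The first and second occurrence counts follow the reading that only messages with exactly one of the two types as a subject are counted, and returning 0.0 when either count is zero is an added guard rather than something the text specifies.

```python
def semantic_similarity(messages, type_a, type_b):
    """Co-occurrence count divided by the product of the single-type counts.

    messages: iterable of sets, each holding the item type names that are
    subjects of one item message within the set time period.
    """
    messages = list(messages)
    both = sum(1 for subjects in messages if type_a in subjects and type_b in subjects)
    first = sum(1 for subjects in messages if type_a in subjects and type_b not in subjects)
    second = sum(1 for subjects in messages if type_b in subjects and type_a not in subjects)
    if first == 0 or second == 0:
        # Assumption: the ratio is undefined here, so treat it as no similarity.
        return 0.0
    return both / (first * second)
```

For example, with one message about both types, two about the first type only and one about the second type only, the ratio is 1 / (2 × 1) = 0.5.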
In some optional implementations of this embodiment, clustering two item types meeting the following aggregation condition in the item type set may include the following steps:
the first step is to determine the number of same characters and the number of different characters in the type names of the two item types.
For example, suppose the type name of the first item type is "decontamination powder" and the type name of the second item type is "decontamination agent". Both type names contain the two characters for "decontamination" ("go" and "soil"), and the two type names together contain 4 different characters, namely "go", "soil", "powder" and "agent". The number of same characters is therefore 2, and the number of different characters is 4.
In the second step, the ratio of the number of same characters to the number of different characters is taken as the text similarity.
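The text similarity can be sketched with character sets. Following the example's count of 4, the "number of different characters" is interpreted here as the number of distinct characters across both names, which makes the ratio equivalent to a Jaccard index over characters. That interpretation, the function name and the placeholder names in the usage note are assumptions.

```python
def text_similarity(name_a, name_b):
    """Shared characters divided by distinct characters across both type names."""
    chars_a, chars_b = set(name_a), set(name_b)
    same = len(chars_a & chars_b)       # characters present in both names
    different = len(chars_a | chars_b)  # distinct characters over both names
    if different == 0:
        return 0.0
    return same / different
```

With two three-character placeholder names that share their first two characters, such as "ABX" and "ABY", the same count is 2 and the different count is 4, so the text similarity is 2 / 4 = 0.5.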
In some optional implementations of this embodiment, aggregating two item types in the item type set that meet the following aggregation condition includes: setting weights for the type similarity, the semantic similarity and the text similarity respectively, and aggregating the two item types into one item type when the sum of the products of the type similarity, the semantic similarity and the text similarity with their respective weights is greater than a set threshold.
Depending on the item types, different weights may be set for the type similarity, the semantic similarity and the text similarity. Each similarity value is multiplied by its weight and the products are added; if the resulting value is greater than the set threshold, the two item types can be aggregated, otherwise they cannot.
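The weighted aggregation condition can be sketched as follows. The default weights and threshold are illustrative placeholders only; the text leaves their values open.

```python
def should_aggregate(type_sim, semantic_sim, text_sim,
                     weights=(1.0, 1.0, 1.0), threshold=1.0):
    """True when the weighted sum of the three similarities exceeds the threshold."""
    w_type, w_semantic, w_text = weights
    score = w_type * type_sim + w_semantic * semantic_sim + w_text * text_sim
    return score > threshold
```

With weights of (0.5, 0.25, 0.25), similarities of 1, 0.5 and 0.5 give a score of 0.75, which exceeds a threshold of 0.7, so the two item types would be aggregated.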
In the third step, if two item types meeting the aggregation condition exist in the new item type set, the aggregation step continues with the new item type set as the item type set.
If there are two item types capable of being aggregated in the new item type set, the above aggregation process is repeated with the new item type set as the item type set until there are no two item types capable of being aggregated.
Step 202, calculating a confidence between at least one specified item and an item type corresponding to the type name in the item type set.
The designated article may be a reference article in the article type or may be another non-reference article. To determine whether a given item can be a reference item under the item type, a confidence level between the given item and the item type needs to be calculated. The confidence level is used to characterize the probability that the designated item is the reference item for the item type.
In some optional implementations of this embodiment, the calculating the confidence between the at least one specified item and the item type corresponding to the type name in the item type set may include:
in the first step, the number of times each specified item is used as a reference item for the item type is queried.
If the designated item is previously taken as a reference item for the item type, the number of times the designated item is taken as the reference item is recorded.
And secondly, determining the confidence between the item type and the specified item according to the times, the quantity of the item type and the quantity of the reference item currently contained in the item type.
The confidence between the specified item and the item type is determined from the number of times the specified item has been used as a reference item, the number of item types, and the number of reference items the item type currently contains. The higher the confidence, the greater the likelihood that the specified item will become a reference item for the item type.
Step 203, calculating the correlation between at least one specified item and the item type corresponding to the type name in the item type set.
The correlation represents the degree of correlation between the specified item and the item type. It is determined by the number of occurrences of the name of the item type and the number of occurrences of the name of the specified item in item messages within a set time.
In some optional implementations of this embodiment, the calculating a correlation between the at least one specific item and the item type corresponding to the type name in the item type set may include:
in a first step, an item type vector is constructed from a current reference item for the item type.
The reference item is used to determine the type to which the item belongs. For example, reference items of hygiene type may be soap, toothbrush, shampoo and detergent, etc. And respectively setting a reference article vector according to the attribute of each reference article. For example, the attributes of a soap may include sterilization, decontamination, oil removal, water solubility, etc., and the reference item vector for a soap includes: sterilizing, removing dirt, removing oil, and water-soluble. Thus, combining the soap reference item vector, the toothbrush reference item vector, the shampoo reference item vector, and the detergent reference item vector constitutes a hygiene-type item type vector. It should be noted that each reference item vector should contain the same number of attributes. Each attribute is assigned a vector, and the reference item vector is the vector sum of the attributes.
In the second step, a specified item vector is constructed from the specified item.
Similar to constructing the reference item vector, the specified item vector may be constructed from the attributes of the specified item.
And thirdly, taking the frequency of the appearance of the name of the specified article in the article message which takes the name of the article type as the subject in the set time as the frequency of the appearance of the specified article.
An item message here means information related to the item, such as news reports and articles, that reflects the latest state of the item. Items can be divided into different types according to different standards, and when several item types appear in the same item message, those item types can be taken to be related to some extent. The number of occurrences of the name of the specified item in the item messages that have the name of the item type as their subject is determined from these item messages.
And fourthly, taking the frequency of the appearance of the name of the article type in the article message which takes the name of the specified article as the subject in the set time as the frequency of the appearance of the article type.
Similarly, the number of occurrences of the name of the item type described above in the item message that has the name of the specified item as the subject is determined.
And fifthly, calculating the correlation between the specified article and the article type according to the article type vector, the specified article occurrence frequency and the article type occurrence frequency.
These vectors and parameters are substituted into a correlation formula to yield the correlation value between the specified item and the item type.
And step 204, determining and outputting the reference article information of the reference article belonging to the article type according to the confidence coefficient and the correlation.
Here, the confidence and the correlation are obtained for the same item type; that is, the reference item information is determined from the confidence between the specified item and item type A and the correlation between the specified item and the same item type A.
In some optional implementations of this embodiment, the determining and outputting the reference item information of the reference item belonging to the item type according to the confidence and the correlation may include:
first, a probability that the designated item becomes a reference item of the item type is calculated based on the confidence level and the correlation.
The probability that the designated item becomes the benchmark item for the item type is determined by the product of the confidence and the correlation.
Second, the probabilities are sorted in descending order, the specified items corresponding to a set number of the highest probabilities are selected as reference items of the item type, and the reference item information of those reference items is output.
Because a reference item is a representative item of its item type, the specified items corresponding to the several highest probability values are usually selected as the reference items for the item type, and the reference item information of these reference items can then be output.
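The selection step can be sketched as follows, taking the probability as the product of confidence and correlation as stated above. The function name `select_reference_items`, the dictionary layout and the sample scores are assumptions made for illustration; they are not values from the source.

```python
def select_reference_items(scores, top_k):
    """Rank specified items by confidence * correlation and keep the top_k.

    `scores` maps a specified item name to a (confidence, correlation) pair.
    Returns a list of (name, probability) tuples in descending order.
    """
    probabilities = {
        name: confidence * correlation
        for name, (confidence, correlation) in scores.items()
    }
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical scores for one item type.
    example = {
        "dehumidifier": (0.9, 0.8),
        "dryer": (0.7, 0.9),
        "humidifier": (0.4, 0.5),
    }
    for name, probability in select_reference_items(example, top_k=2):
        print(name, probability)
```

Here `top_k` plays the role of the "set number" of probabilities; the source does not say how that number is chosen.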
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the information output method according to the present embodiment. In the scenario of fig. 3, the acquired item name set includes: purifiers, filters, descalers, dehumidifiers, air conditioners, fans, radiators and heaters. In the existing market, these item names correspond respectively to the following classifications: hygiene type, screening type, decontamination type, damp-clearing type, temperature control type, cooling type, heat dissipation type and heating type, which together form the item type set. Whether two item types in the item type set can be aggregated is judged by comparing their type similarity, semantic similarity and text similarity, specifically:
(1) similarity of type
When calculating the type similarity, a reference article vector needs to be constructed by a reference article of the article type, and then an article type vector of the article type is constructed:
$$\mathrm{vec}(type) = \{T_1, T_2, \ldots, T_i, \ldots, T_n\}$$
where $type$ is an item type; $\mathrm{vec}(type)$ is the item type vector; $T_i$ is a reference item vector; $i$ is the serial number of the reference item and is a natural number, $i = 1, 2, \ldots, n$.
The formula for calculating the type similarity is as follows:
$$\mathrm{rel}(type_j, type_k) = \alpha_1 \times \cos(\mathrm{vec}(type_j), \mathrm{vec}(type_k)) + \alpha_2 \times \mathrm{include}(\mathrm{vec}(type_j), \mathrm{vec}(type_k))$$
where $type_j$ is the jth item type; $type_k$ is the kth item type; $\mathrm{rel}(type_j, type_k)$ is the type similarity of $type_j$ and $type_k$; $\mathrm{vec}(type_j)$ is the item type vector of the jth item type; $\mathrm{vec}(type_k)$ is the item type vector of the kth item type; $\cos(\mathrm{vec}(type_j), \mathrm{vec}(type_k))$ is the cosine similarity of $\mathrm{vec}(type_j)$ and $\mathrm{vec}(type_k)$; $\mathrm{include}(\mathrm{vec}(type_j), \mathrm{vec}(type_k))$ is the inclusion relationship value of $\mathrm{vec}(type_j)$ and $\mathrm{vec}(type_k)$: when the reference items of $type_j$ and $type_k$ have an inclusion relationship, $\mathrm{include}(\mathrm{vec}(type_j), \mathrm{vec}(type_k)) = 1$, otherwise $\mathrm{include}(\mathrm{vec}(type_j), \mathrm{vec}(type_k)) = 0$; $\alpha_1$ and $\alpha_2$ are the first weight and the second weight respectively, with $\alpha_1 + \alpha_2 = 1$.
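A minimal sketch of the type similarity formula follows. Two points are assumptions rather than statements from the source: each item type vector is taken to be a flat numeric vector of equal length (for example, a sum of its reference item vectors), and the "inclusion relationship" is interpreted as one type's set of reference item names being a subset of the other's. The names `cosine` and `type_similarity` are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def type_similarity(vec_j, vec_k, refs_j, refs_k, alpha1=0.5):
    """rel = alpha1 * cos(vec_j, vec_k) + alpha2 * include, with alpha1 + alpha2 = 1.

    refs_j and refs_k are sets of reference item names; include is 1 when one
    set contains the other (assumed reading of "inclusion relationship").
    """
    alpha2 = 1.0 - alpha1
    include = 1.0 if (refs_j <= refs_k or refs_k <= refs_j) else 0.0
    return alpha1 * cosine(vec_j, vec_k) + alpha2 * include
```

Passing only `alpha1` and deriving the second weight keeps the constraint that the two weights sum to 1.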
(2) Semantic similarity
To calculate the semantic similarity, the item messages within a period of time (for example, within one month) are first obtained. The number of item messages that have both item types as their subject at the same time gives the simultaneous occurrence number. The number of item messages that have the first item type as their subject gives the first occurrence number, and the number that have the second item type as their subject gives the second occurrence number. The ratio of the simultaneous occurrence number to the product of the first occurrence number and the second occurrence number is used as the semantic similarity.
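The ratio described above can be sketched as follows. Representing each item message as the set of type names it takes as its subject is an assumption made for illustration, as are the function name `semantic_similarity` and the choice to return 0.0 when either type never occurs.

```python
def semantic_similarity(messages, type_a, type_b):
    """Co-occurrence count divided by the product of the two individual counts.

    `messages` is an iterable of sets; each set holds the type names that one
    item message has as its subject (hypothetical representation).
    """
    messages = list(messages)
    both = sum(1 for subjects in messages if type_a in subjects and type_b in subjects)
    first = sum(1 for subjects in messages if type_a in subjects)
    second = sum(1 for subjects in messages if type_b in subjects)
    if first == 0 or second == 0:
        return 0.0  # assumed convention: no occurrences means no similarity
    return both / (first * second)
```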
(3) Text similarity
The number of identical characters and the number of differing characters in the type names of the two item types are determined, and the ratio of the number of identical characters to the number of differing characters is used as the text similarity.
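One possible reading of this ratio is sketched below. The source does not say how characters are matched, so counting shared characters as a multiset intersection and differing characters as everything left over is an assumption, as is returning infinity for identical names (where the count of differing characters is zero). The function name `text_similarity` is hypothetical.

```python
from collections import Counter


def text_similarity(name_a, name_b):
    """Ratio of shared characters to non-shared characters of two type names."""
    count_a, count_b = Counter(name_a), Counter(name_b)
    same = sum((count_a & count_b).values())  # characters present in both names
    different = sum((count_a - count_b).values()) + sum((count_b - count_a).values())
    if different == 0:
        return float("inf")  # assumed convention for identical names
    return same / different
```

In practice the unbounded value for identical names would likely need capping before it enters a weighted sum; the source does not address this case.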
Based on the analysis of the type similarity, the semantic similarity and the text similarity, the hygiene type, the screening type and the decontamination type are aggregated into a purification type; the damp-clearing type cannot be clustered with any other type; the temperature control type and the cooling type are aggregated into a temperature control type; and the heat dissipation type and the heating type are aggregated into a heat conduction type. This completes the clustering of the item types.
The confidence and the correlation between each specified item and each item type are then calculated. The specified items include: purifier, perfumed soap, washing powder, detergent, hygienic ball, humidifier, dehumidifier, dryer, heater, thermos, electric heating fan, electric blanket, refrigerator, freezer, heating pipe, heater, radiator, heat conducting strip, essential balm, etc. Calculating the confidence and the correlation between the specified items and the item types yields the following reference items under each item type: the reference items under the purification type include a purifier, soap and detergent; the reference items under the damp-clearing type include a dehumidifier and a dryer; the reference items under the temperature control type include a heater and a refrigerator; and the reference items under the heat conduction type include a heat sink and a heater. The reference item information of the reference items is thereby determined. The other specified items cannot serve as reference items.
According to the information output method, the confidence coefficient and the correlation between the specified article and the article type are calculated, and the reference article information under the article type can be accurately determined.
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an information output apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the information determination apparatus 400 of the present embodiment may include: an information acquisition unit 401, a confidence calculation unit 402, a correlation calculation unit 403, and a reference article determination unit 404. The information obtaining unit 401 is configured to obtain a type name of at least one item type and reference item information of a current reference item in each item type, and construct an item type set according to the type name, where the reference item information includes the number of the reference items and the name of the reference item; the confidence coefficient calculation unit 402 is configured to calculate a confidence coefficient between at least one specified item and an item type corresponding to the type name in the item type set, where the confidence coefficient is used to characterize a probability that the specified item is a reference item of the item type; the correlation calculation unit 403 is configured to calculate a correlation between at least one specified item and an item type corresponding to a type name in the item type set, where the correlation is used to characterize a degree of correlation between the specified item and the item type; the reference item determining unit 404 is configured to determine and output reference item information of a reference item belonging to the item type by the above-described confidence degree and correlation.
In some optional implementations of the present embodiment, the confidence calculation unit 402 includes: a number-of-times query subunit (not shown in the figure) and a confidence calculation subunit (not shown in the figure). The number-of-times query subunit is used for querying the number of times that each specified item has been used as a reference item of the item type; the confidence calculation subunit is used for determining the confidence between the item type and the specified item according to the number of times, the number of item types and the number of reference items currently contained in the item type.
In some optional implementations of this embodiment, the correlation calculation unit 403 includes: an item name type vector construction subunit (not shown in the figure), a specified item vector construction subunit (not shown in the figure), a specified item occurrence number determination subunit (not shown in the figure), an item name type occurrence number determination subunit (not shown in the figure), and a correlation calculation subunit (not shown in the figure). The item name type vector construction subunit is used for constructing an item type vector from the current reference items of the item type; the specified item vector construction subunit is used for constructing a specified item vector from the specified item; the specified item occurrence number determination subunit is used for taking the number of times the name of the specified item appears, within the set time, in item messages that have the name of the item type as their subject as the specified item occurrence number; the item name type occurrence number determination subunit is used for taking the number of times the name of the item type appears, within the set time, in item messages that have the name of the specified item as their subject as the item type occurrence number; and the correlation calculation subunit is used for calculating the correlation between the specified item and the item type from the item type vector, the specified item occurrence number and the item type occurrence number.
In some optional implementations of the present embodiment, the reference item determining unit 404 includes: a probability calculation subunit (not shown in the figure) and a reference item determination subunit (not shown in the figure). The probability calculation subunit is used for calculating, according to the confidence and the correlation, the probability that the specified item becomes a reference item of the item type; the reference item determination subunit is used for selecting, in descending order of probability, the specified items corresponding to a set number of probabilities as reference items of the item type, and outputting the reference item information of those reference items.
In some optional implementations of this embodiment, the information obtaining unit 401 further includes: an item type aggregation subunit (not shown in the figure) for aggregating item types, the item type aggregation subunit comprising: an item type set building module (not shown in the figure), an aggregation module (not shown in the figure), and a repeated execution module (not shown in the figure). The item type set building module is used for building an item type set from the type names; the aggregation module is used for executing the following aggregation step: clustering two item types in the item type set that meet the following aggregation condition: the sum of the type similarity, the semantic similarity and the text similarity between the two item types is greater than a set threshold; forming a new item type set from the item types formed by the aggregation and the item types in the item type set that were not aggregated; judging whether two item types meeting the aggregation condition exist in the new item type set, and if not, outputting the new item type set as the item type set; and the repeated execution module is used for continuing to execute the aggregation step, with the new item type set taken as the item type set, when two item types meeting the aggregation condition exist.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use in implementing a server according to embodiments of the present application is shown.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM503, various programs and data necessary for the operation of the system 500 are also stored. The CPU501, ROM502, and RAM503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an information acquisition unit, a confidence calculation unit, a correlation calculation unit, and a reference item determination unit. Where the names of these units do not in some cases constitute a limitation on the unit itself, for example, the reference item determination unit may also be described as a "unit for determining reference item information".
As another aspect, the present application also provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the apparatus in the above embodiment; or it may be a non-volatile computer storage medium that exists separately and is not incorporated into the terminal. The non-volatile computer storage medium stores one or more programs that, when executed by a device, cause the device to: acquiring a type name of at least one article type and reference article information of a current reference article under each article type, and constructing an article type set according to the type name, wherein the reference article information comprises the number of the reference articles and the name of the reference articles; calculating the confidence coefficient between at least one designated article and the article type corresponding to the type name in the article type set, wherein the confidence coefficient is used for representing the probability that the designated article is used as a reference article of the article type; calculating the correlation between at least one specified item and the item type corresponding to the type name in the item type set, wherein the correlation is used for representing the correlation degree between the specified item and the item type; and determining and outputting the reference article information of the reference article belonging to the article type according to the confidence coefficient and the correlation.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. An information output method, characterized in that the method comprises:
acquiring a type name of at least one article type and reference article information of a current reference article under each article type, and constructing an article type set according to the type name, wherein the reference article information comprises the number of the reference articles and the name of the reference articles; the at least one article type is divided based on type similarity, semantic similarity and text similarity among the article types;
calculating the confidence coefficient between at least one designated item and the item type corresponding to the type name in the item type set, wherein the confidence coefficient is used for representing the probability that the designated item is used as a reference item of the item type;
calculating the correlation between at least one specified item and the item type corresponding to the type name in the item type set, wherein the correlation is used for representing the correlation degree between the specified item and the item type;
and determining and outputting reference article information of the reference article belonging to the article type according to the confidence coefficient and the correlation.
2. The method of claim 1, wherein the calculating a confidence level between the at least one specified item and the item type corresponding to the type name in the set of item types comprises:
the number of times each specified item is used as a reference item of the item type is inquired;
and determining the confidence degree between the item type and the specified item according to the times, the quantity of the item type and the quantity of the reference item currently contained in the item type.
3. The method of claim 1, wherein the calculating the correlation between the at least one specified item and the item type corresponding to the type name in the set of item types comprises:
constructing an item type vector through a current reference item of the item type;
constructing a designated item vector by the designated item;
taking the number of times of appearance of the name of the specified article in the article message which takes the name of the article type as a subject in the set time as the number of times of appearance of the specified article;
taking the frequency of occurrence of the name of the article type in the article message with the name of the specified article as the subject in the set time as the frequency of occurrence of the article type;
and calculating the correlation between the specified item and the item type through the item type vector, the specified item occurrence times and the item type occurrence times.
4. The method according to claim 1, wherein said determining and outputting, by said confidence and relevance, reference item information for a reference item belonging to an item type comprises:
calculating the probability that the specified article becomes a reference article of the article type according to the confidence coefficient and the correlation;
and selecting specified articles corresponding to the set probabilities as reference articles of the article types according to the sequence from large to small, and outputting reference article information of the reference articles.
5. The method of claim 1, wherein said building a set of item types by said type name further comprises: a step of aggregating item types, the step of aggregating item types comprising:
building a set of item types by the type names;
the following aggregation step is performed: clustering two item types in the item type set that meet the following aggregation condition: the sum of the type similarity, the semantic similarity and the text similarity between the two item types is greater than a set threshold; forming a new item type set from the item types formed by the aggregation and the item types in the item type set that were not aggregated; judging whether two item types meeting the aggregation condition exist in the new item type set, and if not, outputting the new item type set as the item type set;
if so, continuing to perform the aggregating step with the new item type set as an item type set.
6. An information output apparatus, characterized in that the apparatus comprises:
the information acquisition unit is used for acquiring the type name of at least one article type and the reference article information of the current reference article under each article type, and constructing an article type set according to the type name, wherein the reference article information comprises the number of the reference articles and the name of the reference articles; the at least one article type is divided based on type similarity, semantic similarity and text similarity among the article types;
the confidence coefficient calculation unit is used for calculating the confidence coefficient between at least one specified article and the article type corresponding to the type name in the article type set, wherein the confidence coefficient is used for representing the probability that the specified article is used as a reference article of the article type;
a correlation calculation unit, configured to calculate a correlation between at least one specified item and an item type corresponding to a type name in the item type set, where the correlation is used to characterize a degree of correlation between the specified item and the item type;
and the reference article determining unit is used for determining and outputting reference article information of the reference article belonging to the article type according to the confidence coefficient and the correlation.
7. The apparatus of claim 6, wherein the confidence computation unit comprises:
a number-of-times inquiry subunit operable to inquire the number of times that each of the specified items is a reference item of the item type;
and a confidence calculation subunit, used for determining the confidence between the item type and the specified item according to the number of times, the number of item types and the number of reference items currently contained in the item type.
8. The apparatus according to claim 6, wherein the correlation calculation unit comprises:
the item name type vector construction subunit is used for constructing an item type vector through a current reference item of the item type;
a specified item vector construction subunit for constructing a specified item vector by specifying an item;
a specified item occurrence frequency determining subunit, configured to use, as a specified item occurrence frequency, a frequency of occurrence of a name of a specified item in an item message that has a name of an item type as a subject within a set time;
an item name type occurrence frequency determining subunit, configured to use, as the item type occurrence frequency, a frequency of occurrence of a name of an item type in an item message that takes a name of a specified item as a subject within the set time;
and the correlation calculation subunit is used for calculating the correlation between the specified item and the item type through the item type vector, the specified item occurrence frequency and the item type occurrence frequency.
9. The apparatus according to claim 6, wherein the reference article determination unit comprises:
a probability calculating subunit, configured to calculate, according to the confidence and the correlation, a probability that the specified item becomes a reference item of the item type;
and the reference article determining subunit is used for selecting the designated articles corresponding to the set probabilities as reference articles of the article types according to the sequence from large to small, and outputting reference article information of the reference articles.
10. The apparatus according to claim 6, wherein the information obtaining unit further comprises: an article type aggregation subunit for aggregating article types, the article type aggregation subunit comprising:
the item type set building module is used for building an item type set through the type name;
an aggregation module for performing the following aggregation step: clustering two item types in the item type set that meet the following aggregation condition: the sum of the type similarity, the semantic similarity and the text similarity between the two item types is greater than a set threshold; forming a new item type set from the item types formed by the aggregation and the item types in the item type set that were not aggregated; judging whether two item types meeting the aggregation condition exist in the new item type set, and if not, outputting the new item type set as the item type set;
and the repeated execution module is used for taking the new item type set as an item type set to continue executing the aggregation step when two item types meeting the aggregation condition exist.
CN201610960206.8A 2016-10-28 2016-10-28 Information output method and device Active CN108021579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610960206.8A CN108021579B (en) 2016-10-28 2016-10-28 Information output method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610960206.8A CN108021579B (en) 2016-10-28 2016-10-28 Information output method and device

Publications (2)

Publication Number Publication Date
CN108021579A CN108021579A (en) 2018-05-11
CN108021579B true CN108021579B (en) 2021-10-15

Family

ID=62084102

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN201610960206.8A Active CN108021579B (en) 2016-10-28 2016-10-28 Information output method and device

Country Status (1)

Country Link
CN (1) CN108021579B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002352181A (en) * 2001-05-30 2002-12-06 Misawa Homes Co Ltd Form, system and method for ordering product, computer program, and storage medium
JP2007334506A (en) * 2006-06-13 2007-12-27 Toyota Motor Corp Apparatus for presenting recommended component type
CN103778205A (en) * 2014-01-13 2014-05-07 北京奇虎科技有限公司 Commodity classifying method and system based on mutual information
CN104484461A (en) * 2014-12-29 2015-04-01 北京奇虎科技有限公司 Method and system based on encyclopedia data for classifying entities
CN104504086A (en) * 2014-12-25 2015-04-08 北京国双科技有限公司 Clustering method and device for webpage
CN104715014A (en) * 2015-01-26 2015-06-17 中山大学 Online news topic detection method
CN105608166A (en) * 2015-12-18 2016-05-25 Tcl集团股份有限公司 Label extracting method and device
CN105740380A (en) * 2016-01-27 2016-07-06 北京邮电大学 Data fusion method and system

Also Published As

Publication number Publication date
CN108021579A (en) 2018-05-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20191127

Address after: 201210 room j1328, floor 3, building 8, No. 55, Huiyuan Road, Jiading District, Shanghai

Applicant after: SHANGHAI YOUYANG NEW MEDIA INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 100085 3/F, Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20180511

Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

Assignor: SHANGHAI YOUYANG NEW MEDIA INFORMATION TECHNOLOGY Co.,Ltd.

Contract record no.: X2020990000202

Denomination of invention: Information output method and device

License type: Exclusive License

Record date: 20200420

GR01 Patent grant
CP03 Change of name, title or address

Address after: 401120 b7-7-2, Yuxing Plaza, No.5, Huangyang Road, Yubei District, Chongqing

Patentee after: Chongqing duxiaoman Youyang Technology Co.,Ltd.

Address before: 201210 room j1328, 3 / F, building 8, 55 Huiyuan Road, Jiading District, Shanghai

Patentee before: SHANGHAI YOUYANG NEW MEDIA INFORMATION TECHNOLOGY Co.,Ltd.