WO2021128768A1 - Natural language-based sports news writing method, device and electronic equipment - Google Patents

Natural language-based sports news writing method, device and electronic equipment

Publication number
WO2021128768A1
WO2021128768A1 PCT/CN2020/097005 CN2020097005W WO2021128768A1 WO 2021128768 A1 WO2021128768 A1 WO 2021128768A1 CN 2020097005 W CN2020097005 W CN 2020097005W WO 2021128768 A1 WO2021128768 A1 WO 2021128768A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
slot
slots
template
events
Prior art date
Application number
PCT/CN2020/097005
Other languages
English (en)
French (fr)
Inventor
周金娟
沈艺
倪合强
齐康
梁诗雯
Original Assignee
苏宁易购集团股份有限公司
苏宁云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏宁易购集团股份有限公司, 苏宁云计算有限公司
Priority to CA3165616A (patent CA3165616A1)
Publication of WO2021128768A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting
    • G06F40/186: Templates

Definitions

  • the invention belongs to the technical field of natural language processing, and specifically relates to a natural language-based sports news writing method, device and electronic equipment.
  • the existing template traversal matching strategy needs to compare the slots and the number of slots in the template one by one when matching the event data with the template, until a suitable template is found.
  • For example, a piece of data for a goal event is: {ORG_NEU: Newcastle United, PER_ACT: Scheer, EVEINF_LOC_FROM: center outside the penalty area, EVEINF_BODY: right foot, EVEINF_LOC_TO: upper right corner of the goal}. This data contains five slots.
  • When this data is matched, the goal event has the following partial templates:
  • The traversal strategy has to compute the slot information of each template every time and then perform a set operation against the slot information of the data, which is time-consuming. Because an online system has performance requirements, it often returns as soon as one match succeeds and does not try all templates. This brings another problem:
  • The first template that matches successfully is not necessarily the optimal template; that is, its slot types and number of slots do not meet the maximization requirement.
  • In the example above, the data matches successfully at the first template, but using the first template leaves the EVEINF_LOC_FROM and EVEINF_BODY information unfilled, so the generated article carries less information. Therefore, the traversal matching strategy is not only inefficient, but the diversity of its matching results is also poor.
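  • The shortcoming described above can be illustrated with a minimal sketch. The slot names are taken from the goal-event data, but the three template slot sets and the function names `first_match` and `best_match` are hypothetical, since the actual templates are not reproduced here. A template is treated as matching when all of its slots are present in the data.

```python
# Slot names come from the goal-event example; the template slot sets are hypothetical.
event_slots = {"ORG_NEU", "PER_ACT", "EVEINF_LOC_FROM", "EVEINF_BODY", "EVEINF_LOC_TO"}

templates = [
    {"ORG_NEU", "PER_ACT", "EVEINF_LOC_TO"},  # hypothetical template 0
    {"ORG_NEU", "PER_ACT", "EVEINF_LOC_FROM", "EVEINF_BODY", "EVEINF_LOC_TO"},  # hypothetical template 1
    {"ORG_NEU", "PER_ACT", "EVEINF_TIME"},  # hypothetical template 2, needs a slot the data lacks
]


def first_match(data, candidates):
    """Return the index of the first template whose slots are all present in the data."""
    for i, slots in enumerate(candidates):
        if slots <= data:  # set operation: template slots are a subset of data slots
            return i
    return None


def best_match(data, candidates):
    """Check every template and return the index of the match that fills the most slots."""
    matching = [i for i, slots in enumerate(candidates) if slots <= data]
    return max(matching, key=lambda i: len(candidates[i]), default=None)


first = first_match(event_slots, templates)
best = best_match(event_slots, templates)
unused = event_slots - templates[first]  # information lost by stopping at the first match
```

  • In this sketch the early-return strategy stops at template 0 and leaves two slots unused, while the exhaustive strategy finds template 1 at the cost of a set comparison against every template.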
  • One of the purposes of this application is to provide a natural language-based sports news writing method based on the shortcomings of the prior art to enhance the diversity of article sentence patterns and maximize the amount of article information.
  • the method includes the steps:
  • the news content is reprocessed to obtain the final news content.
  • the obtaining the event set, the slot and the slot value corresponding to each of the slots includes the steps:
  • the method further includes the following steps:
  • the event includes a title, an abstract and a text.
  • the weight assignment for each of the events includes the steps:
  • the weight of the event corresponding to the mapping is set.
  • the encoding each of the events and the type and number of the slots in the event template includes the steps:
  • each slot needs to be allocated n binary bits for representation, where n is a divisor of 64;
  • the screening of the event includes the steps:
  • the screening of the event template includes the steps:
  • One of the event templates is randomly selected from the candidate event templates as the event template to be filled.
  • the second purpose of this application is to provide a natural language-based sports news writing device based on the shortcomings of the prior art to enhance the diversity of article sentence patterns and maximize the amount of article information.
  • the device includes:
  • the acquiring unit is used to acquire the to-be-processed corpus, the event set, the slot, and the slot value corresponding to each of the slots;
  • An event template tagging unit configured to tag an event template in the corpus according to each event in the event set, the slot, and the value of the slot;
  • the weight assignment unit is used for weight assignment for each event
  • An encoding unit for encoding each of the events and the type and number of the slots in the event template
  • a screening unit configured to screen the event and the event template according to the weight of each event
  • the news content generating unit is used to match and fill the selected event and the event template to generate news content
  • the news content processing unit is used to reprocess the news content to obtain the final news content.
  • the third purpose of this application is to provide an electronic device in view of the shortcomings of the prior art to improve the diversity of sentence patterns and maximize the amount of information in the article.
  • the electronic device includes:
  • At least one processor and,
  • a memory communicatively connected with the at least one processor; wherein,
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute any of the aforementioned sports news writing methods.
  • The natural language-based sports news writing method, device and electronic equipment provided in this application can automatically generate news content based on event templates extracted in advance from a large amount of sports news and on the user's own weight assignment for events. On the one hand, this improves the diversity of sentence patterns in the article and maximizes the amount of information in the article; on the other hand, it realizes efficient automatic writing of sports news articles and reduces labor costs.
  • Fig. 1 is a flowchart of Embodiment 1 of a method for writing sports news based on natural language provided by the present invention;
  • Fig. 2 is a flowchart of Embodiment 2 of a method for writing sports news based on natural language provided by the present invention;
  • Fig. 3 is a flowchart of Embodiment 3 of a method for writing sports news based on natural language provided by the present invention;
  • Fig. 4 is a flowchart of Embodiment 4 of a method for writing sports news based on natural language provided by the present invention;
  • Fig. 5 is a flowchart of Embodiment 5 of a method for writing sports news based on natural language provided by the present invention;
  • Fig. 6 is a flowchart of Embodiment 6 of a method for writing sports news based on natural language provided by the present invention;
  • Fig. 7 is a flowchart of Embodiment 7 of a method for writing sports news based on natural language provided by the present invention;
  • Fig. 8 is a schematic structural diagram of a sports news writing device based on natural language provided by the present invention;
  • Fig. 9 is a schematic structural diagram of an electronic device provided by the present invention.
  • the embodiments of the present disclosure provide a method for writing sports news based on natural language.
  • the natural language-based sports news writing method provided in this embodiment can be executed by a computing system.
  • the computing system can be implemented as software, or as a combination of software and hardware.
  • The computing system can be integrated in a server, a terminal device, etc.
  • this application provides a method for writing sports news based on natural language.
  • the method includes the following steps:
  • Step S101 Obtain the corpus to be processed, the event set, the slot and the slot value corresponding to each slot.
  • There are many ways to obtain the corpus to be processed, the event set, the slots, and the slot value corresponding to each slot. For example, they can be manually input into the device, automatically obtained by the device from preset data, or automatically grabbed by the server (for example, with a crawler); this application does not limit the method.
  • Step S102 According to each event in the event set, the slots, and the slot values, event templates are labeled in the corpus.
  • the to-be-processed corpus obtained in step S101 can be labeled with event templates.
  • Step S103 Perform weight assignment for each event.
  • There are many ways to assign weights to each event. For example, the weight of each event can be manually input into the device, automatically obtained by the device, or automatically captured by the server (for example, with a crawler); this application does not limit the method.
  • Step S104 Encode each event and the type and number of slots in the event template.
  • There are many ways to encode each event and the type and number of slots in the event template. For example, the codes for each event and for the type and number of slots in the event template can be manually input into the device, automatically obtained by the device, or automatically captured by the server (for example, with a crawler); this application does not limit the method.
  • the encoding type used can be binary, octal or decimal, etc.
  • the application does not limit the specific encoding type.
  • Step S105 screening events and event templates according to the weight of each event.
  • events and event templates are screened according to the weight of each event obtained in step S103.
  • Step S106 Matching and filling the filtered event and event template to generate news content.
  • In this step, the events and event templates screened in step S105 are matched and filled to generate news content.
  • There are many ways of matching and filling, which are not limited in this application.
  • Step S107 Reprocessing the news content to obtain the final news content.
  • In this step, the news content obtained in step S106 is reprocessed to obtain the final news content.
  • There may be multiple ways of reprocessing, such as manual review and polishing, or re-checking by the system, which are not limited in this application.
  • In Embodiment 1, the natural language-based sports news writing method provided by this application can automatically process the corpus to be processed and thereby automatically generate news content, reducing manual labor and improving the ability to process news corpora automatically.
  • obtaining the event set, the slot, and the slot value corresponding to each slot in step S101 includes the steps:
  • Step S201 Obtain a preset number of sports news corpus.
  • There are many ways to obtain a preset number of sports news corpus items. For example, they can be manually input into the device, automatically obtained by the device, or automatically grabbed by the server (for example, with a crawler); this application does not limit the method.
  • the preset number can be 1 or 100000, but considering the amount of data processing and the accuracy of the result, the preset number can be selected as 1000.
  • Sports news corpus 1: Star A takes a shot and the goal is successful.
  • Step S202 Process the sports news corpus to obtain all events, slots, and slot values corresponding to each slot.
  • In this step, the sports news corpus obtained in step S201 is processed to obtain all events, slots, and the slot value corresponding to each slot.
  • The event is: a goal;
  • The slot is: __ shoots (where the underlined position is the slot);
  • The slot values are: star A shoots, star B shoots (where the underlined position holds the slot value; that is, the slot values are star A and star B).
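  • As a rough illustration of how slot values might be pulled out of the corpus for the "__ shoots" slot, the sketch below uses a regular expression. The pattern `SLOT_PATTERN` and the two-sentence corpus are assumptions made for this example only; the source does not specify the extraction technique used in step S202.

```python
import re

# Hypothetical pattern for the "__ shoots" slot; the capture group is the slot value.
SLOT_PATTERN = re.compile(r"(\w[\w ]*?) shoots")

# Toy corpus built from the two slot-value examples in the text.
corpus = ["star A shoots", "star B shoots"]

slot_values = []
for sentence in corpus:
    match = SLOT_PATTERN.search(sentence)
    if match:
        slot_values.append(match.group(1))  # text that fills the underlined position
```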
  • Step S203 Put all events into the same set to obtain an event set.
  • In this step, all the events obtained in step S202 are put into the same set to obtain an event set.
  • The event set contains at least one event, and the number of events it contains can differ according to the criteria used to divide the events.
  • For example, if the event is selected as "goal", the event set in step S203 contains only one event, "goal".
  • If the events are instead selected as "successful goal" and "failed goal", the event set in step S203 contains two events, "successful goal" and "failed goal".
  • the event may also include a title, abstract, and text.
  • A piece of sports news generally includes at least three parts: "title", "summary" and "text"; it can also include others such as "subtitle", "editor's note" and "comment". This application does not limit this.
  • The natural language-based sports news writing method provided by this application can analyze and process a large amount of sports news corpus to obtain the event set, the slots, and the slot value corresponding to each slot, which provides a research template for the subsequent automatic generation of news content.
  • the method further includes the following steps:
  • Step S301 Determine whether each event, each slot, and the value of each slot meets the preset range.
  • Step S302 If it matches, reserve the event, slot and slot value.
  • Step S303 If it does not match, delete the event, slot or slot value.
  • In this embodiment, each event, each slot, and each slot value obtained in step S202 is checked against the preset range and is retained or removed accordingly.
  • The preset range can be a specific numerical value or a qualitative criterion. For example, suppose the events "shoot", "goal", and "free throw" are extracted from 1,000 sports news items and the proportion of "free throw" is lower than that of "shoot" and "goal". The "free throw" event can then be eliminated when the accuracy requirements are not high, which greatly reduces the amount of data processing.
  • a natural language-based sports news writing method provided by this application can filter the acquired events, slots, and the slot value corresponding to each slot. On the one hand, it can improve the accuracy of news generation. On the other hand, it also reduces the amount of data processing.
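  • One possible reading of steps S301-S303 is a proportion-based filter, sketched below. The function name `filter_events`, the 5% minimum share, and the event counts are all assumptions chosen to mirror the "free throw" example; the source leaves the preset range open.

```python
from collections import Counter


def filter_events(event_labels, min_share):
    """Keep only events whose share of all extracted events reaches min_share."""
    counts = Counter(event_labels)
    total = sum(counts.values())
    return {event for event, count in counts.items() if count / total >= min_share}


# Hypothetical counts for events extracted from 1,000 news items.
labels = ["shoot"] * 600 + ["goal"] * 380 + ["free throw"] * 20

kept = filter_events(labels, min_share=0.05)  # assumed preset range: at least 5%
```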
  • the weight assignment of each event in step S103 includes the steps:
  • Step S401 According to all events, the corpus is divided to obtain several parts.
  • Step S402 For each event, construct a mapping between the event and each part.
  • Step S403 For each mapping, set the weight of the event corresponding to the mapping.
  • For example, the corpus is divided into a first paragraph and a second paragraph.
  • By default, the first paragraph is the title part of the corpus and the second paragraph is the summary part of the corpus. (In some news, the summary can be longer than one paragraph; for simplicity, that case is not considered here.)
  • The weights of the title event for the first and second paragraphs are 0.6 and 0.3, respectively, and the weights of the summary event for the first and second paragraphs are 0.4 and 0.7, respectively. That is, when the system considers the importance of the title event and the summary event in the first and second paragraphs of the corpus, the title event will most likely be in the first paragraph, and the summary event will most likely be in the second paragraph.
  • In some other embodiments, the summary event will most likely be in the first paragraph, and the title event will most likely be in the second paragraph.
  • As another example, the corpus is again divided into a first paragraph and a second paragraph, with the first paragraph being the title part and the second paragraph the summary part by default. (A summary longer than one paragraph is not considered, for simplicity.)
  • The weights of the goal event for the first and second paragraphs are 0.6 and 0.7, respectively, and the weights of the free throw event for the first and second paragraphs are 0.4 and 0.3, respectively. That is, when the system considers the importance of goal events and free throw events in the first and second paragraphs of the corpus, the goal event will most likely be in the first paragraph, and the free throw event will most likely be in the second paragraph. Of course, in some other embodiments, when sorting in descending order of weight, free throw events are likely to be in the first paragraph, and goal events are likely to be in the second paragraph.
  • The natural language-based sports news writing method provided by this application can assign weights to each event. On the one hand, event weights can be customized according to needs; on the other hand, the weights provide the basis for the subsequent encoding work.
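  • The mapping of steps S401-S403 can be sketched as a table keyed by (event, part), using the title and summary weights from the first example in this embodiment. The dictionary layout and the helper `most_likely_event` are illustrative assumptions, not the data structure used by the application.

```python
# Weights from the title/summary example: (event, part) -> weight.
weights = {
    ("title", "first paragraph"): 0.6,
    ("title", "second paragraph"): 0.3,
    ("summary", "first paragraph"): 0.4,
    ("summary", "second paragraph"): 0.7,
}


def most_likely_event(part, weight_map):
    """Return the event with the highest weight for the given part of the corpus."""
    candidates = {event: w for (event, p), w in weight_map.items() if p == part}
    return max(candidates, key=candidates.get)


placement = {
    part: most_likely_event(part, weights)
    for part in ("first paragraph", "second paragraph")
}
```

  • With these weights the title event lands in the first paragraph and the summary event in the second, matching the outcome stated for that example.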
  • the encoding of each event and the type and number of slots in the event template in step S104 includes the steps:
  • Step S501 Obtain an event template and an event to be encoded.
  • the event template and event to be coded can be manually input to the device, or the event template and event to be coded can be automatically obtained through the device.
  • the server can automatically crawl (for example, use a crawler to crawl) event templates and events to be coded, and this application does not limit this method.
  • the adopted code is binary code.
  • Step S502 According to regular matching, the event template and the type and number of slots in the event are counted.
  • regular expressions are used to sequentially traverse the event templates and events to be encoded, so that the types of event templates and the number of each type, as well as the types of slots in the event and the number of each type can be counted.
  • Step S503 Determine the total number m of all slots and the maximum number of occurrences n of slots in each event template.
  • From the statistics of step S502, the total number m of all slots and the maximum number of occurrences n of a slot in each event template can be determined.
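  • Steps S502 and S503 can be sketched with a regular expression and a counter. The `{SLOT_NAME}` placeholder syntax and both template strings are assumptions, because the source does not show how slots are written inside a template; m is interpreted here as the number of distinct slot types across all templates.

```python
import re
from collections import Counter

SLOT_RE = re.compile(r"\{([A-Z_]+)\}")  # assumed placeholder syntax, not from the source

templates = [  # hypothetical event templates
    "{PER_ACT} of {ORG_NEU} shoots with the {EVEINF_BODY}",
    "{PER_ACT} shoots from the {EVEINF_LOC_FROM} and {PER_ACT} scores",
]

# Step S502: count the type and number of slots in each template.
per_template = [Counter(SLOT_RE.findall(t)) for t in templates]

# Step S503: m = number of distinct slot types, n = largest count of any single slot.
slot_types = sorted(set().union(*per_template))
m = len(slot_types)
n = max(max(counts.values()) for counts in per_template)
```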
  • For example, each event slot may be allocated 4 binary bits to indicate its number of occurrences, and each 1 among those bits indicates one occurrence of the event slot.
  • Step S504 According to the maximum number of occurrences of the slots, it is determined that each slot needs to be allocated n binary bits for representation, where n is a divisor of 64.
  • long type data is used for encoding.
  • a long type data consists of 64 bits, so n needs to be a divisor of 64.
  • two long data are needed to encode the event template and the slot in the corpus, and the two long data are initialized to 0.
  • Step S506 Traverse each slot in the event template, and binary-encode the number of occurrences of the current slot.
  • Using the long type, each slot in the event template is traversed in turn, and the number of occurrences of each slot is binary-encoded. For example, if a slot appears twice in the event template, it is represented as 0011 in the long type's binary form.
  • the y value can be calculated by the following formula:
  • i is the slot index address
  • Step S509 concatenate all the codes of the long type to obtain the final code.
  • In Embodiment 5, the natural language-based sports news writing method provided by this application uses binary codes to encode each event and the type and number of slots in each event template, which makes the subsequent screening of events and event templates more convenient and reduces the screening workload.
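  • The encoding of this embodiment can be approximated by the sketch below. It follows the stated rules (n bits per slot, n a divisor of 64, one 1 bit per occurrence so that two occurrences give 0011, all words initialized to 0), but the bit layout, the slot ordering in `slot_index`, and the helper `covers` are assumptions: the formula for the offset is not reproduced in the text, and the bitwise comparison is only one plausible way to use the codes during screening.

```python
BITS_PER_WORD = 64  # width of one "long" value as described in the text


def encode_slots(slot_counts, slot_index, n):
    """Pack per-slot occurrence counts into a list of 64-bit integers.

    slot_counts: dict of slot name -> number of occurrences (each at most n)
    slot_index:  dict of slot name -> index i of the slot (assumed global ordering)
    n:           bits reserved per slot; must be a divisor of 64
    """
    if BITS_PER_WORD % n != 0:
        raise ValueError("n must be a divisor of 64")
    slots_per_word = BITS_PER_WORD // n
    num_words = (len(slot_index) + slots_per_word - 1) // slots_per_word
    words = [0] * num_words  # every word starts at 0
    for name, count in slot_counts.items():
        if count > n:
            raise ValueError("a slot cannot occur more than n times")
        word, offset = divmod(slot_index[name], slots_per_word)
        unary = (1 << count) - 1  # count = 2 gives 0b0011: one 1 bit per occurrence
        words[word] |= unary << (offset * n)  # assumed layout: slot i sits at bit offset * n
    return words


def covers(event_code, template_code):
    """Assumed check: the event supplies every slot occurrence the template needs."""
    return all(e & t == t for e, t in zip(event_code, template_code))


slot_index = {"ORG_NEU": 0, "PER_ACT": 1, "EVEINF_BODY": 2}  # hypothetical ordering
template_code = encode_slots({"PER_ACT": 2}, slot_index, n=4)
event_code = encode_slots({"ORG_NEU": 1, "PER_ACT": 2}, slot_index, n=4)
```

  • Because the counts are stored in unary form, a single AND per word is enough to compare an event with a template, which is cheaper than recomputing and intersecting slot sets for every template.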
  • The screening of events in step S105 includes the steps:
  • Step S601 Obtain the corresponding weight of each event.
  • Step S602 Compare each weight with a preset threshold one by one.
  • Step S603 Retain the events whose weight is greater than the preset threshold, and exclude all other events.
  • In steps S601-S603, a large number of events need to be screened, and the screening criterion is the comparison of the weight corresponding to each event with a preset threshold.
  • For example, the news includes three parts: title, abstract and main text, and the events include "goal" and "free throw". Weights are assigned to these two events according to business needs, where the weight of "goal" is greater than the weight of "free throw" (the weights here are between 0 and 1).
  • The events are then screened according to their weights. Events with a weight greater than the preset threshold are written in the headline part of the news, so "goal" is written in the headline part, and "free throw" is written in either the abstract or the main text.
  • If the events also include summary and text events, and the weight of the text event is higher than the weight of the summary event, then "free throw" is preferentially written into the text part of the news.
  • a natural language-based sports news writing method provided by the present application can screen events, and can select appropriate events from numerous events as needed to provide a basis for subsequent news generation.
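  • Steps S601-S603 amount to a threshold filter, sketched below. The weight values 0.8 and 0.4 and the threshold 0.5 are hypothetical; the text only states that the weights lie between 0 and 1 and that "goal" outweighs "free throw".

```python
def screen_events(event_weights, threshold):
    """Retain events whose weight is greater than the threshold; drop the rest."""
    return {event: w for event, w in event_weights.items() if w > threshold}


event_weights = {"goal": 0.8, "free throw": 0.4}  # hypothetical weights in [0, 1]
retained = screen_events(event_weights, threshold=0.5)  # hypothetical preset threshold
excluded = set(event_weights) - set(retained)
```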
  • the screening of event templates in step S105 includes the steps:
  • Step S701 Obtain the screened events and their corresponding codes, together with the codes corresponding to all event templates of each event.
  • Step S702 Select the one or more event templates with the largest number of slots as candidate event templates.
  • Step S703 randomly select an event template from the candidate event templates as the event template to be filled.
  • In steps S701-S703, the event template to be filled is selected through the above steps.
  • event 1 is coded as 0010
  • event 2 is coded as 1010
  • event template 1 is coded as 0011
  • event template 2 is coded as 1011.
  • If event template 1 has 1 slot and event template 2 has 2 slots, event template 2, which has the largest number of slots, is selected, so event template 2 is the only event template to be filled.
  • If event template 1 has 1 slot and event template 2 also has 1 slot, the event templates with the largest number of slots are selected as candidates, namely event template 1 and event template 2, each with 1 slot.
  • An event template is then randomly selected from event template 1 and event template 2 as the event template to be filled; that is, the event template to be filled can be either event template 1 or event template 2.
  • a natural language-based sports news writing method provided by this application can filter event templates, and can select a suitable event template from numerous event templates as needed to provide a basis for subsequent news generation.
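  • Steps S702 and S703 can be sketched as "keep the templates with the most slots, then pick one at random". The function name `pick_template` and the dictionary of slot counts are assumptions for illustration; the slot counts reuse the two cases discussed above.

```python
import random


def pick_template(slot_counts, rng=random):
    """Return (candidates, chosen) for a dict of template name -> number of slots."""
    most = max(slot_counts.values())
    # Step S702: every template that reaches the largest slot count is a candidate.
    candidates = sorted(name for name, count in slot_counts.items() if count == most)
    # Step S703: choose one candidate at random as the template to be filled.
    return candidates, rng.choice(candidates)


# Case 1: template 2 has more slots, so it is the only candidate.
candidates, chosen = pick_template({"event template 1": 1, "event template 2": 2})

# Case 2: both templates have 1 slot, so either may be chosen.
tied, chosen_tied = pick_template(
    {"event template 1": 1, "event template 2": 1}, random.Random(0)
)
```

  • Random choice among equally good candidates is what gives the generated articles varied sentence patterns without giving up slot coverage.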
  • this application also provides a sports news writing device based on natural language to enhance the diversity of article sentence patterns and maximize the amount of article information.
  • the device includes:
  • the acquiring unit 801 is configured to acquire the corpus to be processed, the event set, the slot, and the slot value corresponding to each slot;
  • the event template tagging unit 802 is used to tag the event template in the corpus according to the value of each event, slot, and slot in the event set;
  • the weight assignment unit 803 is used for weight assignment for each event
  • the encoding unit 804 is used to encode each event and the type and number of slots in the event template;
  • the screening unit 805 is used to screen events and event templates according to the weight of each event;
  • the news content generating unit 806 is used to match and fill the selected events and event templates to generate news content
  • the news content processing unit 807 is used to reprocess the news content to obtain the final news content.
  • the device shown in Fig. 8 can correspondingly execute the content in the above method embodiment.
  • the parts not described in detail in this embodiment refer to the content recorded in the above method embodiment, which will not be repeated here.
  • FIG. 9 shows a schematic structural diagram of an electronic device 90 suitable for implementing embodiments of the present disclosure.
  • Electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), vehicle-mounted terminals (e.g. Mobile terminals such as car navigation terminals) and fixed terminals such as digital TVs, desktop computers, etc.
  • The electronic device shown in FIG. 9 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • The electronic device 90 may include a processing device (such as a central processing unit, a graphics processor, etc.) 901, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 into a random access memory (RAM) 903.
  • The RAM 903 also stores various programs and data required for the operation of the electronic device 90.
  • the processing device 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904.
  • The following devices can be connected to the I/O interface 905: input devices 906 such as touch screens, touch panels, keyboards, mice, image sensors, microphones, accelerometers, and gyroscopes; output devices 907 such as liquid crystal displays (LCD), speakers, and vibrators; storage devices 908 such as magnetic tapes and hard disks; and a communication device 909.
  • the communication device 909 may allow the electronic device 90 to perform wireless or wired communication with other devices to exchange data.
  • Although the figure shows the electronic device 90 with various devices, it should be understood that it is not required to implement or have all the devices shown; more or fewer devices may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902.
  • When the computer program is executed by the processing device 901, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the above.
  • Computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution apparatus, system, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • The computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution apparatus, system, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs.
  • When the above-mentioned one or more programs are executed by the electronic device, the electronic device: obtains at least two Internet Protocol addresses; sends to the node evaluation device a node evaluation request including the at least two Internet Protocol addresses, wherein the node evaluation device selects an Internet Protocol address from the at least two Internet Protocol addresses and returns it; and receives the Internet Protocol address returned by the node evaluation device; wherein the obtained Internet Protocol address indicates the edge node in the content distribution network.
  • the aforementioned computer-readable medium carries one or more programs, and when the aforementioned one or more programs are executed by the electronic device, the electronic device: receives a node evaluation request including at least two Internet Protocol addresses; Among the at least two Internet Protocol addresses, an Internet Protocol address is selected; the selected Internet Protocol address is returned; wherein the received Internet Protocol address indicates an edge node in the content distribution network.
  • the computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram can represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logic function.
  • in some alternative implementations, the functions marked in the blocks can also occur in a different order from the order marked in the drawings. For example, two blocks shown one after another can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and each combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure can be implemented in software or hardware. Wherein, the name of the unit does not constitute a limitation on the unit itself under certain circumstances.
  • the first obtaining unit can also be described as "a unit for obtaining at least two Internet Protocol addresses.”

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

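A natural-language-based sports news writing method, apparatus and electronic device. The method comprises the steps of: obtaining a corpus to be processed, an event set, slots, and the slot value corresponding to each slot; annotating event templates in the corpus according to each event in the event set, the slots and the slot values; assigning a weight to each event; encoding the types and counts of the slots in each event and in each event template; filtering the events and the event templates according to the weight of each event; matching and filling the filtered events and event templates to generate news content; and post-processing the news content to obtain the final news content. The apparatus applies this method, which increases the variety of sentence patterns in an article and maximizes its information content, and which writes sports news articles automatically and efficiently, reducing labor costs.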

Description

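Natural-language-based sports news writing method, apparatus and electronic device
Technical Field
The present invention belongs to the technical field of natural language processing, and specifically relates to a natural-language-based sports news writing method, apparatus and electronic device.
Background
When matching event data against templates, the existing template-traversal matching strategy compares the slots and slot counts of the templates one by one until a suitable template is found. For example, one record of a goal event is: {ORG_NEU: Newcastle United, PER_ACT: Scheer, EVEINF_LOC_FROM: center outside the penalty area, EVEINF_BODY: right foot, EVEINF_LOC_TO: upper right corner of the goal}. This record contains five slots. Suppose that, at matching time, the goal event has the following templates, among others:
{ORG_NEU} scores! {PER_ACT} shoots, and the ball flies into the goal at the {EVEINF_LOC_TO}
{ORG_NEU} scores! {PER_ACT} shoots with the {EVEINF_BODY} from the {EVEINF_LOC_FROM}, and the ball traces a beautiful arc into the goal at the {EVEINF_LOC_TO}.
The strategy traverses every template, uses regular-expression matching to compute the set of slots contained in the current template and the number of times each slot occurs, and reports a successful match if the template's slot set is a subset of the slot set of the data. Each iteration therefore has to compute the template's slot information and then compare it with the data's slot information through set operations, which is time-consuming. Because an online system has performance requirements, the traversal usually returns as soon as one match succeeds and does not try the remaining templates. This causes a second problem: the first template that satisfies the condition is usually not the best one, in that its slot types and slot count are not maximal. In the example above, the first template already matches, but using it means that EVEINF_LOC_FROM and EVEINF_BODY are never filled in, so the generated article carries less information. The traversal matching strategy is therefore inefficient, and its results also lack variety.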
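The traversal strategy criticized above can be sketched in a few lines. This is only an illustration of the prior-art behavior as the background describes it, not code from the application: the function name `first_matching_template`, the slot-marker pattern and the English template strings are assumptions made for the example.

```python
import re

# Slot markers look like {ORG_NEU}; this pattern is an assumption based on the examples.
SLOT_RE = re.compile(r"\{([A-Z_]+)\}")


def first_matching_template(event_data, templates):
    """Return the first template whose slot set is a subset of the data's slots."""
    data_slots = set(event_data)
    for template in templates:
        # The slot set is recomputed for every template on every call.
        template_slots = set(SLOT_RE.findall(template))
        if template_slots <= data_slots:
            return template  # early return: later, richer templates are never tried
    return None


goal_event = {
    "ORG_NEU": "Newcastle United",
    "PER_ACT": "Scheer",
    "EVEINF_LOC_FROM": "center outside the penalty area",
    "EVEINF_BODY": "right foot",
    "EVEINF_LOC_TO": "upper right corner of the goal",
}
templates = [
    "{ORG_NEU} scores! {PER_ACT} shoots, and the ball flies in at the {EVEINF_LOC_TO}.",
    "{ORG_NEU} scores! {PER_ACT} shoots with the {EVEINF_BODY} from the "
    "{EVEINF_LOC_FROM}, and the ball curls in at the {EVEINF_LOC_TO}.",
]
chosen = first_matching_template(goal_event, templates)
```

Here `chosen` is the shorter template, so the shooting position and the body part are dropped even though the data contains them, which is the loss of information the background points out.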
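Summary of the Invention
One object of the present application is to address the deficiencies of the prior art by providing a natural-language-based sports news writing method that increases the variety of sentence patterns in an article and maximizes its information content. The method comprises the steps of:
obtaining a corpus to be processed, an event set, slots, and the slot value corresponding to each slot;
annotating event templates in the corpus according to each event in the event set, the slots and the slot values;
assigning a weight to each event;
encoding the types and counts of the slots in each event and in each event template;
filtering the events and the event templates according to the weight of each event;
matching and filling the filtered events and event templates to generate news content;
post-processing the news content to obtain the final news content.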
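Preferably, obtaining the event set, the slots and the slot value corresponding to each slot comprises the steps of:
obtaining a preset number of sports news corpus items;
processing the sports news corpus items to obtain all the events, the slots, and the slot value corresponding to each slot;
placing all the events in one set to obtain the event set.
Preferably, after processing the sports news corpus items to obtain all the events, the slots, and the slot value corresponding to each slot, the method further comprises the steps of:
judging whether each event, each slot and each slot value falls within a preset range;
if so, keeping the event, the slot and the slot value;
if not, deleting the event, the slot or the slot value.
Preferably, the events include a title, an abstract and a body.
Preferably, assigning a weight to each event comprises the steps of:
dividing the corpus into several parts according to all the events;
for each event, constructing a mapping between the event and each part;
for each mapping, setting the weight of the event corresponding to the mapping.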
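Preferably, encoding the types and counts of the slots in each event and in each event template comprises the steps of:
obtaining the event templates and events to be encoded;
counting, by regular-expression matching, the types and counts of the slots in the event templates and in the events;
determining the total number m of slots and the maximum number n of times a slot occurs in any one event template;
determining, from the maximum number of occurrences, that each slot is allocated n binary bits for its representation, where n is a divisor of 64;
determining, from the total number of slots and the number of binary bits allocated to each slot, that the long type is used for encoding and that x values of that type are needed, where x=[(m*n)/64]+1;
traversing each slot in the event template and binary-encoding the count of the current slot;
determining, from the index i of the current slot, that the current slot is encoded in the y-th long value, where y=i/(64/n)+1;
shifting the binary representation of the slot count to the left by p bits, where p=(i-(y-1)*(64/n))*n;
concatenating all the long-type codes to obtain the final code.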
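Preferably, filtering the events comprises the steps of:
obtaining the weight corresponding to each event;
comparing each weight, one by one, with a preset threshold;
keeping the events whose weight is greater than the preset threshold and discarding all other events.
Preferably, filtering the event templates comprises the steps of:
obtaining the filtered events and their codes, together with the codes of all the event templates of those events;
picking the one or more event templates with the largest number of slots as candidate event templates;
randomly selecting one event template from the candidate event templates as the event template to be filled.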
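A second object of the present application is to address the deficiencies of the prior art by providing a natural-language-based sports news writing apparatus that increases the variety of sentence patterns in an article and maximizes its information content. The apparatus comprises:
an obtaining unit, configured to obtain a corpus to be processed, an event set, slots, and the slot value corresponding to each slot;
an event template annotation unit, configured to annotate event templates in the corpus according to each event in the event set, the slots and the slot values;
a weight assignment unit, configured to assign a weight to each event;
an encoding unit, configured to encode the types and counts of the slots in each event and in each event template;
a filtering unit, configured to filter the events and the event templates according to the weight of each event;
a news content generation unit, configured to match and fill the filtered events and event templates to generate news content;
a news content processing unit, configured to post-process the news content to obtain the final news content.
A third object of the present application is to address the deficiencies of the prior art by providing an electronic device that increases the variety of sentence patterns in an article and maximizes its information content. The electronic device comprises:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can carry out any of the sports news writing methods described above.
The natural-language-based sports news writing method, apparatus and electronic device provided by the present application generate news content automatically from event templates extracted in advance by analyzing a large number of sports news items and from the weights that users themselves assign to events. On the one hand this increases the variety of sentence patterns in an article and maximizes its information content; on the other hand it writes sports news articles automatically and efficiently, reducing labor costs.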
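Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of Embodiment 1 of a natural-language-based sports news writing method provided by the present invention;
FIG. 2 is a flowchart of Embodiment 2 of a natural-language-based sports news writing method provided by the present invention;
FIG. 3 is a flowchart of Embodiment 3 of a natural-language-based sports news writing method provided by the present invention;
FIG. 4 is a flowchart of Embodiment 4 of a natural-language-based sports news writing method provided by the present invention;
FIG. 5 is a flowchart of Embodiment 5 of a natural-language-based sports news writing method provided by the present invention;
FIG. 6 is a flowchart of Embodiment 6 of a natural-language-based sports news writing method provided by the present invention;
FIG. 7 is a flowchart of Embodiment 7 of a natural-language-based sports news writing method provided by the present invention;
FIG. 8 is a schematic structural diagram of a natural-language-based sports news writing apparatus provided by the present invention;
FIG. 9 is a schematic structural diagram of an electronic device provided by the present invention.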
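Detailed Description
The embodiments of the present disclosure are described in detail below with reference to the drawings.
The embodiments of the present disclosure are illustrated below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the present disclosure from what this specification discloses. The described embodiments are obviously only some, not all, of the embodiments of the present disclosure. The present disclosure can also be implemented or applied through other, different specific embodiments, and the details in this specification can be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the present disclosure. It should be noted that the following embodiments and the features in them can be combined with one another where they do not conflict. All other embodiments that a person of ordinary skill in the art obtains from the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
It should be noted that various aspects of embodiments within the scope of the appended claims are described below. It should be apparent that the aspects described herein can be embodied in a wide variety of forms, and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, those skilled in the art should appreciate that an aspect described herein can be implemented independently of any other aspect, and that two or more of these aspects can be combined in various ways. For example, an apparatus can be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such an apparatus can be implemented and/or such a method practiced using structures and/or functionality other than one or more of the aspects set forth herein.
It should also be noted that the drawings provided in the following embodiments illustrate the basic concept of the present disclosure only schematically. They show only the components related to the present disclosure and are not drawn according to the number, shape and size of the components in an actual implementation; in an actual implementation the form, number and proportion of the components may be changed at will, and the component layout may also be more complex.
In addition, specific details are provided in the following description to facilitate a thorough understanding of the examples. However, those skilled in the art will understand that the aspects can be practiced without these specific details.
An embodiment of the present disclosure provides a natural-language-based sports news writing method. The method provided in this embodiment can be executed by a computing system, which can be implemented as software or as a combination of software and hardware, and which can be integrated into a server, a terminal device, or the like.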
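Embodiment 1
As shown in FIG. 1, in an embodiment of the present application, a natural-language-based sports news writing method is provided. The method comprises the steps of:
Step S101: obtain the corpus to be processed, the event set, the slots, and the slot value corresponding to each slot.
In this step, these items can be obtained in several ways. For example, the preset corpus to be processed, event set, slots and corresponding slot values can be entered into the device manually, obtained automatically by the device, or fetched automatically by a server (for example, with a web crawler). The present application does not limit the way they are obtained.
Step S102: annotate event templates in the corpus according to each event in the event set, the slots and the slot values.
In this step, the corpus to be processed obtained in step S101 is annotated with event templates according to each event in the event set, the slots and the slot values obtained in step S101.
Step S103: assign a weight to each event.
In this step, the weights can be assigned in several ways. For example, the weight of each event can be entered into the device manually, obtained automatically by the device, or fetched automatically by a server (for example, with a web crawler). The present application does not limit the way they are assigned.
Step S104: encode the types and counts of the slots in each event and in each event template.
In this step, the encoding can be obtained in several ways. For example, the codes for the slot types and counts of each event and event template can be entered into the device manually, obtained automatically by the device, or fetched automatically by a server (for example, with a web crawler). The present application does not limit the way they are obtained.
In this step, the encoding can be binary, octal, decimal or the like; the present application does not limit the specific encoding type.
Step S105: filter the events and the event templates according to the weight of each event.
In this step, the events and event templates are filtered according to the weight of each event obtained in step S103. Various filtering criteria are possible. For example, the weight of each event can be compared with a preset threshold and the events whose weight is below the threshold discarded. The present application does not limit the filtering criterion.
Step S106: match and fill the filtered events and event templates to generate news content.
In this step, the events and event templates filtered in step S105 are matched and filled to generate the news content. Matching and filling can be done in several ways, which the present application does not limit.
Step S107: post-process the news content to obtain the final news content.
In this step, the news content obtained in step S106 is post-processed to obtain the final news content. Post-processing can take several forms, such as manual review and polishing or a further check by the system; the present application does not limit it.
In Embodiment 1, the natural-language-based sports news writing method provided by the present application processes the corpus automatically and thereby generates news content automatically, which reduces manual labor and improves the automated processing of news corpora.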
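The filling part of step S106 is left open by the application. One minimal way to fill a chosen template, assuming slot markers written as `{SLOT_NAME}` as in the background example, is Python's built-in `str.format_map`; the helper name `fill_template` and the English template text are illustrative assumptions.

```python
def fill_template(template, slot_values):
    """Replace each {SLOT} marker with its value; raises KeyError if a slot is missing."""
    return template.format_map(slot_values)


template = "{ORG_NEU} scores! {PER_ACT} shoots, and the ball flies in at the {EVEINF_LOC_TO}."
slot_values = {
    "ORG_NEU": "Newcastle United",
    "PER_ACT": "Scheer",
    "EVEINF_LOC_TO": "upper right corner of the goal",
}
sentence = fill_template(template, slot_values)
```

Extra entries in `slot_values` are simply ignored, so the same event record can be used with templates that need fewer slots.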
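Embodiment 2
As shown in FIG. 2, in an embodiment of the present application, obtaining the event set, the slots and the slot value corresponding to each slot in step S101 comprises the steps of:
Step S201: obtain a preset number of sports news corpus items.
In this step, the corpus items can be obtained in several ways. For example, the preset number of sports news corpus items can be entered into the device manually, obtained automatically by the device, or fetched automatically by a server (for example, with a web crawler). The present application does not limit the way they are obtained.
In theory the preset number can be as small as 1 or as large as 100,000; weighing the data processing volume against the accuracy of the results, a preset number of 1,000 can be chosen.
Two sports news corpus items are used below as an example:
Sports news corpus item 1: Star A shoots and scores.
Sports news corpus item 2: Star B shoots and misses; no goal is scored.
Step S202: process the sports news corpus items to obtain all the events, the slots, and the slot value corresponding to each slot.
In this step, the sports news corpus items from step S201 are processed to obtain all the events, the slots, and the slot value corresponding to each slot.
Specifically, for the two corpus items above:
Event: goal;
Slot: __ shoots (the position of the underscore is the slot);
Slot values: Star A shoots, Star B shoots (what occupies the underscore position is the slot value; that is, the slot values are Star A and Star B).
Step S203: place all the events in one set to obtain the event set.
In this step, all the events obtained in step S202 are placed in one set to obtain the event set. The event set contains at least one event, and the number of events it contains depends on how events are delimited. For example, if the event chosen in step S202 is "goal", the event set in step S203 contains the single event "goal"; if the events chosen in step S202 are "goal scored" and "goal missed", the event set in step S203 contains the two events "goal scored" and "goal missed".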
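The application does not say how step S202 extracts slot values. As a hedged sketch only, the "__ shoots" slot from the two sample items could be pulled out with a regular expression; the pattern, the English sentences and the function name `extract_slot_values` are assumptions made for illustration.

```python
import re

# "__ shoots": whatever precedes " shoots" at the start of the sentence is the slot value.
SHOT_PATTERN = re.compile(r"^(?P<player>.+?) shoots")

corpus = [
    "Star A shoots and scores.",
    "Star B shoots and misses; no goal is scored.",
]


def extract_slot_values(sentences):
    """Collect the value found in the player slot of each matching sentence."""
    values = []
    for sentence in sentences:
        match = SHOT_PATTERN.match(sentence)
        if match:
            values.append(match.group("player"))
    return values


slot_values = extract_slot_values(corpus)
```

A production system would more plausibly rely on named-entity recognition or manual annotation; the regular expression only makes the slot and slot-value notions concrete.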
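In an embodiment of the present application, the events can also include a title, an abstract and a body. For example, a sports news item generally comprises at least the three parts "title", "abstract" and "body"; it can of course also include parts such as "subtitle", "editor's note" and "commentary", which the present application does not limit.
In Embodiment 2, the natural-language-based sports news writing method provided by the present application analyzes and processes a large number of sports news corpus items to obtain the event set, the slots and the slot value corresponding to each slot, providing the templates on which the subsequent automatic generation of news content is based.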
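Embodiment 3
As shown in FIG. 3, in an embodiment of the present application, after the sports news corpus items are processed in step S202 to obtain all the events, the slots and the slot value corresponding to each slot, the method further comprises the steps of:
Step S301: judge whether each event, each slot and each slot value falls within a preset range.
Step S302: if so, keep the event, the slot and the slot value.
Step S303: if not, delete the event, the slot or the slot value.
Through steps S301-S303, each event, slot and slot value from step S202 is checked against the preset range and then either discarded or kept.
Specifically, the preset range can be a concrete numerical value or a descriptive definition. For example, the events "shot", "goal" and "free kick" may be selected from 1,000 sports news items; if "free kick" accounts for a much smaller share than "shot" and "goal", it can be discarded when high precision of the result is not required, which greatly reduces the amount of data to process.
In Embodiment 3, the natural-language-based sports news writing method provided by the present application filters the obtained events, slots and slot values, which on the one hand improves the accuracy of news generation and on the other hand reduces the amount of data to process.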
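One possible reading of the "preset range" in steps S301-S303 is a minimum share of all event mentions. The sketch below assumes that reading; the function name `filter_events`, the 10% cut-off and the mention counts are hypothetical values chosen for the example and do not come from the application.

```python
from collections import Counter


def filter_events(event_mentions, min_share):
    """Keep only the events whose share of all mentions is at least min_share."""
    counts = Counter(event_mentions)
    total = sum(counts.values())
    return {event for event, count in counts.items() if count / total >= min_share}


# Hypothetical mention counts: "free kick" is rare compared with the other two events.
mentions = ["shot"] * 50 + ["goal"] * 45 + ["free kick"] * 5
kept = filter_events(mentions, min_share=0.1)
```

With these numbers "free kick" makes up 5% of the mentions and is dropped, while "shot" and "goal" are kept.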
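Embodiment 4
As shown in FIG. 4, in an embodiment of the present application, assigning a weight to each event in step S103 comprises the steps of:
Step S401: divide the corpus into several parts according to all the events.
Step S402: for each event, construct a mapping between the event and each part.
Step S403: for each mapping, set the weight of the event corresponding to the mapping.
Tables 1 and 2 illustrate this:
Table 1
Event \ Part    First paragraph    Second paragraph
Title           0.6                0.3
Abstract        0.4                0.7
For Table 1, the corpus is divided into a first paragraph and a second paragraph. By default the first paragraph is the title part of the corpus and the second paragraph is the abstract part (in some news items the abstract may even span more than one paragraph; this is ignored here for simplicity).
When weights are assigned to the title event and the abstract event, the title event has weights 0.6 and 0.3 for the first and second paragraphs, and the abstract event has weights 0.4 and 0.7. That is, when the system considers the importance of the title event and the abstract event in the first and second paragraphs of the corpus, the title event will most probably be placed in the first paragraph and the abstract event in the second. In some other embodiments, when sorting in descending order of weight, the abstract event will most probably be placed in the first paragraph and the title event in the second.
Table 2
Event \ Part    First paragraph    Second paragraph
Goal            0.6                0.7
Free kick       0.4                0.3
For Table 2, the corpus is divided into a first paragraph and a second paragraph. By default the first paragraph is the title part of the corpus and the second paragraph is the abstract part (in some news items the abstract may even span more than one paragraph; this is ignored here for simplicity).
When weights are assigned to the goal event and the free-kick event, the goal event has weights 0.6 and 0.7 for the first and second paragraphs, and the free-kick event has weights 0.4 and 0.3. That is, when the system considers the importance of the goal event and the free-kick event in the first and second paragraphs of the corpus, the goal event will most probably be placed in the first paragraph and the free-kick event in the second. In some other embodiments, when sorting in descending order of weight, the free-kick event will most probably be placed in the first paragraph and the goal event in the second.
In Embodiment 4, the natural-language-based sports news writing method provided by the present application assigns a weight to each event, which on the one hand allows event weights to be customized as needed and on the other hand lays the foundation for the subsequent encoding work.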
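The event-to-part mapping of steps S401-S403 can be held in a nested dictionary. The sketch below loads the Table 1 weights and picks, for each part, the event with the highest weight; the dictionary layout and the helper name `best_event_for` are assumptions, since the application does not prescribe a data structure.

```python
# Table 1 weights: event -> part -> weight.
WEIGHTS = {
    "title": {"first paragraph": 0.6, "second paragraph": 0.3},
    "abstract": {"first paragraph": 0.4, "second paragraph": 0.7},
}


def best_event_for(part, weights):
    """Return the event with the highest weight for the given part."""
    return max(weights, key=lambda event: weights[event][part])


first = best_event_for("first paragraph", WEIGHTS)
second = best_event_for("second paragraph", WEIGHTS)
```

With the Table 1 values, `first` is the title event and `second` is the abstract event, matching the placement described for that table.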
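Embodiment 5
As shown in FIG. 5, in an embodiment of the present application, encoding the types and counts of the slots in each event and in each event template in step S104 comprises the steps of:
Step S501: obtain the event templates and events to be encoded.
In this step, the event templates and events to be encoded can be obtained in several ways. For example, they can be entered into the device manually, obtained automatically by the device, or fetched automatically by a server (for example, with a web crawler). The present application does not limit the way they are obtained.
In this embodiment, binary encoding is used.
Step S502: count, by regular-expression matching, the types and counts of the slots in the event templates and in the events.
In this step, a regular expression is applied in turn to the event templates and events to be encoded, which yields the slot types and the count of each type in the event templates, and likewise the slot types and the count of each type in the events.
Step S503: determine the total number m of slots and the maximum number n of times a slot occurs in any one event template.
In this step, the total number m of slots and the maximum number n of occurrences of a slot in any one event template are determined from the result of step S502.
In this embodiment, which uses binary encoding, m=22 and n=4. Because a slot occurs at most 4 times in an event template, each slot is allocated 4 binary bits to record how many times it occurs; within those 4 bits, each bit set to 1 represents one occurrence of the slot.
Step S504: determine, from the maximum number of occurrences, that each slot is allocated n binary bits for its representation, where n is a divisor of 64.
Step S505: determine, from the total number of slots and the number of binary bits allocated to each slot, that the long type is used for encoding and that x values of that type are needed, where x=[(m*n)/64]+1.
In steps S504-S505, with 22 slots in total and 4 binary bits per slot, 22*4=88 binary bits are needed to represent the counts of all slots.
In this embodiment, practical considerations and the data volume lead to the choice of long-type data for encoding. One long value consists of 64 bits, which is why n must be a divisor of 64. Here, 2 long values are needed to encode the slots of the event templates and of the corpus, and both are initialized to 0.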
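Step S506: traverse each slot in the event template and binary-encode the count of the current slot.
In this step, each slot in the event template is traversed in turn and its count is binary-encoded in a long value. For example, if a slot occurs 2 times in the event template, its binary representation is 0011.
Step S507: determine, from the index i of the current slot, that the current slot is encoded in the y-th long value, where y=i/(64/n)+1.
In this step, the index i of the current slot determines the long value, numbered y, in which the slot is encoded. The value of y is computed with the following formula:
y=i/(64/4)+1,
where i is the index of the current slot and the division is integer division. For example, if the index i is 15 then y=1, meaning the current slot is encoded in the first long value; if i is 16 then y=2, and the slot is encoded in the second long value.
Step S508: shift the binary representation of the slot count to the left by p bits, where p=(i-(y-1)*(64/n))*n.
In this step, the binary representation of the slot count is shifted to the left by p bits, where p is computed as follows:
p=(i-(y-1)*(64/4))*4
where i is the slot index and y is the number of the long value in which the slot is encoded. For example, if the index i of a slot is 10, then y=1, i.e. the slot is encoded in the first long value. If the slot occurs 2 times in the event template, the binary representation of its count is 0011, and encoding shifts 0011 to the left by 10*4 bits, giving the following code: 0000 0000 0000 0000 0000 0011 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000.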
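The index arithmetic of steps S507 and S508 can be checked with a short script. It is a sketch that follows the formulas as written; the helper names `locate` and `unary` are assumptions, and Python's unbounded integers stand in for a 64-bit long.

```python
BITS_PER_LONG = 64


def locate(i, n):
    """Return (y, p): the 1-based long number and the left shift for slot index i."""
    slots_per_long = BITS_PER_LONG // n
    y = i // slots_per_long + 1
    p = (i - (y - 1) * slots_per_long) * n
    return y, p


def unary(count):
    """Encode a count as that many 1 bits, e.g. 2 -> 0b0011."""
    return (1 << count) - 1


y, p = locate(10, 4)       # slot index 10, 4 bits per slot
code = unary(2) << p       # the slot occurs twice in the template
bits = format(code, "064b")
grouped = " ".join(bits[k:k + 4] for k in range(0, 64, 4))
```

For i=10 and n=4 this gives y=1 and p=40, and `grouped` shows 0011 in the sixth 4-bit group from the left, as in the worked example. In a language with a signed 64-bit long, a slot that sets the top bit would make the stored value negative, which does not affect the bit pattern.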
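Step S509: concatenate all the long-type codes to obtain the final code.
In this step, all the long-type codes are concatenated to obtain the final code; the codes of all slots that fall in the same long value must be added together.
In Embodiment 5, the natural-language-based sports news writing method provided by the present application uses binary encoding for the types and counts of the slots in each event and event template, which makes the subsequent filtering of events and event templates easier and reduces its workload.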
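Steps S501-S509 can be combined into one encoder. The following is a sketch under stated assumptions rather than the applicant's implementation: slot markers are assumed to look like `{SLOT_NAME}`, `slot_index` is a hypothetical table that assigns each slot its index i, and a list of Python integers stands in for the concatenated long values.

```python
import re
from collections import Counter

SLOT_RE = re.compile(r"\{([A-Z_]+)\}")
BITS_PER_LONG = 64


def encode_template(template, slot_index, n):
    """Encode the slot counts of a template into x integers of 64 bits each."""
    m = len(slot_index)
    x = (m * n) // BITS_PER_LONG + 1           # number of long values (step S505)
    longs = [0] * x                            # all initialized to 0
    slots_per_long = BITS_PER_LONG // n
    for name, count in Counter(SLOT_RE.findall(template)).items():
        if count > n:
            raise ValueError(f"slot {name} occurs more than {n} times")
        i = slot_index[name]
        y = i // slots_per_long + 1            # which long value (step S507)
        p = (i - (y - 1) * slots_per_long) * n # left shift (step S508)
        # The slot fields do not overlap, so OR gives the same result as addition.
        longs[y - 1] |= ((1 << count) - 1) << p
    return longs


slot_index = {
    "ORG_NEU": 0, "PER_ACT": 1, "EVEINF_LOC_FROM": 2,
    "EVEINF_BODY": 3, "EVEINF_LOC_TO": 4,
}
short_template = "{ORG_NEU} scores! {PER_ACT} shoots, and the ball flies in at the {EVEINF_LOC_TO}."
short_code = encode_template(short_template, slot_index, n=4)
```

With five slots and n=4 a single integer suffices, and `short_code` has one bit set in each of the fields for slots 0, 1 and 4.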
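Embodiment 6
As shown in FIG. 6, in an embodiment of the present application, filtering the events in step S105 comprises the steps of:
Step S601: obtain the weight corresponding to each event.
Step S602: compare each weight, one by one, with a preset threshold.
Step S603: keep the events whose weight is greater than the preset threshold and discard all other events.
In steps S601-S603, a large number of events are filtered, and the filtering criterion is the comparison of each event's weight with the preset threshold.
A specific example follows.
In this embodiment, a news item comprises three parts: title, abstract and body; there are two kinds of events, "goal" and "free kick". Weights are assigned to the two events according to business needs, with "goal" weighted higher than "free kick" (weights here lie between 0 and 1). When the sports news is written, the events are filtered by weight and events whose weight exceeds the preset threshold are written into the title part first; "goal" is therefore written into the title, while "free kick" is written into either the abstract or the body.
Further, if the events also include abstract and body events and the body event is weighted higher than the abstract event, "free kick" is preferentially written into the body of the news item.
In Embodiment 6, the natural-language-based sports news writing method provided by the present application filters the events so that suitable events can be selected from many as needed, providing the basis for the subsequent generation of news.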
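Steps S601-S603 amount to a threshold filter. A minimal sketch follows; the weights 0.8 and 0.3 and the threshold 0.5 are hypothetical values, chosen only so that "goal" outweighs "free kick" as in the example.

```python
def filter_by_weight(event_weights, threshold):
    """Keep the events whose weight is strictly greater than the threshold."""
    return {event: w for event, w in event_weights.items() if w > threshold}


event_weights = {"goal": 0.8, "free kick": 0.3}   # hypothetical weights in [0, 1]
kept_events = filter_by_weight(event_weights, threshold=0.5)
```

The comparison is strict because step S603 keeps only events whose weight is greater than the threshold.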
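Embodiment 7
As shown in FIG. 7, in an embodiment of the present application, filtering the event templates in step S105 comprises the steps of:
Step S701: obtain the filtered events and their codes, together with the codes of all the event templates of those events.
Step S702: pick the one or more event templates with the largest number of slots as candidate event templates.
Step S703: randomly select one event template from the candidate event templates as the event template to be filled.
Steps S701-S703 select the event template to be filled.
A specific example follows.
Table 3
Item number       1       2
Event             0010    1010
Event template    0011    1011
As Table 3 shows, event 1 is encoded as 0010 and event 2 as 1010; event template 1 is encoded as 0011 and event template 2 as 1011.
If event template 1 has 1 slot and event template 2 has 2 slots, event template 2, which has the most slots, is selected and is the only event template to be filled.
If event template 1 has 1 slot and event template 2 also has 1 slot, both qualify as the event templates with the most slots, and one of them is selected at random as the event template to be filled; that is, the event template to be filled can be either event template 1 or event template 2.
In Embodiment 7, the natural-language-based sports news writing method provided by the present application filters the event templates so that a suitable event template can be selected from many as needed, providing the basis for the subsequent generation of news.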
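Steps S702 and S703 can be sketched on top of the bit codes from Embodiment 5. The application does not spell out how a code is tested against an event, so the bitwise check in `fits` is an assumption: with one bit per occurrence, a template needs no more of any slot than the event provides exactly when every bit set in the template code is also set in the event code. The names `fits`, `slot_total` and `choose_template` are illustrative.

```python
import random


def fits(template_code, event_code):
    """True if every bit set in the template code is also set in the event code."""
    return all(t & ~e == 0 for t, e in zip(template_code, event_code))


def slot_total(code):
    """Total number of slot occurrences: one set bit per occurrence."""
    return sum(bin(value).count("1") for value in code)


def choose_template(event_code, template_codes, rng=random):
    """Pick at random among the fitting templates that have the most slots."""
    fitting = {name: c for name, c in template_codes.items() if fits(c, event_code)}
    if not fitting:
        return None
    best = max(slot_total(c) for c in fitting.values())
    candidates = sorted(name for name, c in fitting.items() if slot_total(c) == best)
    return rng.choice(candidates)


event_code = [0x11111]                     # five slots, each present once
template_codes = {
    "short": [0x10011],                    # three slots
    "long": [0x11111],                     # five slots
    "needs_extra_slot": [0x111111],        # uses a sixth slot the event lacks
}
selected = choose_template(event_code, template_codes)
```

Because each comparison is a couple of integer operations, every template can be checked rather than stopping at the first match, and here `selected` is the five-slot template.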
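As shown in FIG. 8, in an embodiment of the present application, a natural-language-based sports news writing apparatus is also provided, which increases the variety of sentence patterns in an article and maximizes its information content. The apparatus comprises:
an obtaining unit 801, configured to obtain a corpus to be processed, an event set, slots, and the slot value corresponding to each slot;
an event template annotation unit 802, configured to annotate event templates in the corpus according to each event in the event set, the slots and the slot values;
a weight assignment unit 803, configured to assign a weight to each event;
an encoding unit 804, configured to encode the types and counts of the slots in each event and in each event template;
a filtering unit 805, configured to filter the events and the event templates according to the weight of each event;
a news content generation unit 806, configured to match and fill the filtered events and event templates to generate news content;
a news content processing unit 807, configured to post-process the news content to obtain the final news content.
The apparatus shown in FIG. 8 can carry out the content of the method embodiments above. For the parts not described in detail in this embodiment, refer to the method embodiments above; they are not repeated here.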
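Reference is now made to FIG. 9, which shows a schematic structural diagram of an electronic device 90 suitable for implementing the embodiments of the present disclosure. The electronic device in the embodiments of the present disclosure can include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (for example, vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 9 is only an example and should not limit the functions or scope of use of the embodiments of the present disclosure in any way.
As shown in FIG. 9, the electronic device 90 can include a processing device (for example, a central processing unit or a graphics processor) 901, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for the operation of the electronic device 90. The processing device 901, the ROM 902 and the RAM 903 are connected to one another by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
In general, the following devices can be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer and gyroscope; output devices 907 including, for example, a liquid crystal display (LCD), speaker and vibrator; storage devices 908 including, for example, magnetic tape and hard disks; and a communication device 909. The communication device 909 can allow the electronic device 90 to communicate wirelessly or by wire with other devices to exchange data. Although the figure shows an electronic device 90 with various devices, it should be understood that not all of the devices shown need to be implemented or present. More or fewer devices can be implemented or present instead.
In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowcharts can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded from a network through the communication device 909 and installed, or installed from the storage device 908, or installed from the ROM 902. When the computer program is executed by the processing device 901, the functions defined in the method of the embodiments of the present disclosure are performed.
The natural-language-based sports news writing method, apparatus and electronic device provided by the present application generate news content automatically from event templates extracted in advance by analyzing a large number of sports news items and from the weights that users themselves assign to events. On the one hand this increases the variety of sentence patterns in an article and maximizes its information content; on the other hand it writes sports news articles automatically and efficiently, reducing labor costs.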
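It should be noted that the computer-readable medium of the present disclosure can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor apparatus, system or device, or any combination of these. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these. In the present disclosure, a computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution apparatus, system or device. In the present disclosure, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of these. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate or transmit a program for use by or in combination with an instruction execution apparatus, system or device. The program code contained on a computer-readable medium can be transmitted over any suitable medium, including but not limited to wire, optical cable, RF (radio frequency), or any suitable combination of these.
The computer-readable medium can be included in the electronic device described above, or it can exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain at least two Internet Protocol addresses; send to a node evaluation device a node evaluation request including the at least two Internet Protocol addresses, wherein the node evaluation device selects an Internet Protocol address from the at least two Internet Protocol addresses and returns it; and receive the Internet Protocol address returned by the node evaluation device; wherein the obtained Internet Protocol address indicates an edge node in a content distribution network.
Alternatively, the computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a node evaluation request including at least two Internet Protocol addresses; select an Internet Protocol address from the at least two Internet Protocol addresses; and return the selected Internet Protocol address; wherein the received Internet Protocol address indicates an edge node in a content distribution network.
The computer program code for performing the operations of the present disclosure can be written in one or more programming languages or a combination of them, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions and operation of apparatuses, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram can represent a module, program segment or part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks can occur in an order different from the one marked in the drawings. For example, two blocks shown in succession can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and each combination of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure can be implemented in software or in hardware. In some cases the name of a unit does not limit the unit itself; for example, the first obtaining unit can also be described as "a unit that obtains at least two Internet Protocol addresses".
It should be understood that the parts of the present disclosure can be implemented in hardware, software, firmware, or a combination of them.
The above are only specific embodiments of the present disclosure, but the scope of protection of the present disclosure is not limited to them. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed herein shall fall within the scope of protection of the present disclosure. The scope of protection of the present disclosure shall therefore be determined by the scope of protection of the claims.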

Claims (10)

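  1. A natural-language-based sports news writing method, characterized in that the method comprises the steps of:
    obtaining a corpus to be processed, an event set, slots, and the slot value corresponding to each said slot;
    annotating event templates in the corpus according to each event in the event set, the slots and the slot values;
    assigning a weight to each said event;
    encoding the types and counts of the slots in each said event and in the event templates;
    filtering the events and the event templates according to the weight of each said event;
    matching and filling the filtered events and event templates to generate news content;
    post-processing the news content to obtain the final news content.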
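  2. The sports news writing method according to claim 1, characterized in that obtaining the event set, the slots and the slot value corresponding to each said slot comprises the steps of:
    obtaining a preset number of sports news corpus items;
    processing the sports news corpus items to obtain all the events, the slots, and the slot value corresponding to each said slot;
    placing all the events in one set to obtain the event set.
  3. The sports news writing method according to claim 2, characterized in that, after processing the sports news corpus items to obtain all the events, the slots, and the slot value corresponding to each said slot, the method further comprises the steps of:
    judging whether each said event, each said slot and each said slot value falls within a preset range;
    if so, keeping the event, the slot and the slot value;
    if not, deleting the event, the slot or the slot value.
  4. The sports news writing method according to claim 1, characterized in that the events include a title, an abstract and a body.
  5. The sports news writing method according to claim 1, characterized in that assigning a weight to each said event comprises the steps of:
    dividing the corpus into several parts according to all the events;
    for each said event, constructing a mapping between the event and each said part;
    for each said mapping, setting the weight of the event corresponding to the mapping.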
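  6. The sports news writing method according to claim 1, characterized in that encoding the types and counts of the slots in each said event and in the event templates comprises the steps of:
    obtaining the event templates and the events to be encoded;
    counting, by regular-expression matching, the types and counts of the slots in the event templates and in the events;
    determining the total number m of the slots and the maximum number n of times a slot occurs in any one said event template;
    determining, from the maximum number of occurrences of the slots, that each said slot is allocated n binary bits for its representation, where n is a divisor of 64;
    determining, from the total number of slots and the number of binary bits allocated to each said slot, that the long type is used for encoding and that x values of that type are needed, where x=[(m*n)/64]+1;
    traversing each said slot in the event template and binary-encoding the count of the current slot;
    determining, from the index i of the current slot, that the current slot is encoded in the y-th long value, where y=i/(64/n)+1;
    shifting the binary representation of the slot count to the left by p bits, where p=(i-(y-1)*(64/n))*n;
    concatenating all the long-type codes to obtain the final code.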
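  7. The sports news writing method according to claim 1, characterized in that filtering the events comprises the steps of:
    obtaining the weight corresponding to each said event;
    comparing each said weight, one by one, with a preset threshold;
    keeping the events whose weight is greater than the preset threshold and discarding all other events.
  8. The sports news writing method according to claim 1, characterized in that filtering the event templates comprises the steps of:
    obtaining the filtered events and their codes, together with the codes of all the event templates of those events;
    picking the one or more event templates with the largest number of slots as candidate event templates;
    randomly selecting one event template from the candidate event templates as the event template to be filled.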
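  9. A natural-language-based sports news writing apparatus, characterized in that the apparatus comprises:
    an obtaining unit, configured to obtain a corpus to be processed, an event set, slots, and the slot value corresponding to each said slot;
    an event template annotation unit, configured to annotate event templates in the corpus according to each event in the event set, the slots and the slot values;
    a weight assignment unit, configured to assign a weight to each said event;
    an encoding unit, configured to encode the types and counts of the slots in each said event and in the event templates;
    a filtering unit, configured to filter the events and the event templates according to the weight of each said event;
    a news content generation unit, configured to match and fill the filtered events and event templates to generate news content;
    a news content processing unit, configured to post-process the news content to obtain the final news content.
  10. An electronic device, characterized in that the electronic device comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can carry out the sports news writing method according to any one of claims 1-8.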
PCT/CN2020/097005 2019-12-23 2020-06-19 Natural-language-based sports news writing method, apparatus and electronic device WO2021128768A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3165616A CA3165616A1 (en) 2019-12-23 2020-06-19 Sports news writing method based on natural language, device and electronic equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911336569.4A CN111191434B (zh) 2019-12-23 2019-12-23 Natural-language-based sports news writing method, apparatus and electronic device
CN201911336569.4 2019-12-23

Publications (1)

Publication Number Publication Date
WO2021128768A1 true WO2021128768A1 (zh) 2021-07-01

Family

ID=70711044

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097005 WO2021128768A1 (zh) 2019-12-23 2020-06-19 Natural-language-based sports news writing method, apparatus and electronic device

Country Status (3)

Country Link
CN (1) CN111191434B (zh)
CA (1) CA3165616A1 (zh)
WO (1) WO2021128768A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191434B (zh) * 2019-12-23 2024-04-26 苏宁云计算有限公司 基于自然语言的体育新闻写作方法、装置及电子设备
CN113553812A (zh) * 2021-06-22 2021-10-26 北京来也网络科技有限公司 结合rpa和ai的新闻处理方法及装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10540430B2 (en) * 2011-12-28 2020-01-21 Cbs Interactive Inc. Techniques for providing a natural language narrative

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975466A (zh) * 2015-11-04 2016-09-28 新华通讯社 一种面向短新闻的机器写稿方法及装置
CN106407168A (zh) * 2016-09-06 2017-02-15 首都师范大学 一种应用文自动生成方法
CN106776523A (zh) * 2017-01-22 2017-05-31 百度在线网络技术(北京)有限公司 基于人工智能的新闻速报生成方法及装置
CN109902305A (zh) * 2019-03-04 2019-06-18 上海宝尊电子商务有限公司 基于命名实体识别的模板生成、搜索及文本生成设备与方法
CN110209838A (zh) * 2019-06-10 2019-09-06 广东工业大学 一种文本模板获取方法及相关装置
CN111191434A (zh) * 2019-12-23 2020-05-22 苏宁云计算有限公司 基于自然语言的体育新闻写作方法、装置及电子设备

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117390144A (zh) * 2023-12-13 2024-01-12 北京搜狐新媒体信息技术有限公司 一种新闻时效性的确定方法及装置
CN117390144B (zh) * 2023-12-13 2024-03-08 北京搜狐新媒体信息技术有限公司 一种新闻时效性的确定方法及装置

Also Published As

Publication number Publication date
CN111191434B (zh) 2024-04-26
CN111191434A (zh) 2020-05-22
CA3165616A1 (en) 2021-07-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20904410

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3165616

Country of ref document: CA

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20904410

Country of ref document: EP

Kind code of ref document: A1