CN111178056A - Deep learning based file generation method and device and electronic equipment - Google Patents
- Publication number
- CN111178056A (application CN202010001994.4A)
- Authority
- CN
- China
- Prior art keywords
- word segmentation
- word
- file
- input
- recommended
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting

- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a deep-learning-based method for generating promotional copy, a corresponding device, and electronic equipment. The method comprises: acquiring a title input by a user; performing word segmentation on the title using the jieba Chinese word segmentation method; extracting keywords from the segmented title using the TF-IDF algorithm to obtain a keyword set; and feeding the keyword set as input to a preset text generation model, whose output is taken as the copy. To train the model, existing copy is processed in the same way: the keywords that the copy mainly describes are extracted and used as the model input, forming new training data in which input and output correspond strongly. This improves the relevance of the generated copy to the input and greatly improves the quality of the generated copy.
    Description
Technical Field
      The invention belongs to the technical field of natural language processing with deep neural networks, and in particular relates to a deep-learning-based copy generation method, device, and electronic equipment.
    Background
      At present, when new products in the e-commerce industry are promoted, promotional copy is needed to give consumers a reason to buy, and well-designed copy is needed to highlight the product's selling points. The conventional approach in this field trains a model with the product title as input and the recommendation copy as output. The recommendation copy generated this way differs greatly in quality from manually written copy, which hinders large-scale use of automatic copy generation. Specifically, copy generated by the conventional approach does not accurately highlight the selling points of the product, and the correspondence between the generated copy and the input title is weak.
    Disclosure of Invention
      A first object of the present application, in view of the deficiencies of the prior art, is to provide a deep-learning-based copy generation method that improves the quality of the generated copy. The method includes the steps of:
      acquiring a title input by a user;
      performing word segmentation on the title using the jieba Chinese word segmentation method;
      extracting keywords from the segmented title using the TF-IDF algorithm to obtain a keyword set;
      and feeding the keyword set as input to a preset text generation model, and taking the resulting output as the copy.
      Preferably, the preset text generation model is obtained by the steps of:
      acquiring a plurality of manually written pieces of recommendation copy;
      performing word segmentation on the recommendation copy using the jieba Chinese word segmentation method;
      extracting keywords from the segmented recommendation copy using the TF-IDF algorithm to obtain a keyword set;
      and training with the keyword set as input and the recommendation copy as output to obtain the text generation model.
      Preferably, the jieba Chinese word segmentation method includes the steps of:
      acquiring an input sentence;
      building a word-segmentation DAG (directed acyclic graph) word graph based on the Trie-tree segmentation model;
      computing the global probability route to obtain, based on the prefix dictionary, the segmentation combination with the maximum word frequency;
      determining whether each segment of the maximum-word-frequency combination is a registered word, that is, a word present in the dictionary; if it is a registered word, labeling it according to its dictionary tag and outputting it; if it is not a registered word, using token recognition to process Chinese characters and non-Chinese characters separately;
      if the characters are Chinese, loading the hidden Markov model (HMM) probability graph, obtaining the segmentation and labels by dynamic programming with the Viterbi algorithm, and outputting them;
      if the characters are non-Chinese, recognizing combinations of English letters, digits, and time expressions, assigning the corresponding labels, and outputting them.
      Preferably, before building the word-segmentation DAG word graph based on the Trie-tree segmentation model, the method further comprises:
      loading the dictionary of registered words;
      and building the Trie-tree segmentation model.
      Preferably, between the step of acquiring the input sentence and the step of building the word-segmentation DAG word graph based on the Trie-tree segmentation model, the method further comprises:
      cleaning the sentence and determining whether it contains special characters;
      and if it does, separating out the special characters, labeling them with an unknown part of speech, and outputting them.
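      The HMM step for unregistered Chinese text can be illustrated with a small Viterbi decoder over the four position tags B, M, E, and S (begin, middle, end of a multi-character word, and single-character word). This is a minimal sketch and not the jieba implementation: the function names and the toy probability tables are hypothetical, Latin letters stand in for Chinese characters, and a real model would load trained probabilities.

```python
import math

STATES = "BMES"  # Begin / Middle / End of a word, Single-character word
FLOOR = -1e9     # stand-in log-probability for events missing from the toy tables


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most probable BMES tag sequence for obs (all values are log-probs)."""
    # best[t][s]: best log-probability of any tag path ending in state s at position t
    best = [{s: start_p.get(s, FLOOR) + emit_p[s].get(obs[0], FLOOR) for s in STATES}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in STATES:
            prev, score = max(
                ((p, best[t - 1][p] + trans_p[p].get(s, FLOOR)) for p in STATES),
                key=lambda item: item[1],
            )
            best[t][s] = score + emit_p[s].get(obs[t], FLOOR)
            back[t][s] = prev
    # A sentence can only finish on a word end (E) or a single-character word (S).
    last = max("ES", key=lambda s: best[-1][s])
    tags = [last]
    for t in range(len(obs) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return tags[::-1]


def cut_by_tags(obs, tags):
    """Turn a BMES tag sequence back into words."""
    words, start = [], 0
    for i, tag in enumerate(tags):
        if tag in "ES":
            words.append(obs[start:i + 1])
            start = i + 1
    if start < len(obs):
        words.append(obs[start:])
    return words


log = math.log
# Hypothetical toy parameters, chosen only so the example is easy to follow.
START = {"B": log(0.6), "S": log(0.4)}
TRANS = {
    "B": {"M": log(0.3), "E": log(0.7)},
    "M": {"M": log(0.4), "E": log(0.6)},
    "E": {"B": log(0.5), "S": log(0.5)},
    "S": {"B": log(0.5), "S": log(0.5)},
}
EMIT = {
    "B": {"a": log(0.8), "c": log(0.2)},
    "M": {"a": log(0.25), "b": log(0.5), "c": log(0.25)},
    "E": {"b": log(0.9), "c": log(0.1)},
    "S": {"a": log(0.2), "b": log(0.1), "c": log(0.7)},
}

tags = viterbi("abc", START, TRANS, EMIT)
words = cut_by_tags("abc", tags)
```

      With these toy tables the decoder groups the first two symbols into one word and leaves the third as a single-character word.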
      A second object of the present application, in view of the deficiencies of the prior art, is to provide a deep-learning-based copy generation device that improves the quality of the generated copy. The device includes:
      an acquisition unit, configured to acquire a title input by a user;
      a word segmentation unit, configured to perform word segmentation on the title using the jieba Chinese word segmentation method;
      a keyword set acquisition unit, configured to extract keywords from the segmented title using the TF-IDF algorithm to obtain a keyword set;
      a copy acquisition unit, configured to feed the keyword set as input to a preset text generation model and take the resulting output as the copy;
      and a storage unit, configured to store the jieba Chinese word segmentation method, the TF-IDF algorithm, and the preset text generation model.
      Preferably, the device further comprises:
      a preset text generation model acquisition unit, configured to obtain the preset text generation model.
      Preferably, the preset text generation model acquisition unit includes:
      a recommendation copy acquisition unit, configured to acquire a plurality of manually written pieces of recommendation copy;
      a recommendation copy word segmentation unit, configured to perform word segmentation on the recommendation copy using the jieba Chinese word segmentation method;
      a recommendation copy keyword set acquisition unit, configured to extract keywords from the segmented recommendation copy using the TF-IDF algorithm to obtain a keyword set;
      and a training unit, configured to train with the keyword set as input and the recommendation copy as output to obtain the text generation model.
      A third object of the present application, in view of the deficiencies of the prior art, is to provide an electronic device that improves the quality of the generated copy. The electronic device comprises:
      at least one processor; and
      a memory communicatively coupled to the at least one processor; wherein
      the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the copy generation methods described above.
      A fourth object of the present application, in view of the deficiencies of the prior art, is to provide a non-transitory computer-readable storage medium that improves the quality of the generated copy. The storage medium stores computer instructions for causing a computer to perform any of the copy generation methods described above.
      In the present application, the copy data are processed so that the keywords the copy mainly describes are extracted and used as the model input. The training data constructed this way have a strong correspondence between input and output, which improves the relevance of the model-generated copy to the input and greatly improves the quality of the generated copy.
    Drawings
      In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
      FIG. 1 is a flowchart of the deep-learning-based copy generation method provided by the present invention;
      FIG. 2 is a flowchart of the method for obtaining the preset text generation model provided by the present invention;
      FIG. 3 is a flowchart of the jieba Chinese word segmentation method employed in the present invention;
      FIG. 4 is a schematic structural diagram of the intelligent copy generation device based on deep learning technology provided by the present invention;
      FIG. 5 is a schematic structural diagram of the preset text generation model acquisition unit provided by the present invention;
      FIG. 6 is a schematic structural diagram of the electronic device provided by the present invention.
    Detailed Description
      The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
      The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure in the specification. It is to be understood that the described embodiments are merely illustrative of some, and not restrictive, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details within the description without departing from the spirit of the disclosure. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
      It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
      It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present disclosure, and the drawings only show the components related to the present disclosure rather than the number, shape and size of the components in actual implementation, and the type, amount and ratio of the components in actual implementation may be changed arbitrarily, and the layout of the components may be more complicated.
      In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
      The embodiments of the present disclosure provide a deep-learning-based copy generation method. The method may be executed by a computing device, which may be implemented as software or as a combination of software and hardware, and which may be integrated in a server, a terminal device, or the like.
      Referring to FIG. 1, an embodiment of the present application provides a deep-learning-based copy generation method, including:
      S101: Acquire a title input by a user.
      In this step, the user enters a title in the copy generation system, and the system receives the title through its acquisition unit and holds it for the subsequent steps.
      S102: Perform word segmentation on the title using the jieba Chinese word segmentation method.
      In this step, the system segments the title entered by the user using the jieba Chinese word segmentation method, which yields a plurality of word segments.
      S103: Extract keywords from the segmented title using the TF-IDF algorithm to obtain a keyword set.
      In this step, the system applies the TF-IDF algorithm to the segmented title to extract keywords and obtain a keyword set.
      S104: Feed the keyword set as input to a preset text generation model, and take the resulting output as the copy.
      In this step, the system feeds the keyword set obtained in step S103 as input to the preset text generation model, and the output that the model produces is the required copy.
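      Steps S101 to S104 form a three-stage pipeline, which the following minimal sketch makes explicit. All names are hypothetical, and the three stand-in functions are placeholders only: a real system would plug in jieba segmentation, TF-IDF keyword extraction, and the trained text generation model.

```python
from typing import Callable, List


def generate_copy(
    title: str,
    segment: Callable[[str], List[str]],
    extract_keywords: Callable[[List[str]], List[str]],
    model: Callable[[List[str]], str],
) -> str:
    """Run the title through segmentation, keyword extraction, and generation."""
    tokens = segment(title)              # S102: word segmentation
    keywords = extract_keywords(tokens)  # S103: keyword extraction
    return model(keywords)               # S104: text generation


# Placeholder stages (assumptions for illustration, not the patented components).
def whitespace_segment(text: str) -> List[str]:
    return text.split()


def drop_short_tokens(tokens: List[str], min_len: int = 4) -> List[str]:
    return [tok for tok in tokens if len(tok) >= min_len]


def echo_model(keywords: List[str]) -> str:
    return "Highlights: " + ", ".join(keywords)


copy_text = generate_copy(
    "XXX leather phone 8G", whitespace_segment, drop_short_tokens, echo_model
)
```

      Passing the stages in as callables keeps the pipeline independent of any particular segmenter or model, so each stage can be replaced or tested on its own.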
      As shown in FIG. 2, in the embodiment of the present application, the preset text generation model used in step S104 is obtained by the following steps:
      S201: Acquire a plurality of manually written pieces of recommendation copy.
      In this step, a plurality of manually written pieces of recommendation copy, for example product recommendation copy, are first acquired. In principle, the more recommendation copy is used, the more accurate the resulting text generation model, but the amount of data the system must process grows accordingly. In this embodiment, 1000 pieces of recommendation copy may therefore be selected for processing.
      S202: Perform word segmentation on the recommendation copy using the jieba Chinese word segmentation method.
      In this step, the system segments all of the recommendation copy using the jieba Chinese word segmentation method in preparation for the subsequent processing.
      S203: Extract keywords from the segmented recommendation copy using the TF-IDF algorithm to obtain a keyword set.
      In this step, the system applies the TF-IDF algorithm to all of the segmented recommendation copy to extract keywords and obtain the keyword set.
      S204: Train with the keyword set as input and the recommendation copy as output to obtain the text generation model.
      In this step, the system takes the keyword set as input and all of the recommendation copy as output, and obtains the text generation model after training.
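      Steps S202 to S204 can be sketched as building (keywords, copy) training pairs. The sketch below is an illustration under stated assumptions: the function names are hypothetical, whitespace splitting stands in for jieba, and the weight is tf × log(N / df), which is one common TF-IDF definition (the source does not say which variant is used).

```python
import math
from collections import Counter


def tfidf_keywords(docs, ratio=0.5):
    """For each tokenized document, keep the top `ratio` of distinct tokens by TF-IDF."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    selected = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        }
        k = max(1, int(ratio * len(scores)))
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
        selected.append(ranked[:k])
    return selected


def build_training_pairs(copies, segment=str.split, ratio=0.5):
    """Pair each piece of copy (the target) with its extracted keywords (the input)."""
    keyword_sets = tfidf_keywords([segment(text) for text in copies], ratio)
    return [{"input": kws, "output": text} for kws, text in zip(keyword_sets, copies)]


copies = [
    "color design is good color",
    "battery is good battery lens",
]
pairs = build_training_pairs(copies)
```

      Tokens that occur in every document ("is", "good") receive a weight of zero and drop out, so each input keeps only the words that distinguish its own piece of copy.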
      As shown in FIG. 3, in the embodiment of the present application, the jieba Chinese word segmentation method is prior art and specifically includes the following steps:
      loading the dictionary of registered words;
      building the Trie-tree segmentation model;
      acquiring an input sentence;
      building a word-segmentation DAG (directed acyclic graph) word graph based on the Trie-tree segmentation model;
      computing the global probability route to obtain, based on the prefix dictionary, the segmentation combination with the maximum word frequency;
      determining whether each segment of the maximum-word-frequency combination is a registered word, that is, a word present in the dictionary; if it is a registered word, labeling it according to its dictionary tag and outputting it; if it is not a registered word, using token recognition to process Chinese characters and non-Chinese characters separately;
      if the characters are Chinese, loading the hidden Markov model (HMM) probability graph, obtaining the segmentation and labels by dynamic programming with the Viterbi algorithm, and outputting them;
      if the characters are non-Chinese, recognizing combinations of English letters, digits, and time expressions, assigning the corresponding labels, and outputting them.
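      The DAG and maximum-probability route steps can be illustrated on a toy dictionary. This is a simplified sketch rather than jieba's own code: the function names and the frequency table are hypothetical, and a plain dictionary lookup over substrings replaces the Trie or prefix dictionary for brevity.

```python
import math

# Hypothetical word frequencies; a real dictionary holds hundreds of thousands of entries.
FREQ = {
    "电池": 50, "容量": 40, "电池容量": 30, "大": 100,
    "电": 5, "池": 5, "容": 5, "量": 5,
}


def build_dag(sentence, freq):
    """Map each start index i to every end index j where sentence[i:j+1] is a known word."""
    dag = {}
    for i in range(len(sentence)):
        ends = [j for j in range(i, len(sentence)) if sentence[i:j + 1] in freq]
        dag[i] = ends or [i]  # unknown character: fall back to a single-character edge
    return dag


def best_route(sentence, dag, freq):
    """Right-to-left dynamic programming over the DAG, maximizing summed log-frequency."""
    log_total = math.log(sum(freq.values()))
    n = len(sentence)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    return route


def segment(sentence, freq=FREQ):
    """Follow the best route from the start of the sentence and emit the words on it."""
    route = best_route(sentence, build_dag(sentence, freq), freq)
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


result = segment("电池容量大")
```

      With these toy frequencies the single word "电池容量" scores higher than the split "电池" + "容量", because each additional word multiplies in another probability below one.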
      In this embodiment of the present application, the following steps are further performed between acquiring the input sentence and building the word-segmentation DAG word graph based on the Trie-tree segmentation model:
      cleaning the sentence and determining whether it contains special characters;
      and if it does, separating out the special characters, labeling them with an unknown part of speech, and outputting them.
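      The cleaning step can be sketched with regular expressions that split a sentence into Chinese runs, alphanumeric runs, and everything else. The names and the tag strings "zh", "alnum", and "x" are assumptions made for this illustration, with "x" standing for the unknown part of speech; the character range covers only the basic CJK Unified Ideographs block.

```python
import re

RE_ZH = re.compile(r"[\u4e00-\u9fff]+")   # runs of Chinese characters
RE_ALNUM = re.compile(r"[A-Za-z0-9]+")    # runs of English letters and digits
# The capturing group makes re.split keep the matched runs in its result.
RE_SPLIT = re.compile(r"([\u4e00-\u9fff]+|[A-Za-z0-9]+)")


def split_blocks(sentence):
    """Split a sentence into (text, kind) blocks; special characters get kind 'x'."""
    blocks = []
    for part in RE_SPLIT.split(sentence):
        part = part.strip()
        if not part:
            continue  # skip empty strings and whitespace between runs
        if RE_ZH.fullmatch(part):
            kind = "zh"
        elif RE_ALNUM.fullmatch(part):
            kind = "alnum"
        else:
            kind = "x"
        blocks.append((part, kind))
    return blocks


blocks = split_blocks("8G内存,超大!")
```

      Only the "zh" blocks would then go on to the DAG stage; the "x" blocks are output directly with the unknown label.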
      The specific steps of the deep-learning-based copy generation method provided by the present application are described in detail below with a specific embodiment.
      (1) A plurality of manually written pieces of recommendation copy are obtained in advance. Two samples are as follows:
      Sample 1: White is this year's new color; the color design is spot-on and has a strong artistic feel. The body is made of plastic and feels good in the hand, slim, attractive, and elegant.
      Sample 2: The large high-definition screen is excellent, with a clear, fine picture that gives a comfortable viewing experience. With its impressive running memory, you can play however you like and run more apps without fear of lag. The high-definition lens with optical image stabilization makes photography simpler: close-ups and long shots are both easy to take, with clearer image quality. The large-capacity battery offers ultra-long standby and gives you stronger battery life.
      (2) Word segmentation is performed on the copy using the jieba Chinese word segmentation method. After segmentation, the two samples are as follows:
      Sample 1 after word segmentation: White is this year's new color; the color design is spot-on and has a strong artistic feel. The body is made of plastic and feels good in the hand, slim, attractive, and elegant.
      Sample 2 after word segmentation: The large high-definition screen is excellent, with a clear, fine picture that gives a comfortable viewing experience. With its impressive running memory, you can play however you like and run more apps without fear of lag. The high-definition lens with optical image stabilization makes photography simpler: close-ups and long shots are both easy to take, with clearer image quality. The large-capacity battery offers ultra-long standby and gives you stronger battery life.
      In the original publication, the words that share one underline form a single segment.
      (3) The TF-IDF algorithm is used to extract a certain proportion of keywords from the segmented copy to obtain a keyword set. For the two samples:
      Sample 1 keywords: color design material plastic
      Sample 2 keywords: lens battery capacity
      (4) A training data set is constructed with the keyword set as input and the previously collected copy as output, as follows:
      Sample 1 input: color design material plastic
      Sample 1 output: White is this year's new color; the color design is spot-on and has a strong artistic feel. The body is made of plastic and feels good in the hand, slim, attractive, and elegant.
      Sample 2 input: lens battery capacity
      Sample 2 output: The large high-definition screen is excellent, with a clear, fine picture that gives a comfortable viewing experience. With its impressive running memory, you can play however you like and run more apps without fear of lag. The high-definition lens with optical image stabilization makes photography simpler: close-ups and long shots are both easy to take, with clearer image quality. The large-capacity battery offers ultra-long standby and gives you stronger battery life.
      In the embodiment of the present application, a Transformer-based text generation model is trained on this training data set.
      (5) After a new title input by a user is obtained, keywords are extracted from the title using the keyword set from step (3):
      User input: XXX color black business flagship full-network mobile phone calfskin material 8G memory super large battery capacity
      Extracted keywords: color material battery capacity
      (6) The keyword set extracted from the title is fed as input to the text generation model obtained in step (4), and the model generates the recommendation copy, as follows:
      Orange is this year's new color, and paired with the calfskin material it gives off a luxurious air. A high-capacity battery and low-power processing technology deliver ultra-long standby time.
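      One way to read step (5) is that a title token is kept only when it also appears in the keyword vocabulary collected from the training copy. The sketch below follows that reading. It is an assumption, as are the function name, the English token boundaries, and the treatment of "battery capacity" as a single keyword.

```python
def title_keywords(title_tokens, keyword_vocabulary):
    """Keep title tokens found in the training keyword vocabulary, in order, without repeats."""
    seen, kept = set(), []
    for tok in title_tokens:
        if tok in keyword_vocabulary and tok not in seen:
            seen.add(tok)
            kept.append(tok)
    return kept


# Union of the keyword sets of samples 1 and 2 from step (3).
vocabulary = {"color", "design", "material", "plastic", "lens", "battery capacity"}

# Hypothetical segmentation of the English rendering of the user's title.
title_tokens = [
    "XXX", "color", "black", "business", "flagship", "full-network",
    "mobile phone", "calfskin", "material", "8G", "memory",
    "super large", "battery capacity",
]

keywords = title_keywords(title_tokens, vocabulary)
```

      Under these assumptions the result matches the keywords listed in step (5), and restricting the input to words the model saw during training is what keeps the training and inference inputs aligned.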
      As shown in FIG. 4, an embodiment of the present application further provides a deep-learning-based copy generation device, including:
      an acquisition unit 401, configured to acquire a title input by a user;
      a word segmentation unit 402, configured to perform word segmentation on the title using the jieba Chinese word segmentation method;
      a keyword set acquisition unit 403, configured to extract keywords from the segmented title using the TF-IDF algorithm to obtain a keyword set;
      a copy acquisition unit 404, configured to feed the keyword set as input to a preset text generation model and take the resulting output as the copy;
      a storage unit 405, configured to store the jieba Chinese word segmentation method, the TF-IDF algorithm, and the preset text generation model;
      and a preset text generation model acquisition unit 406, configured to obtain the preset text generation model.
      The apparatus shown in fig. 4 can correspondingly execute the content in the above method embodiment, and details of the part not described in detail in this embodiment refer to the content described in the above method embodiment, which is not described again here.
      As shown in FIG. 5, in the embodiment of the present application, the preset text generation model acquisition unit 406 includes:
      a recommendation copy acquisition unit 501, configured to acquire a plurality of manually written pieces of recommendation copy;
      a recommendation copy word segmentation unit 502, configured to perform word segmentation on the recommendation copy using the jieba Chinese word segmentation method;
      a recommendation copy keyword set acquisition unit 503, configured to extract keywords from the segmented recommendation copy using the TF-IDF algorithm to obtain a keyword set;
      and a training unit 504, configured to train with the keyword set as input and the recommendation copy as output to obtain the text generation model.
      The apparatus shown in fig. 5 may correspondingly execute the content in the above method embodiment, and details of the part not described in detail in this embodiment refer to the content described in the above method embodiment, which is not described again here.
      Referring to FIG. 6, an embodiment of the present disclosure also provides an electronic device 60, including:
      at least one processor; and
      a memory communicatively coupled to the at least one processor; wherein
      the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the deep-learning-based copy generation method of the method embodiments described above.
      The disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the foregoing method embodiments.
      The disclosed embodiments also provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the deep-learning-based copy generation method of the aforementioned method embodiments.
      Referring now to FIG. 6, a schematic diagram of an electronic device 60 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
      As shown in fig. 6, the electronic device 60 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 60 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
      Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 60 to communicate with other devices wirelessly or by wire to exchange data. While the figures illustrate an electronic device 60 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
      In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
      In the present application, the copy data are processed so that the keywords the copy mainly describes are extracted and used as the model input. The training data constructed this way have a strong correspondence between input and output, which improves the relevance of the model-generated copy to the input and greatly improves the quality of the generated copy.
      It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
      The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
      The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
      Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
      Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
      The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
      The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
      It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
      The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present disclosure should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
    Claims (10)
1. A deep learning-based file generation method, characterized by comprising the following steps:
      acquiring a title input by a user;
      performing a word segmentation operation on the title using the jieba Chinese word segmentation method;
      extracting keywords from the segmented title using the TF-IDF algorithm to obtain a keyword set;
      and feeding the keyword set as input to a preset text generation algorithm model, and taking the resulting output as the file.
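The keyword extraction step of claim 1 can be illustrated with a minimal sketch. This is not the applicant's implementation: the function name `extract_keywords`, the `idf` table and its values, and the `default_idf` fallback are all assumptions made for illustration. The title is assumed to have been segmented already (for example by jieba), and the text generation model itself is out of scope here.

```python
from collections import Counter

def extract_keywords(tokens, idf, top_k=3, default_idf=1.0):
    """Return the top_k tokens ranked by TF-IDF.

    tokens: words produced by a word segmenter (assumed already segmented).
    idf: hypothetical mapping word -> inverse document frequency.
    """
    if not tokens:
        return []
    counts = Counter(tokens)
    total = len(tokens)
    # Term frequency within the title multiplied by the (assumed) IDF weight.
    scores = {w: (c / total) * idf.get(w, default_idf) for w, c in counts.items()}
    # Sort by descending score, then alphabetically for a deterministic order.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]

# Toy example: a segmented product title and made-up IDF values.
title_tokens = ["夏季", "新款", "透气", "运动鞋", "男", "运动鞋"]
toy_idf = {"夏季": 2.0, "新款": 0.5, "透气": 3.0, "运动鞋": 2.5, "男": 0.3}
keywords = extract_keywords(title_tokens, toy_idf, top_k=3)
```

In this sketch the resulting keyword list would then be passed to whatever text generation model is in use; in practice the IDF values would come from a reference corpus rather than a hand-written table.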
    2. The file generation method of claim 1, wherein the preset text generation algorithm model is generated by the following steps:
      acquiring a plurality of manually written recommended files;
      performing a word segmentation operation on the recommended files using the jieba Chinese word segmentation method;
      extracting keywords from the segmented recommended files using the TF-IDF algorithm to obtain a keyword set;
      and taking the keyword set as input and the recommended files as output, and training to obtain the text generation algorithm model.
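The construction of training data described in claim 2 can be sketched as pairing each recommended file with the keywords extracted from it. The helper names `build_training_pairs` and `toy_extractor`, the dictionary keys `"input"` and `"output"`, and the rule for skipping files without keywords are assumptions for illustration; the extractor below is only a stand-in for the TF-IDF step, and no actual model training is shown.

```python
def build_training_pairs(segmented_files, extract_keywords):
    """Pair each recommended file with the keyword set extracted from it.

    segmented_files: list of (original_text, tokens) tuples.
    extract_keywords: any callable mapping tokens -> list of keywords
                      (a stand-in for the TF-IDF step).
    """
    pairs = []
    for text, tokens in segmented_files:
        keywords = extract_keywords(tokens)
        if keywords:  # skip files that yield no keywords (an assumed policy)
            pairs.append({"input": keywords, "output": text})
    return pairs

def toy_extractor(tokens):
    """Hypothetical extractor: keep distinct tokens longer than one character."""
    seen = []
    for t in tokens:
        if len(t) > 1 and t not in seen:
            seen.append(t)
    return seen

corpus = [
    ("轻盈透气，跑步更自在", ["轻盈", "透气", "，", "跑步", "更", "自在"]),
    ("。", ["。"]),
]
pairs = build_training_pairs(corpus, toy_extractor)
```

Because the input of each pair is derived from its own output text, every training example has the strong input/output correspondence that the description relies on.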
    3. The file generation method of claim 1 or 2, wherein the jieba Chinese word segmentation method comprises the following steps:
      acquiring an input sentence;
      building a word segmentation DAG (directed acyclic graph) word graph based on the Trie tree word segmentation model;
      calculating the global probability route to obtain, based on the prefix dictionary, the segmentation combination with the maximum word frequency;
      judging whether the words of the maximum-word-frequency segmentation combination are registered (in-dictionary) words; if a word is judged to be a registered word, labeling it according to its dictionary tag and outputting it; if it is not a registered word, using token identification to process Chinese and non-Chinese characters separately;
      if the characters are judged to be Chinese, loading the hidden Markov model (HMM) probability graph, obtaining the word segmentation and labels by Viterbi dynamic programming, and then outputting them;
      if the characters are judged to be non-Chinese, identifying combinations of English letters, numbers and time expressions, assigning the corresponding labels, and outputting them.
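The DAG construction and maximum-probability route of claim 3 can be sketched with a toy frequency dictionary. This is a simplified illustration in the spirit of the steps above, not jieba's actual source code: the function names, the toy frequencies, and the single-character fallback are assumptions, and the HMM/Viterbi handling of unregistered words is deliberately omitted.

```python
import math

def build_dag(sentence, freq):
    """For each start index, list the end indices of dictionary words."""
    dag = {}
    n = len(sentence)
    for start in range(n):
        ends = [end for end in range(start, n)
                if freq.get(sentence[start:end + 1], 0) > 0]
        # Fall back to the single character when nothing matches.
        dag[start] = ends or [start]
    return dag

def best_route(sentence, dag, freq):
    """Right-to-left dynamic programming over summed log probabilities."""
    n = len(sentence)
    log_total = math.log(sum(freq.values()))
    route = {n: (0.0, 0)}
    for start in range(n - 1, -1, -1):
        route[start] = max(
            (math.log(freq.get(sentence[start:end + 1], 0) or 1)
             - log_total + route[end + 1][0], end)
            for end in dag[start]
        )
    return route

def segment(sentence, freq):
    """Follow the best route from left to right and emit the words."""
    dag = build_dag(sentence, freq)
    route = best_route(sentence, dag, freq)
    words, start = [], 0
    while start < len(sentence):
        end = route[start][1]
        words.append(sentence[start:end + 1])
        start = end + 1
    return words

# Made-up word frequencies for demonstration only.
toy_freq = {"研究": 50, "研究生": 20, "生命": 40, "命": 5, "生": 5, "起源": 30, "的": 100}
words = segment("研究生命的起源", toy_freq)
```

With these toy frequencies the route prefers "研究 / 生命" over "研究生 / 命", because the summed log probability of the former path is higher. Words absent from the dictionary fall out as single characters, which is where an HMM-based step would take over.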
    4. The method of claim 3, wherein before building the word segmentation DAG word graph based on the Trie tree word segmentation model, the method further comprises the steps of:
      loading the registered-word dictionary;
      and establishing the Trie tree word segmentation model.
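Loading a dictionary into a Trie, as in claim 4, can be sketched with nested dictionaries. The representation (an empty-string key marking the end of a word), the helper names, and the in-memory word list standing in for a dictionary file are all assumptions for illustration.

```python
def build_trie(words):
    """Build a nested-dict Trie; the key "" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True
    return root

def words_starting_at(trie, sentence, start):
    """Return every dictionary word that begins at sentence[start]."""
    found, node = [], trie
    for end in range(start, len(sentence)):
        node = node.get(sentence[end])
        if node is None:
            break  # no dictionary word continues with this character
        if "" in node:
            found.append(sentence[start:end + 1])
    return found

# Hypothetical in-memory dictionary; a real one would be read from a file.
toy_dictionary = ["研究", "研究生", "生命", "起源"]
trie = build_trie(toy_dictionary)
matches = words_starting_at(trie, "研究生命的起源", 0)
```

A lookup of this kind yields, for each start position, the candidate words that become the edges of the word graph.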
    5. The method of claim 3, wherein between acquiring the input sentence and building the word segmentation DAG word graph based on the Trie tree word segmentation model, the method further comprises the steps of:
      cleaning the sentence and judging whether the sentence contains special characters;
      and if the sentence is judged to contain special characters, separating the special characters, labeling them as unknown part of speech, and then outputting them.
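The cleaning step of claim 5 can be sketched with a regular expression that separates special characters from Chinese and alphanumeric runs. The character ranges, the function name `split_special`, and the tag names `"zh"`, `"alnum"` and `"x"` are assumptions chosen for illustration; what counts as a special character in the claimed method is not specified beyond the wording above.

```python
import re

# Runs of CJK ideographs, runs of ASCII letters/digits, or any other single
# non-space character (treated here as a "special character").
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+|\S")

def split_special(sentence):
    """Split a sentence into (text, tag) blocks.

    Tags are illustrative: "zh" for Chinese runs, "alnum" for letters and
    digits, and "x" for special characters of unknown part of speech.
    """
    blocks = []
    for match in _TOKEN_RE.finditer(sentence):
        text = match.group()
        if re.fullmatch(r"[\u4e00-\u9fff]+", text):
            tag = "zh"
        elif re.fullmatch(r"[A-Za-z0-9]+", text):
            tag = "alnum"
        else:
            tag = "x"
        blocks.append((text, tag))
    return blocks

blocks = split_special("新款★运动鞋 2020!")
```

Only the blocks tagged `"zh"` would go on to the word graph step; the `"x"` blocks can be output directly with their unknown part-of-speech label.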
    6. A deep learning-based file generation apparatus, the apparatus comprising:
      an acquisition unit, configured to acquire a title input by a user;
      a word segmentation unit, configured to perform a word segmentation operation on the title using the jieba Chinese word segmentation method;
      a keyword set obtaining unit, configured to extract keywords from the segmented title using the TF-IDF algorithm to obtain a keyword set;
      a file acquisition unit, configured to feed the keyword set as input to a preset text generation algorithm model and take the resulting output as the file;
      and a storage unit, configured to store the jieba Chinese word segmentation method, the TF-IDF algorithm and the preset text generation algorithm model.
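The unit structure of claim 6 can be pictured as a small object that wires pluggable components together. The class name `FileGenerationApparatus`, its field names, and the stand-in lambdas are hypothetical; none of them is a real segmenter, TF-IDF extractor or text generation model, and the storage unit is represented only implicitly by the object holding its components.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FileGenerationApparatus:
    """Wires the claimed units together as plain callables (all hypothetical)."""
    segment: Callable[[str], List[str]]                  # word segmentation unit
    extract_keywords: Callable[[List[str]], List[str]]   # keyword set obtaining unit
    generate: Callable[[List[str]], str]                 # file acquisition unit (model)

    def run(self, title: str) -> str:
        """Acquire a title, segment it, extract keywords, and generate text."""
        tokens = self.segment(title)
        keywords = self.extract_keywords(tokens)
        return self.generate(keywords)

# Stand-in units for demonstration only.
apparatus = FileGenerationApparatus(
    segment=lambda title: title.split(),
    extract_keywords=lambda tokens: sorted(set(tokens), key=tokens.index)[:2],
    generate=lambda keywords: "Generated text about: " + ", ".join(keywords),
)
result = apparatus.run("breathable running shoes breathable")
```

Keeping each unit behind a plain callable makes it straightforward to swap the stand-ins for a real segmenter, keyword extractor and trained model.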
    7. The file generation apparatus of claim 6, wherein the apparatus further comprises:
      a preset text generation algorithm model acquisition unit, configured to acquire the preset text generation algorithm model.
    8. The file generation apparatus according to claim 7, wherein the preset text generation algorithm model acquisition unit includes:
      a recommended file acquisition unit, configured to acquire a plurality of manually written recommended files;
      a recommended file word segmentation unit, configured to perform a word segmentation operation on the recommended files using the jieba Chinese word segmentation method;
      a recommended file keyword set acquisition unit, configured to extract keywords from the segmented recommended files using the TF-IDF algorithm to obtain a keyword set;
      and a training unit, configured to take the keyword set as input and the recommended files as output and train to obtain the text generation algorithm model.
    9. An electronic device, characterized in that the electronic device comprises:
      at least one processor; and
      a memory communicatively coupled to the at least one processor; wherein,
      the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 5.
    10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the file generation method of any one of claims 1 to 5.
    Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN202010001994.4A CN111178056A (en) | 2020-01-02 | 2020-01-02 | Deep learning based file generation method and device and electronic equipment | 
| PCT/CN2020/111951 WO2021135319A1 (en) | 2020-01-02 | 2020-08-28 | Deep learning based text generation method and apparatus and electronic device | 
| CA3166742A CA3166742A1 (en) | 2020-01-02 | 2020-08-28 | Method of generating text plan based on deep learning, device and electronic equipment | 
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN202010001994.4A CN111178056A (en) | 2020-01-02 | 2020-01-02 | Deep learning based file generation method and device and electronic equipment | 
Publications (1)
| Publication Number | Publication Date | 
|---|---|
| CN111178056A (en) | 2020-05-19 | 
Family
ID=70654435
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN202010001994.4A Pending CN111178056A (en) | 2020-01-02 | 2020-01-02 | Deep learning based file generation method and device and electronic equipment | 
Country Status (3)
| Country | Link | 
|---|---|
| CN (1) | CN111178056A (en) | 
| CA (1) | CA3166742A1 (en) | 
| WO (1) | WO2021135319A1 (en) | 
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN112446214A (en) * | 2020-12-09 | 2021-03-05 | 北京有竹居网络技术有限公司 | Method, device and equipment for generating advertisement keywords and storage medium | 
| WO2021135319A1 (en) * | 2020-01-02 | 2021-07-08 | 苏宁云计算有限公司 | Deep learning based text generation method and apparatus and electronic device | 
| CN113553838A (en) * | 2021-08-03 | 2021-10-26 | 稿定(厦门)科技有限公司 | Method and device for generating commodity copywriting | 
| CN113761174A (en) * | 2020-11-17 | 2021-12-07 | 北京京东尚科信息技术有限公司 | Text generation method and device | 
| CN114881024A (en) * | 2022-04-22 | 2022-08-09 | 中南大学 | Leaderless group discussion system | 
| CN115345669A (en) * | 2022-08-19 | 2022-11-15 | 广州欢聚时代信息科技有限公司 | Method and device for generating file, storage medium and computer equipment | 
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN114048304A (en) * | 2021-10-26 | 2022-02-15 | 盐城金堤科技有限公司 | Effective keyword determination method, device, storage medium and electronic device | 
| CN114742042B (en) * | 2022-03-22 | 2024-12-31 | 杭州未名信科科技有限公司 | A text deduplication method, device, electronic device and storage medium | 
| CN115913782A (en) * | 2022-12-29 | 2023-04-04 | 北京天融信网络安全技术有限公司 | Method, device, electronic equipment and medium for message filtering configuration | 
| CN116151232B (en) * | 2023-04-24 | 2023-08-29 | 北京龙智数科科技服务有限公司 | Method and device for generating model by multi-stage training text title | 
| CN117033618A (en) * | 2023-08-21 | 2023-11-10 | 携程旅游信息技术(上海)有限公司 | Automatic generation method, system, equipment and storage medium for product selling points | 
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN103902545A (en) * | 2012-12-25 | 2014-07-02 | 北京京东尚科信息技术有限公司 | Category path recognition method and system | 
| CN110309114A (en) * | 2018-02-28 | 2019-10-08 | 腾讯科技(深圳)有限公司 | Processing method, device, storage medium and the electronic device of media information | 
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN109992764B (en) * | 2017-12-29 | 2022-12-16 | 阿里巴巴集团控股有限公司 | A copy generation method and device | 
| CN111178056A (en) * | 2020-01-02 | 2020-05-19 | 苏宁云计算有限公司 | Deep learning based file generation method and device and electronic equipment | 
- 2020
  - 2020-01-02: CN application CN202010001994.4A (CN111178056A), status: pending
  - 2020-08-28: WO application PCT/CN2020/111951 (WO2021135319A1), status: ceased
  - 2020-08-28: CA application CA3166742A (CA3166742A1), status: pending
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| WO2021135319A1 (en) * | 2020-01-02 | 2021-07-08 | 苏宁云计算有限公司 | Deep learning based text generation method and apparatus and electronic device | 
| CN113761174A (en) * | 2020-11-17 | 2021-12-07 | 北京京东尚科信息技术有限公司 | Text generation method and device | 
| CN112446214A (en) * | 2020-12-09 | 2021-03-05 | 北京有竹居网络技术有限公司 | Method, device and equipment for generating advertisement keywords and storage medium | 
| CN112446214B (en) * | 2020-12-09 | 2024-02-02 | 北京有竹居网络技术有限公司 | Advertisement keyword generation method, device, equipment and storage medium | 
| CN113553838A (en) * | 2021-08-03 | 2021-10-26 | 稿定(厦门)科技有限公司 | Method and device for generating commodity copywriting | 
| CN114881024A (en) * | 2022-04-22 | 2022-08-09 | 中南大学 | Leaderless group discussion system | 
| CN115345669A (en) * | 2022-08-19 | 2022-11-15 | 广州欢聚时代信息科技有限公司 | Method and device for generating file, storage medium and computer equipment | 
Also Published As
| Publication number | Publication date | 
|---|---|
| WO2021135319A1 (en) | 2021-07-08 | 
| CA3166742A1 (en) | 2021-07-08 | 
Similar Documents
| Publication | Publication Date | Title | 
|---|---|---|
| CN111178056A (en) | | Deep learning based file generation method and device and electronic equipment |
| CN110287278B (en) | | Comment generation method, comment generation device, server and storage medium |
| CN109543058B (en) | | Method, electronic device, and computer-readable medium for detecting image |
| CN110969012B (en) | | Text error correction method and device, storage medium and electronic equipment |
| CN111381909B (en) | | Page display method and device, terminal equipment and storage medium |
| US9613268B2 (en) | | Processing of images during assessment of suitability of books for conversion to audio format |
| WO2022143105A1 (en) | | Method and apparatus for generating text generation model, text generation method and apparatus, and device |
| CN112434510B (en) | | Information processing method, device, electronic equipment and storage medium |
| CN110737774A (en) | | Book knowledge graph construction method, book recommendation method, device, equipment and medium |
| CN110267097A (en) | | Video pushing method, device and electronic equipment based on characteristic of division |
| CN111401044A (en) | | Title generation method and device, terminal equipment and storage medium |
| CN110278447B (en) | | Video pushing method and device based on continuous features and electronic equipment |
| US20250118338A1 (en) | | Methods, devices, readable media and electronic devices for video processing |
| CN110377778A (en) | | Figure sort method, device and electronic equipment based on title figure correlation |
| CN111767740A (en) | | Sound effect adding method and device, storage medium and electronic device |
| EP4528542A1 (en) | | Song list display information generation method and apparatus, electronic device and storage medium |
| CN111815274A (en) | | Information processing method and device and electronic equipment |
| CN110826619A (en) | | Document classification method, device and electronic equipment for electronic file |
| CN112951274A (en) | | Voice similarity determination method and device, and program product |
| CN111259676A (en) | | Translation model training method and device, electronic equipment and storage medium |
| CN111581381B (en) | | Method and device for generating training set of text classification model and electronic equipment |
| CN111859970B (en) | | Method, apparatus, device and medium for processing information |
| CN119088936A (en) | | Prompt word optimization method, device, storage medium and program product |
| CN114049591A (en) | | Method, device, storage medium and electronic device for obtaining video material |
| CN112446214A (en) | | Method, device and equipment for generating advertisement keywords and storage medium |
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200519 |