CN119783650A

CN119783650A - A method and system for generating multimodal reports based on large language model and FreeMarker

Info

Publication number: CN119783650A
Application number: CN202411722491.0A
Authority: CN
Inventors: 李志华; 田元浩; 范成城
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2024-11-28
Filing date: 2024-11-28
Publication date: 2025-04-08

Abstract

The present invention discloses a method and system for generating multimodal reports based on a large language model and FreeMarker, belonging to the technical field of natural language processing, and the implementation of the method includes the following steps: step 1, professional data preprocessing; step 2, splicing the preprocessed data, prompting and using the large language model to perform data analysis and text generation; step 3, parsing the output results of the model to obtain corresponding analysis conclusions, data statistics, reference documents, and constructing various forms of content with the help of object storage; step 4, designing a report template in FreeMarker; step 5, filling the content generated by the large language model into the FreeMarker template. The present invention can solve the problem that the traditional large language model generates a single report form, and the presentation form and information hierarchy are poor, thereby improving the professional field recognition of the generated report and meeting the needs of various industries for high-quality and diversified reports.

Description

Method and system for generating multi-modal report based on large language model and FREEMARKER

Technical Field

The invention relates to the technical field of natural language processing, in particular to a method and a system for generating a multi-mode report based on a large language model and FREEMARKER.

Background

Today, large language models become a new technology place in the field of artificial intelligence and a tap technology in the field of digital economy, and play an important role in promoting the generation transformation for promoting the development of artificial intelligence and inducing the development of the next round of artificial intelligence.

Large language models (Large Language Model, LLM for short) are a class of deep learning-based artificial intelligence models that contain hundreds of billions (or more) of parameters and are trained with large amounts of text data. Not only natural language text, but also meaning of understanding language text, can be generated. And can also execute various natural language processing tasks such as translation, emotion analysis and the like. The method is widely applied to various application scenes such as text writing, question answering, dialogue and the like, and shows strong practicability and potential from automatic customer service to advanced research.

The same large language model also has some drawbacks and limitations in generating professional field reports:

1. the report outline is not fixed, and the randomness is strong, so that the large language model can obtain different content structures after analyzing and processing the same professional data due to lack of uniform structures and standards, thereby greatly compromising the standardization and the professionality of the report.

2. The report content display form and the information level are poor, namely, the output content of the large language model is often text analysis, and various content forms such as pictures, videos, tables, hyperlinks and the like are difficult to enrich, so that the report is difficult to clearly convey key information and data, and the overall quality and the readability of the report are affected.

FREEMARKER is a template engine technique, a generic tool that is used to generate output text containing various forms of content, such as pictures, videos, forms, hyperlinks, etc., based on templates and data to be changed. FREEMARKER, which is a powerful text generation tool, provides templated design functions, but has shortcomings in content intelligent analysis.

Therefore, there is a strong need for a new method that can combine the advantages of large language models and FREEMARKER to generate intelligent reports containing multi-modal content.

Disclosure of Invention

The technical task of the invention is to provide a method and a system for generating a multi-mode report based on a large language model and FREEMARKER aiming at the defects, which can solve the problems of single report generation form, poor display form and information level of the traditional large language model, thereby improving the professional field acceptance of the generated report and meeting the requirements of various industries on high-quality and diversified reports.

The technical scheme adopted for solving the technical problems is as follows:

A method of generating a multimodal report based on a large language model and FREEMARKER, the implementation of the method comprising the steps of:

Step 1, professional data preprocessing, which comprises collecting, cleaning and formatting data to be analyzed;

step 2, splicing the preprocessed data, and performing data analysis and text generation by using a large language model;

Step 3, analyzing the output result of the model to obtain corresponding analysis conclusion, data statistics, reference document and the like, and constructing various expression forms of the content including tables, pictures, videos, hyperlinks and the like by means of object storage;

Step 4, designing a report template in FREEMARKER, wherein the report template comprises a preset dynamic catalog, a table placeholder, a picture and video embedded area, a hyperlink position and the like;

And 5, filling the content generated by the large language model into FREEMARKER templates, formatting and typesetting, and outputting and displaying the report content.

The method utilizes a large language model to carry out data analysis and text generation, and simultaneously realizes the multi-mode content organization and display of the report by virtue of the FREEMARKER template function.

Further, the method for preprocessing professional data comprises the following steps:

Data preprocessing, namely cleaning, mid-culture and the like of original professional data to reduce understanding deviation of a large language model and convert a data set into a form which can be processed by a computer;

Keyword recognition, namely recognizing the key values/keywords in the data collection by utilizing a Natural Language Processing (NLP) technology, wherein the key values/keywords comprise related names, categories, degrees, areas and the like, and recognizing meanings corresponding to the data to form a data dictionary;

And converting the data set subjected to formatting into a computer-processable form, such as JSON and other formats, constructing a project, and inputting the project into a LLM model for analysis and processing.

Further, the FREEMARKER template design allows the user to customize the layout and style of the report to meet the requirements of different industries and application scenarios.

Further, the FREEMARKER template design further includes:

designing a template layout, including defining the size, margin, font style, size and the like of a page, and setting the title, subtitle and footer of a report;

Setting a dynamic catalog including links of chapter titles and subtitle so that a user can quickly jump to a corresponding report part by clicking a catalog item;

Configuration tables, pictures, videos, hyperlinks, etc. for multimodal content placeholders.

Further, the method for presetting the module dynamic catalogue comprises the following steps:

The dynamic catalog algorithm can automatically identify chapters and sub-chapters in the report, and a structured catalog is generated, so that a user can conveniently and quickly navigate to different parts of the report;

WIN32 implements updating the directory field for the generated report, ensuring that the directory page number is consistent with the dynamic report content page number.

Furthermore, the setting of the dynamic catalog, in FREEMARKER templates, uses specific FREEMARKER grammar to load the dynamic catalog by Servlet context loading or Spring Boot integration, and automatically generates the catalog according to the content in the report, wherein the catalog comprises links of chapter titles and sub-titles, so that a user can quickly jump to a corresponding report part by clicking catalog items;

Configuring table placeholders, namely reserving positions of tables in a report template, dynamically generating the tables by using a FREEMARKER list and a circulating instruction, and automatically filling table data according to a data analysis result generated by a large language model;

The method comprises the steps of embedding pictures and videos, reserving placeholders of the pictures and videos in a template, storing link addresses of related pictures and videos output by a model into an object, acquiring remote URL (uniform resource locator) of the object, and embedding the remote URL into a report through URL processing instructions of FREEMARKER;

Adding hyperlinks-in the text of the report, hyperlinks are added using the FREEMARKER link instructions.

Further, the text content generated by the large language model is filled according to the appointed position of FREEMARKER templates, the FREEMARKER template engine replaces the analysis template file with the actual content by placeholders, and a preset formatting rule is applied to generate a final report document;

the method for outputting and displaying the report content is as follows:

The object storage such as Minio is utilized to store pictures, videos, reference documents and the like in the report, so that the report has high expandability and reliability, and the remote quick access and efficient storage of the report are ensured.

The invention also claims a system for generating a multimodal report based on a large language model and FREEMARKER, comprising:

The professional data preprocessing module is used for collecting, cleaning and formatting data to be analyzed;

The large language model processing module is used for carrying out data analysis and text generation on the preprocessed data by splicing the template by using a large language model;

The content analysis module analyzes the output result of the model to obtain corresponding analysis conclusion, data statistics, reference document and the like, and constructs various expression forms of the content including tables, pictures, videos, hyperlinks and the like by means of object storage;

The FREEMARKER template custom module designs a report template in FREEMARKER, wherein the report template comprises a preset dynamic catalog, a table placeholder, a picture and video embedded area, a hyperlink position and the like;

The output and display module is used for filling the content generated by the large language model into a FREEMARKER template, formatting and typesetting the content, and outputting and displaying the report content;

the system realizes the generation of the multi-modal report by the method for generating the multi-modal report based on the large language model and FREEMARKER.

The invention also claims an apparatus for generating a multimodal report based on a large language model and FREEMARKER, comprising at least one memory and at least one processor;

the at least one memory for storing a machine readable program;

The at least one processor is configured to invoke the machine-readable program to implement the method described above.

The invention also claims a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the above-described method.

Compared with the prior art, the method and the system for generating the multi-mode report based on the large language model and FREEMARKER have the following beneficial effects:

The invention provides an innovative report generation method, which combines an advanced large language model and a flexible FREEMARKER template technology to realize the multi-modeling of report contents. Compared with the prior method for generating the report by means of the large language model, the method enables the generated report to not only contain text information, but also embed various modal contents such as dynamic catalogues, tables, pictures, videos and hyperlinks, and greatly enriches the display form and information level of the report. Compared with manual writing, the working time is greatly shortened. The invention realizes the intelligent generation and the customized display of the report content through the dynamic analysis capability of the large language model and the fixed template structure of FREEMARKER, and is suitable for various industries and application scenes.

Drawings

FIG. 1 is a flow chart of a method for generating a multimodal report based on a large language model and FREEMARKER provided by an embodiment of the present invention;

FIG. 2 is a flow diagram of a build FREEMARKER stencil provided by an embodiment of the present invention.

Detailed Description

The invention will be further described with reference to the drawings and the specific examples.

The embodiment of the invention provides a method for generating a multi-modal report based on a large language model and FREEMARKER, which comprises the following steps:

Analyzing the output result of the model to obtain corresponding analysis conclusion, data statistics, reference document and the like, and constructing various expression forms of the content including tables, pictures, videos, hyperlinks and the like by means of object storage;

The report template is designed in FREEMARKER, and FREEMARKER template design allows users to customize the layout and style of the report to meet the requirements of different industries and application scenarios. The FREEMARKER template design further includes:

The report content is output and displayed, and object storage such as Minio is utilized to store pictures, videos, reference documents and the like in the report, so that the report has high expandability and reliability, and the remote quick access and high-efficiency storage of the report are ensured.

The method utilizes a large language model to carry out data analysis and text generation, and simultaneously realizes the multi-mode content organization and display of the report by virtue of the FREEMARKER template function. The method is described in further detail below with reference to fig. 1-2.

As shown in FIG. 1, a report generation flow chart for generating a multimodal report based on a large language model and FREEMARKER is shown:

s1, collecting professional data to be analyzed, formatting the data, and ensuring that the data is suitable for the input requirement of a large language model.

S2, constructing a prompt project, and carrying out data analysis and text generation on the preprocessed data by using a large language model.

And S3, analyzing the output result of the model to obtain corresponding analysis conclusion, data statistics, reference document and the like, and constructing various expression forms of the content including tables, pictures, videos, hyperlinks and the like by means of object storage.

S4, a user self-defines a outline template of the report, FREEMARKER a template engine analyzes and generates the report template, presets a dynamic catalog, a table placeholder, a picture and video embedded area, a hyperlink position and the like, and ensures reasonable logic structure and layout of the content.

And S5, filling the content generated by the large language model according to the structure of the FREEMARKER template, automatically generating report content containing multiple modes, and displaying the finally generated multi-mode report.

The method for formatting professional data comprises the following steps:

And data preprocessing, namely cleaning, mid-culture and the like of the original professional data to reduce understanding deviation of a large language model, and converting the data set into a computer-processable form.

Keyword recognition, namely recognizing the key values/keywords in the data collection by utilizing a Natural Language Processing (NLP) technology, wherein the key values/keywords are related to names, categories, degrees, areas and the like, and recognizing the meanings corresponding to the data to form a data dictionary.

Prompt is a technique based on Artificial Intelligence (AI) instructions by explicitly and specifically directing the output of a language model. In Prompt word engineering, the definition of Prompt encompasses three main elements of task, instruction and role to ensure that the model generates text that meets the needs of the user. For example, the road disease professional data is taken as an example, and the following Prompt is spliced, wherein the corresponding relation of the data dictionary is that the name of the data field is the Chinese meaning of the data field, the following content is needed to be analyzed according to the following data, the following content is described by a section of natural language (task) +data set+what all disease types are in the whole road section in the data, what the total sum of all disease areas is, and what the typical structural disease type is (instruction) "is used for standardizing the output of a large language model, so that the text with correlation, accuracy and high quality is generated.

The large language model described above may be a model of public cloud deployment. Such as the caretaker, GPT series, or a professional model of the local deployment training.

As a detailed embodiment of constructing FREEMARKER templates, the steps are as shown in fig. 2:

s41, designing FREEMARKER template layout, namely designing the layout and structure of the report by using the FREEMARKER template engine according to the requirements of the report. This includes defining the page size, margins, font style and size, etc., and setting the headings, subheadings, and footers of the report.

S42, setting the dynamic catalogue, namely loading the dynamic catalogue in FREEMARKER templates by using specific FREEMARKER grammar such as Servlet context loading or Spring Boot integration and the like. The catalog will be automatically generated based on the content in the report, including links to chapter titles and sub-titles, enabling the user to quickly jump to the corresponding report section by clicking on the catalog entry.

S43, configuring a table placeholder, namely reserving the position of a table in a report template, and dynamically generating the table by using the list of FREEMARKER and a circulating instruction. And automatically filling table data, such as statistical information of disease types, areas, frequencies and the like, according to data analysis results generated by the large language model.

S44, embedding the pictures and the videos, namely reserving placeholders of the pictures and the videos in the template. And storing the link addresses of the related pictures and videos output by the model into an object for storage, acquiring remote URL thereof, and embedding the remote URL into a report through URL processing instructions of FREEMARKER. The media content may be live photographs of disease, video recordings, or other related visual material.

S45, adding hyperlinks, namely adding hyperlinks in the text of the report by using FREEMARKER link instructions. These hyperlinks may provide the user with more information and context for some of the generated content reference documents provided by the large model, or for externally directed reference documents, related studies, or further resources.

As an optimized implementation mode, after report contents are generated, page numbers of the dynamic catalogue are automatically updated by means of the WIN32, and algorithm codes are as follows:

doc=word.Documents.Open(file_path)

doc.Fields.Update()

And filling the text content generated by the large language model according to the appointed position of the FREEMARKER template. And FREEMARKER, the template engine analyzes the template file, replaces placeholders with actual contents, and applies preset formatting rules to generate a final report document.

The method realizes the following steps:

and the multi-mode fusion is realized by combining the text generation capacity of the large language model with the template design function of FREEMARKER for the first time.

Dynamic content generation, namely dynamically generating report content by utilizing the data analysis capability of a large language model, thereby greatly reducing manual intervention, reducing subjectivity and saving labor cost.

The embodiment of the invention also provides a system for generating the multi-modal report based on the large language model and FREEMARKER, which realizes the generation of the multi-modal report by the method for generating the multi-modal report based on the large language model and FREEMARKER.

The system comprises:

And the output and display module is used for filling the content generated by the large language model into the FREEMARKER template, formatting and typesetting the content, and outputting and displaying the report content.

The professional data preprocessing module is used for preprocessing the professional data, and the method for preprocessing the professional data comprises the following steps:

The FREEMARKER template customization module allows a user to customize the layout and style of the report so as to meet the requirements of different industries and application scenes. FREEMARKER the template design further includes:

The setting of the dynamic catalogue, in FREEMARKER templates, uses specific FREEMARKER grammar to load Servlet context or Spring Boot integration, loads the dynamic catalogue, automatically generates the catalogue according to the content in the report, and comprises links of chapter titles and sub-titles, so that a user can quickly jump to a corresponding report part by clicking a catalogue item;

The FREEMARKER template engine replaces the analysis template file with the actual content and applies the preset formatting rule to generate the final report document;

The output and display module utilizes object storage such as Minio to store pictures, videos, reference documents and the like in the report, has high expandability and reliability, and ensures remote quick access and efficient storage of the report.

The embodiment of the invention also provides a device for generating the multi-modal report based on the large language model and FREEMARKER, which comprises at least one memory and at least one processor;

the at least one memory for storing a machine readable program;

the at least one processor is configured to invoke the machine-readable program to implement the method for generating a multimodal report based on the large language model and FREEMARKER as described in the above embodiments.

Embodiments of the present invention also provide a computer readable medium having stored thereon computer instructions that, when executed by a processor, cause the processor to perform the method of generating a multimodal report based on a large language model and FREEMARKER as described in the above embodiments. Specifically, a system or apparatus provided with a storage medium on which a software program code realizing the functions of any of the above embodiments is stored, and a computer (or CPU or MPU) of the system or apparatus may be caused to read out and execute the program code stored in the storage medium.

In this case, the program code itself read from the storage medium may realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code form part of the present invention.

Examples of storage media for providing program code include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs, DVD+RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer by a communication network.

Further, it should be apparent that the functions of any of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform part or all of the actual operations based on the instructions of the program code.

Further, it is understood that the program code read out by the storage medium is written into a memory provided in an expansion board inserted into a computer or into a memory provided in an expansion unit connected to the computer, and then a CPU or the like mounted on the expansion board or the expansion unit is caused to perform part and all of actual operations based on instructions of the program code, thereby realizing the functions of any of the above embodiments.

While the invention has been illustrated and described in detail in the drawings and in the preferred embodiments, the invention is not limited to the disclosed embodiments, and it will be appreciated by those skilled in the art that the code audits of the various embodiments described above may be combined to produce further embodiments of the invention, which are also within the scope of the invention.

Claims

1. A method for generating a multimodal report based on a large language model and FreeMarker, characterized in that the implementation of the method includes the following steps:

Step 1: Professional data preprocessing, including collecting, cleaning, and formatting the data to be analyzed;

Step 2: The preprocessed data is concatenated with prompt and the large language model is used for data analysis and text generation;

Step 3: Analyze the output of the model to obtain the corresponding analysis conclusions, data statistics, and reference documents, and use object storage to construct various forms of content, including tables, pictures, videos, and hyperlinks;

Step 4, design the report template in FreeMarker, including preset dynamic directory, table placeholder, image and video embedding area and hyperlink location;

Step 5: Fill the content generated by the large language model into the FreeMarker template, format it, and output and display the report content.

2. According to the method for generating a multimodal report based on a large language model and FreeMarker according to claim 1, it is characterized in that the method for preprocessing professional data includes:

Data preprocessing: Clean and localize the original professional data to reduce the understanding bias of the large language model and convert the data collection into a form that can be processed by computers;

Keyword identification: Use natural language processing technology to identify key values/keywords in the data collection, including name, category, degree, area correlation, identify the corresponding meaning of the data, and form a data dictionary;

Data conversion: Convert the formatted data collection into a form that can be processed by a computer, construct a prompt project and input it into the LLM model for analysis and processing.

3. According to a method for generating multimodal reports based on a large language model and FreeMarker according to claim 1, it is characterized in that the FreeMarker template design allows users to customize the layout and style of the report to meet the needs of different industries and application scenarios.

4. According to a method for generating a multimodal report based on a large language model and FreeMarker according to claim 1 or 3, it is characterized in that the FreeMarker template design further comprises:

Design template layouts, including defining page size, margins, font style and size, and setting report titles, subtitles, and footers;

Set up a dynamic table of contents, including links to chapter titles and subtitles, so that users can quickly jump to the corresponding report section by clicking on a table of contents item;

Configure table, image, video, hyperlink multimodal content placeholder.

5. According to the method for generating multimodal reports based on a large language model and FreeMarker according to claim 4, it is characterized in that the method of presetting the module dynamic directory is as follows:

Dynamic table of contents algorithm can automatically identify chapters and sub-chapter in the report and generate a structured table of contents, which makes it easy for users to quickly navigate to different parts of the report;

WIN32 implements updating the directory field of the generated report to ensure that the directory page number is consistent with the dynamic report content page number.

6. A method for generating a multimodal report based on a large language model and FreeMarker according to claim 5, characterized in that the dynamic directory is set: in the FreeMarker template, a specific FreeMarker syntax including Servlet context loading or Spring Boot integration is used to load the dynamic directory; the directory will be automatically generated according to the content in the report, including links to section titles and subtitles, so that users can quickly jump to the corresponding report section by clicking on the directory item;

Configure table placeholders: reserve a place for a table in the report template, use FreeMarker's list and loop instructions to dynamically generate a table; automatically fill in table data based on the data analysis results generated by the large language model;

Embed images and videos: reserve placeholders for images and videos in the template; store the link addresses of the relevant images and videos output by the model in the object storage, obtain their remote URLs, and embed them into the report through FreeMarker's URL processing instructions;

Add hyperlinks: In the report text, use FreeMarker's link directive to add hyperlinks.

7. According to claim 1, a method for generating a multimodal report based on a large language model and FreeMarker is characterized in that the text content generated by the large language model is filled in according to the specified position of the FreeMarker template; the FreeMarker template engine parses the template file, replaces the placeholder with the actual content, and applies the preset formatting rules to generate the final report document;

The method to output and display the report content is as follows:

Object storage is used to store images, videos, and reference documents in reports, which is highly scalable and reliable, ensuring remote and fast access and efficient storage of reports.

8. A system for generating multimodal reports based on a large language model and FreeMarker, characterized by comprising:

Professional data preprocessing module, used to collect, clean and format the data to be analyzed;

The large language model processing module combines the preprocessed data with prompts to perform data analysis and text generation using the large language model.

Content parsing module: The output of the parsing model obtains the corresponding analysis conclusions, data statistics, and reference documents. With the help of object storage, it constructs various forms of content, including tables, pictures, videos, and hyperlinks.

FreeMarker template customization module, design report templates in FreeMarker, including preset dynamic directories, table placeholders, image and video embedding areas, and hyperlink locations;

The output and display module fills the content generated by the large language model into the FreeMarker template, formats it, and outputs and displays the report content;

The system realizes multimodal report generation through the method of generating multimodal reports based on a large language model and FreeMarker as described in any one of claims 1 to 7.

9. A device for generating multimodal reports based on a large language model and FreeMarker, characterized by comprising: at least one memory and at least one processor;

The at least one memory is used to store a machine-readable program;

The at least one processor is used to call the machine-readable program to implement the method described in any one of claims 1 to 7.

10. A computer-readable medium, characterized in that computer instructions are stored on the computer-readable medium, and when the computer instructions are executed by a processor, the processor executes any one of the methods of claims 1 to 7.