CN112818274B - Method for converting PDF file into paging HTML file and computer equipment

Info

Publication number: CN112818274B
Application number: CN202110163273.8A
Authority: CN
Inventors: 方昆
Original assignee: Shenzhen Sekorm Component Network Co Ltd
Current assignee: Shenzhen Sekorm Component Network Co Ltd
Priority date: 2021-02-05
Filing date: 2021-02-05
Publication date: 2024-03-19
Anticipated expiration: 2041-02-05
Also published as: CN112818274A

Abstract

The invention relates to a method for converting a PDF file into paginated HTML files, and to computer equipment. The method comprises the following steps: S1, receiving a PDF file and converting it into a single HTML file and a plurality of font files, wherein each font file contains one type of font in the PDF file; S2, parsing the HTML file and separating it into a CSS file, a JavaScript file and a plurality of HTML sub-files, wherein each page of the PDF file corresponds to one HTML sub-file, and combining the plurality of font files into a single text page file; S3, storing the CSS file, the JavaScript file, the plurality of HTML sub-files and the text page file. After the PDF file is converted into paginated HTML files, the browser does not need a PDF plug-in installed, and only one page of content is loaded at a time, so loading is fast and traffic usage is low.

Description

Method and computer equipment for converting PDF files into paginated HTML files

Technical field

The present invention relates to the field of displaying PDF files on web pages, and more specifically to a method and computer equipment for converting a PDF file into paginated HTML files.

Background

Some website materials exist in the form of PDF (Portable Document Format) files, and users access these PDF files with a browser. In the prior art, browsers mainly load PDF files in two ways:

One way is to use a PDF plug-in and download the PDF file directly. This requires the browser to have a PDF plug-in installed and requires the entire PDF file to be downloaded. If the PDF file is large, this consumes too much traffic and takes too long to load.

The other way is for the server to convert the PDF file into an HTML file, which the browser then loads. This still requires the content of the entire PDF file to be loaded. If the PDF file is large, this likewise consumes too much traffic and takes too long to load.

Summary of the invention

The technical problem to be solved by the present invention is to provide, in view of the above-mentioned defects of the prior art, a method and computer equipment for converting a PDF file into paginated HTML files.

The technical solution adopted by the present invention to solve this technical problem is to construct a method for converting a PDF file into paginated HTML files, including:

S1. Receiving a PDF file and converting the PDF file into a single HTML file and multiple font files, each of the font files containing one type of font in the PDF file;

S2. Parsing the HTML file and separating it into a CSS file, a JavaScript file and multiple HTML sub-files, each page of the PDF file corresponding to one HTML sub-file; and merging the multiple font files into a single text page file;

S3. Storing the CSS file, the JavaScript file, the multiple HTML sub-files and the text page file.

Further, in the method of converting a PDF file into paginated HTML files according to the present invention, step S3 includes:

Naming the CSS file, the JavaScript file, the multiple HTML sub-files and the text page file uniformly according to the file number corresponding to the PDF file, with the name of each HTML sub-file also containing the corresponding page number information; and storing the named CSS file, JavaScript file, HTML sub-files and text page file in the same folder, the folder being named with the file number.

Further, in the method of converting a PDF file into paginated HTML files according to the present invention, the following steps are also included after step S3:

S4. The server receives a PDF file access request, looks up the CSS file, the JavaScript file, the text page file and one of the HTML sub-files corresponding to the PDF file access request, and sends the files found to the browser;

S5. The browser loads the CSS file, the JavaScript file, the text page file and the HTML sub-file, and the HTML sub-file displays the content of one page of the PDF file.

Further, in the method of converting a PDF file into paginated HTML files according to the present invention, the HTML sub-file is the HTML sub-file corresponding to the content of the first page of the PDF file.

Further, in the method of converting a PDF file into paginated HTML files according to the present invention, the following steps are also included after step S5:

S6. The server receives a page continuation access instruction, looks up the HTML sub-file corresponding to the page continuation access instruction, and sends it to the browser;

S7. The browser receives the HTML sub-file and displays the corresponding page of the PDF file.

Further, in the method of converting a PDF file into paginated HTML files according to the present invention, when displaying the text in a page, the browser converts the text format in the text page file into the web page text format for display.

Further, in the method of converting a PDF file into paginated HTML files according to the present invention, if garbled characters appear when the browser displays the text in a page, the text is re-rendered according to the text format in the text page file.

Further, in the method of converting a PDF file into paginated HTML files according to the present invention, when displaying the text in a page, the browser loads the content of the text page file that corresponds to the page number.

Further, in the method of converting a PDF file into paginated HTML files according to the present invention, step S6 also includes, before looking up the HTML sub-file corresponding to the page continuation access instruction:

Determining whether the current user has permission to continue reading;

If so, looking up the HTML sub-file corresponding to the page continuation access instruction.

In addition, the present invention also provides a computer device, including a memory and a processor;

The memory is used to store a computer program;

The processor is configured to execute the computer program stored in the memory to implement the above-mentioned method of converting a PDF file into paginated HTML files.

Implementing the method and computer equipment of the present invention for converting a PDF file into paginated HTML files has the following beneficial effects: after the PDF file is converted into paginated HTML files, the browser does not need a PDF plug-in installed, and only one page of content is loaded at a time, so loading is fast and traffic usage is low.

Description of the drawings

The present invention is further described below with reference to the accompanying drawings and embodiments. In the drawings:

Figure 1 is a flow chart of a method for converting a PDF file into paginated HTML files according to an embodiment;

Figure 2 is a flow chart of a method for converting a PDF file into paginated HTML files according to an embodiment;

Figure 3 is a flow chart of a method for converting a PDF file into paginated HTML files according to an embodiment;

Figure 4 is a flow chart of a method for converting a PDF file into paginated HTML files according to an embodiment.

Detailed description

To give a clearer understanding of the technical features, purposes and effects of the present invention, specific embodiments of the present invention are now described in detail with reference to the accompanying drawings.

In a preferred embodiment, referring to Figure 1, the method of converting a PDF file into paginated HTML files in this embodiment includes the following steps:

S1. Receive a PDF file and convert it into a single HTML file and multiple font files, each font file containing one type of font in the PDF file. Specifically, the server receives the PDF file and converts it with a preset command into a single HTML file and multiple font files; for the preset command used in this conversion, reference may be made to the prior art. Because a PDF file generally contains several fonts, each font file produced by the conversion contains one type of font in the PDF file; that is, each font used in the PDF file is collected into its own font file.

S2. Parse the HTML file and separate it into a CSS file, a JavaScript file and multiple HTML sub-files, each page of the PDF file corresponding to one HTML sub-file; merge the multiple font files into a single text page file. Specifically, the HTML file generated from the PDF file contains the CSS content and JavaScript content needed to load the web page. After the HTML file is parsed, the CSS content and JavaScript content are separated from it; the CSS content is written to a standalone CSS file and the JavaScript content to a standalone JavaScript file. The HTML file generated from the PDF file also contains the content of every page of the PDF file. The HTML file is split page by page according to the page number information it contains, yielding multiple HTML sub-files, with each page of the PDF file corresponding to one HTML sub-file.
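The splitting part of step S2 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the patent does not say how page number information is encoded in the HTML file, so the sketch assumes (hypothetically) that each page is wrapped in a `<div data-page="N">` element, and the names `PageSplitter` and `split_html` are invented for illustration. Merging the font files into the text page file is not shown, because the source does not describe its format.

```python
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Collects top-level <style>/<script> text and per-page <div data-page="N"> markup.

    The data-page attribute is a hypothetical page marker, not taken from the patent.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities such as &amp; intact
        self.css, self.js = [], []   # one string per <style> / <script> block
        self.pages = {}              # page number -> list of markup fragments
        self._sink = None            # list currently receiving style/script text
        self._page = None            # page number being collected, if any
        self._depth = 0              # <div> nesting depth inside the current page

    def handle_starttag(self, tag, attrs):
        if self._page is None:
            if tag in ("style", "script"):
                self._sink = self.css if tag == "style" else self.js
                self._sink.append("")
                return
            page_attr = dict(attrs).get("data-page")
            if tag == "div" and page_attr is not None:
                self._page = int(page_attr)
                self._depth = 0
                self.pages[self._page] = []
        if self._page is not None:
            if tag == "div":
                self._depth += 1
            self.pages[self._page].append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (e.g. <br/>) are copied verbatim inside a page.
        if self._page is not None:
            self.pages[self._page].append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._page is not None:
            self.pages[self._page].append(f"</{tag}>")
            if tag == "div":
                self._depth -= 1
                if self._depth == 0:     # the page's own <div> has been closed
                    self._page = None
        elif tag in ("style", "script"):
            self._sink = None

    def handle_data(self, data):
        if self._page is not None:
            self.pages[self._page].append(data)
        elif self._sink is not None:
            self._sink[-1] += data

    def handle_entityref(self, name):
        self.handle_data(f"&{name};")

    def handle_charref(self, name):
        self.handle_data(f"&#{name};")


def split_html(html_text):
    """Return (css, js, pages), where pages maps page number to sub-file markup."""
    parser = PageSplitter()
    parser.feed(html_text)
    parser.close()
    pages = {n: "".join(parts) for n, parts in parser.pages.items()}
    return "\n".join(parser.css), "\n".join(parser.js), pages
```

Each entry of `pages` would then be written out as one HTML sub-file, while the `css` and `js` strings become the single shared CSS and JavaScript files.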

S3. Store the CSS file, the JavaScript file, the multiple HTML sub-files and the text page file. Specifically, the CSS file, the JavaScript file, the HTML sub-files and the text page file are named uniformly according to the file number corresponding to the PDF file, and the name of each HTML sub-file also contains the corresponding page number information. The named files are stored in the same folder, and the folder is named with the file number. This establishes a correspondence between the file number, the folder, and the CSS file, JavaScript file, HTML sub-files and text page file, so that when files are requested later the corresponding files can be found quickly by file number, which improves response speed.
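The naming and storage scheme of step S3 might look like the sketch below. The concrete file name pattern (`<file_no>.css`, `<file_no>_p<page>.html`, the `.text` extension for the text page file) and the function name `store_outputs` are assumptions made for illustration; the patent only requires that names derive from the file number, that sub-file names carry the page number, and that the folder is named with the file number.

```python
from pathlib import Path


def store_outputs(root, file_no, css, js, pages, text_page_bytes):
    """Write all outputs for one PDF into a folder named after its file number.

    pages maps page number -> HTML sub-file markup.
    The file name pattern used here is hypothetical.
    """
    folder = Path(root) / file_no
    folder.mkdir(parents=True, exist_ok=True)

    # Shared assets: one of each per PDF file, named by the file number.
    (folder / f"{file_no}.css").write_text(css, encoding="utf-8")
    (folder / f"{file_no}.js").write_text(js, encoding="utf-8")
    (folder / f"{file_no}.text").write_bytes(text_page_bytes)

    # One HTML sub-file per page, with the page number in the name.
    for page_no, markup in pages.items():
        (folder / f"{file_no}_p{page_no}.html").write_text(markup, encoding="utf-8")

    return folder
```

Because every path can be derived from the file number and page number alone, a later request needs no directory scan or database query to locate its files.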

When the server receives multiple PDF files, it converts and stores them one by one according to steps S1 to S3 above so that browsers can access them. In this embodiment, after the PDF file is converted into paginated HTML files, the browser does not need a PDF plug-in installed, and only one page of content is loaded at a time, so loading is fast and traffic usage is low.

In a preferred embodiment, referring to Figure 2, the method of converting a PDF file into paginated HTML files in this embodiment also includes the following steps after step S3:

S4. The server receives a PDF file access request, looks up the CSS file, the JavaScript file, the text page file and one of the HTML sub-files corresponding to the PDF file access request, and sends the files found to the browser. Specifically, the browser installed on the user terminal sends the PDF file access request to the server, and the server performs this lookup after receiving the request. Here, "one of the HTML sub-files" means that the PDF file access request contains page number information, which may refer to any page (for example page 1, page 3 or page 5), and that this page number information corresponds to one HTML sub-file. Optionally, if the PDF file access request does not contain page number information, the HTML sub-file defaults to the one corresponding to the first page of the PDF file; that is, the HTML sub-file for the first page is sent to the browser.
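The lookup in step S4, including the first-page default, can be illustrated with a small sketch. The function name `files_for_access_request` and the path pattern are hypothetical; the sketch only assumes that all files for one PDF live in a folder named after its file number and that the page number appears in the sub-file name.

```python
def files_for_access_request(file_no, page_no=None):
    """Return the relative paths the server would send for an initial access request.

    If the request carries no page number, the first page is used by default.
    The path pattern is an assumption made for illustration.
    """
    page = 1 if page_no is None else int(page_no)
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [
        f"{file_no}/{file_no}.css",             # shared stylesheet
        f"{file_no}/{file_no}.js",              # shared script
        f"{file_no}/{file_no}.text",            # merged text page file
        f"{file_no}/{file_no}_p{page}.html",    # the single requested page
    ]
```

For example, `files_for_access_request("D1001")` resolves to the three shared files plus `D1001/D1001_p1.html`, while passing `page_no=3` swaps only the last entry.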

S5. The browser loads the CSS file, the JavaScript file, the text page file and the HTML sub-file, and the HTML sub-file displays the content of one page of the PDF file. Specifically, after receiving these files the browser loads them, and the HTML sub-file displays one page of the PDF file. Because each PDF file corresponds to only one CSS file and one JavaScript file, these two files are loaded on the initial load to establish the basic parameters and environment of the web page; when other pages of the same PDF file are displayed later, the CSS file and JavaScript file no longer need to be loaded. In addition, the text page file is merged from the multiple font files of the PDF, so it contains information for all of the content of the PDF file. Note that this embodiment does not load and display the content of all pages at once; it loads only the content in the text page file that corresponds to the current HTML sub-file. For example, if the HTML sub-file corresponds to page 1 of the PDF file, only the content for page 1 is loaded and displayed from the text page file. The content for the other pages has been downloaded to the browser but is not loaded or displayed, so the user cannot see it.

In this embodiment the browser can request a particular page of the PDF file, and the server only needs to deliver the files corresponding to that one page. Loading is fast and traffic usage is low, which improves the user experience and reduces the data processing load on the server.

In a preferred embodiment, referring to Figure 3, the method of converting a PDF file into paginated HTML files in this embodiment also includes the following steps after step S5:

S6. The server receives a page continuation access instruction, looks up the HTML sub-file corresponding to the instruction, and sends it to the browser. Specifically, after viewing the current page, the user can generate a page continuation access instruction through a virtual button, mouse, keyboard, touch screen or similar means, and the browser sends the instruction to the server. After receiving the instruction, the server looks up the corresponding HTML sub-file and sends it to the browser. When handling a page continuation access instruction, the server no longer needs to look up the CSS file, the JavaScript file or the text page file; it only needs to look up the HTML sub-file corresponding to the instruction, which allows fast loading. Optionally, the page continuation access instruction may be a previous-page instruction, a next-page instruction, and so on.

S7. The browser receives the HTML sub-file and displays the corresponding page of the PDF file. Specifically, because the CSS file and JavaScript file were already loaded on the first access and the relevant environment parameters are already in place, the browser only needs to load the HTML sub-file in order to display it. Correspondingly, the content in the text page file that corresponds to this HTML sub-file is looked up and displayed.
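The difference between the first request (S4/S5) and later page requests (S6/S7) can be modelled as below. This is a toy model under stated assumptions, not the browser or server code from the patent: `ViewerSession`, `assets_needed` and the file name pattern are all hypothetical, and Python is used only to illustrate the bookkeeping.

```python
class ViewerSession:
    """Tracks, per document, whether the shared assets have already been delivered."""

    SHARED_EXTENSIONS = ("css", "js", "text")   # hypothetical file extensions

    def __init__(self):
        self._initialised = set()   # file numbers whose shared assets are loaded

    def assets_needed(self, file_no, page_no):
        """Return the file names that must be transferred to show the given page."""
        needed = []
        if file_no not in self._initialised:
            # First access to this document: CSS, JavaScript and text page file.
            needed.extend(f"{file_no}.{ext}" for ext in self.SHARED_EXTENSIONS)
            self._initialised.add(file_no)
        # Every request, first or later, transfers exactly one HTML sub-file.
        needed.append(f"{file_no}_p{page_no}.html")
        return needed
```

In this model the first call for a document returns four files and every later call for the same document returns a single HTML sub-file, which is the source of the traffic saving described above.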

Steps S6 and S7 are repeated to access the PDF file page by page until all pages have been accessed. In this embodiment, while pages of the PDF file are loaded in succession, only one HTML sub-file needs to be loaded each time, so loading is fast, traffic usage is low, and the user experience is improved.

In some embodiments of the method, the browser is configured with a web page text format. When displaying the text in a page, the browser converts the text format in the text page file into the web page text format, so that the text of the PDF file is displayed according to the requirements of the web page and looks neater.

In some embodiments of the method, because a PDF file may contain a lot of content in many font formats, some font formats may not be directly displayable in the browser. If garbled characters appear when the browser displays the text in a page, some fonts cannot be displayed normally, and the text needs to be re-rendered according to the text format in the text page file. That is, the garbled text is rendered directly in its original format from the PDF file, which resolves the garbling.

In a preferred embodiment, referring to Figure 4, in the method of converting a PDF file into paginated HTML files in this embodiment, step S6 includes:

S61. The server receives the page continuation access instruction and determines whether the current user has permission to continue reading; if so, it looks up the HTML sub-file corresponding to the instruction and sends it to the browser. Specifically, to make it easier to manage different users' permissions to access PDF files, this embodiment requires users to log in to the server with a user account, and the server stores the permissions of each user account. If the current user does not have permission, the server sends a prompt message to the browser informing the user that they are not permitted to continue. By displaying the PDF file page by page, this embodiment enables permission management and makes it easier to control access to parts of a PDF.
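The permission gate in step S61 can be sketched as follows. The shape of the permission store (a dict mapping each account to the set of file numbers it may continue reading), the response dictionaries, the extra `page_count` bounds check and the name `handle_continue_request` are all assumptions for illustration; the patent only states that the server stores per-account permissions and either returns the sub-file or a prompt message.

```python
def handle_continue_request(account, file_no, page_no, permissions, page_count):
    """Decide how the server answers a page continuation access instruction.

    permissions: dict mapping account name -> set of file numbers the account
    may continue reading (a hypothetical layout, not taken from the patent).
    """
    allowed_files = permissions.get(account, set())
    if file_no not in allowed_files:
        # No permission: send a prompt message instead of the sub-file.
        return {"ok": False,
                "message": "You do not have permission to continue reading."}
    if not 1 <= page_no <= page_count:
        # Bounds check added for the sketch; not described in the source.
        return {"ok": False, "message": "The requested page does not exist."}
    # Permission granted: only the HTML sub-file is looked up and returned.
    return {"ok": True, "file": f"{file_no}/{file_no}_p{page_no}.html"}
```

Because the check runs on every continuation request, access can be cut off at any page boundary without re-splitting or re-converting the PDF.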

In a preferred embodiment, the computer device of this embodiment includes a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program stored in the memory to implement the above-mentioned method of converting a PDF file into paginated HTML files. After the computer device of this embodiment converts the PDF file into paginated HTML files, the browser does not need a PDF plug-in installed, and only one page of content is loaded at a time, so loading is fast and traffic usage is low.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be cross-referenced. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.

Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of both. The software module may reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

The above embodiments only illustrate the technical concepts and features of the present invention. Their purpose is to enable those familiar with this technology to understand and implement the present invention, and they do not limit its scope of protection. All equivalent changes and modifications made within the scope of the claims of the present invention fall within the scope covered by the claims.

Claims

1. A method for converting PDF files into paginated HTML files, which is characterized by including:

S1. Receive a PDF file, and convert the PDF file into a single HTML file and multiple font files, each of the font files containing a type of font in the PDF file;

S2. Parse the HTML file and separate it into a CSS file, a JavaScript file and multiple HTML sub-files, each page of the PDF file corresponding to one HTML sub-file; merge the multiple font files into a single text page file, wherein the name of each HTML sub-file contains corresponding page number information;

S3. Store the CSS file, the JavaScript file, a plurality of the HTML sub-files and the text page file;

S4. The server receives a PDF file access request, searches for the CSS file, the JavaScript file, the text page file and one of the HTML sub-files corresponding to the PDF file access request, and sends the files found to a browser, wherein the PDF file access request includes page number information;

S5. The browser loads the CSS file, the JavaScript file, the text page file and the HTML sub-file, and the HTML sub-file displays the content of one page of the PDF file; the CSS file and the JavaScript file are loaded on the initial load to establish the basic parameters and environment of the web page, so that when other pages of the PDF file are subsequently displayed, the CSS file and the JavaScript file do not need to be loaded again, and the server no longer needs to send the CSS file, the JavaScript file and the text page file to the browser;

S6. The server receives a page continuation access instruction, searches for the HTML sub-file corresponding to the page continuation access instruction, and sends it to the browser; in step S6, before searching for the HTML sub-file corresponding to the page continuation access instruction, the method further includes: determining whether the current access user has permission to continue reading; if so, searching for the HTML sub-file corresponding to the page continuation access instruction;

S7. The browser receives and displays the content of one page of the PDF file corresponding to the HTML sub-file; when displaying the text in the page, the browser loads the content corresponding to the page number in the text page file;

When the browser displays the text in the page, it converts the text format in the text page file into the web page text format for display; if garbled characters appear when the browser displays the text in the page, the text is re-rendered according to the text format in the text page file.

2. The method of converting a PDF file into a paged HTML file according to claim 1, characterized in that the step S3 includes:

The CSS file, the JavaScript file, the plurality of HTML sub-files and the text page file are named uniformly according to the file number corresponding to the PDF file, and the name of each HTML sub-file includes corresponding page number information; the named CSS file, JavaScript file, HTML sub-files and text page file are stored in the same folder, and the folder is named with the file number.

3. The method of converting a PDF file into a paged HTML file according to claim 1, wherein the HTML sub-file is the HTML sub-file corresponding to the content of the first page of the PDF file.

4. A computer device, characterized by comprising a memory and a processor;

The memory is used to store computer programs;

The processor is configured to execute the computer program stored in the memory to implement the method of converting a PDF file into a paged HTML file as described in any one of claims 1 to 3.