CN104361111B - A method for automatic compilation and research of archives - Google Patents

Info

Publication number
CN104361111B
Authority
CN
China
Prior art keywords
expert
archives
volume
classification
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410714594.2A
Other languages
Chinese (zh)
Other versions
CN104361111A (en
Inventor
蒋静
王卓平
门霞
赵毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University
Original Assignee
Qingdao University
Application filed by Qingdao University
Priority to CN201410714594.2A
Publication of CN104361111A
Application granted
Publication of CN104361111B
Legal status: Expired - Fee Related (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3322: Query formulation using system suggestions
    • G06F16/3323: Query formulation using system suggestions using document space presentation or visualization, e.g. category, hierarchy or range presentation and selection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of archive classification and retrieval and relates to a method for automatic compilation and research of archives based on an archive management information system with a B/S (browser/server) architecture. Archive information is first entered in the unified formats provided by the case-file catalog, in-file catalog, and expert registration card interfaces. The archive entry and management module then classifies and summarizes the information automatically with an automatic hierarchical classification algorithm and stores it in the corresponding databases. Next, the archive compilation and research module searches and queries those databases according to the compilation conditions entered by the user and the stored information, and aggregates the hits into a compilation result. Finally, the result is displayed on screen, or exported as a Word document or Excel report and printed as a paper document for storage. The design principle is sound and reliable; the method requires little labor, works efficiently, omits little information, safeguards the quality and value of the compilation, and provides a user-friendly compilation environment.

Description

A Method for Automatic Compilation and Research of Archives

Technical field:

The invention belongs to the technical field of archive classification and retrieval. It relates to a method for automatic compilation and research of archives based on an archive management information system with a B/S (browser/server) architecture, and provides a software-implemented technique for automatically producing archive compilations and compilations of archive abstracts.

Background:

An archive management information system based on a B/S architecture extends the functions of a traditional archive into the information society. It retains the basic attributes and functions of a traditional archive while meeting the needs of the information age: alongside digital management of conventional archives, it uses the Internet and a digital archive storage database to collect, store, manage, and use archive information of all kinds from all departments, and so provides information services for the use of archive resources. Archive compilation and research oriented toward archive use is a specialized task in which an archive or records office, responding to actual demand and drawing on its holdings, produces compiled archive reference materials. In essence, the work studies and processes the content of archival documents and compiles them into volumes that can be understood at a glance, thereby improving the overall management and efficiency of administrative departments and organizations and increasing the value of archive resources to society. At present, archive compilation and research is done mainly by hand, which is slow, inefficient, and yields poor-quality results.

Traditional manual compilation methods fall into two types according to how deeply the archives are processed. The first type excerpts, condenses, and edits the original archives to form summary materials. Its outputs include collections of issued documents, thematic compilations, and thematic compilations of archive abstracts, for example compilations of abstracts on experts, scholars, and academic papers in a given field, or of scientific and technological achievements. The second type produces new material by analyzing, studying, and summarizing the content of the original archives. Its outputs include yearbooks, organizational histories, historical records, and comprehensive technical and economic survey reports. Because the outputs of the second type contain new understanding, viewpoints, conclusions, and recommendations, which constitute newly added information, this work is generally done by experts or scholars in the relevant field. The outputs of the first type, by contrast, contain only information already present in the archives: they add no information and create no new content. They aim to be complete, concise, and accurate, with no omissions; the content must be comprehensive and detailed, and it is better to include too much than to leave something out. As time passes, the original archive material to be compiled grows to a massive volume, and with manual compilation of such volumes the slightest carelessness leads to omissions or errors, so the quality and value of the results cannot be guaranteed. Manual compilation is labor-intensive, inefficient, inaccurate, and costly in manpower, which restricts the full development and use of archive resources at massive data scale and at a higher technical level.

Summary of the invention:

The purpose of the invention is to overcome the shortcomings of the prior art by providing a method for automatic compilation and research of archives based on an archive management information system. The method uses automatic computer classification and retrieval to generate compilation outputs automatically, improving the efficiency and accuracy of compilation and reducing the information omissions of manual work.

To achieve this purpose, the invention implements automatic compilation in a B/S-architecture archive management information system through the joint operation of an archive entry and management module and an archive compilation and research module. The specific steps are:

(1) Enter the archive information first: following the unified formats given by the case-file catalog, in-file catalog, and expert registration card interfaces displayed by the system, enter the archive title, archive category, archive number, year, and the basic information about experts;

(2) The archive entry and management module then uses the automatic hierarchical classification algorithm proposed by the invention to classify and summarize the information entered in step (1) automatically, and stores it in the corresponding archive catalog, in-file catalog, and expert basic information registration catalog databases and in the expert database;

(3) The archive compilation and research module then searches and queries those databases according to the compilation conditions entered by the user and the stored information, and aggregates the hits to generate the compilation result;

(4) The compilation result is displayed on screen, or exported as a Word document or Excel report and printed to form a paper document for storage, completing the automatic compilation and research of the archives.

The automatic hierarchical classification algorithm proposed by the invention improves on the conventional naive Bayes algorithm. Naive Bayes classifies a text by considering all of its features and assigns the sample to the category with the highest predicted probability.

The naive Bayes classification model used in the invention is as follows. Given an archive text $X$ of unknown category and $m$ categories $C_1, C_2, \ldots, C_m$, naive Bayes classification assigns $X$ to the category with the highest posterior probability $P(C_i \mid X)$, which by Bayes' theorem is:

$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$

In the formula for $P(C_i \mid X)$, $P(X)$ is the same for every category, so it suffices to maximize the numerator $P(X \mid C_i)P(C_i)$. $P(C_i)$ is the class prior estimated from the training set:

$$P(C_i) = \frac{1 + |C_i|}{m + |D|}$$

where $|C_i|$ is the number of training texts in category $C_i$, $m$ is the number of categories, and $|D|$ is the total number of texts in the training set. To simplify the computation of $P(X \mid C_i)$, the attributes of a text are assumed to be mutually independent, so computing $P(X \mid C_i)$ reduces to estimating the probability that each feature occurs in category $C_i$. Two models with Laplace estimation are used to compute $P(X \mid C_i)$:

(1) The multivariate model records only whether a feature occurs in the text: 1 if it occurs, 0 otherwise. The formula is:
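The equation is not reproduced in this text. A standard multivariate (Bernoulli) form that uses only the symbols $|V|$, $B_{xt}$, and $w_t$ named in the surrounding passage, offered as a reconstruction under that assumption rather than as the patent's verbatim equation, is:

```latex
\[
P(X \mid C_i) \;=\; \prod_{t=1}^{|V|}
  \Bigl[\, B_{xt}\, P(w_t \mid C_i) \;+\; (1 - B_{xt})\bigl(1 - P(w_t \mid C_i)\bigr) \Bigr]
\]
```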

where $|V|$ is the total number of features; $B_{xt}$ indicates whether $w_t$ occurs in text $X$ (1 if it does, 0 otherwise); and $w_t$ is the $t$-th feature, i.e. the $t$-th component of the feature vector. $P(w_t \mid C_i)$ in this formula is computed as follows:
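The text gives no definition for the terms of this estimate, so the following is only the usual Laplace-smoothed Bernoulli estimate, stated under two assumptions that are not in the source: $B_{jt}$ is 1 when $w_t$ occurs in training text $d_j$ and 0 otherwise, and $P(C_i \mid d_j)$ is 1 when $d_j$ is labeled $C_i$ and 0 otherwise.

```latex
\[
P(w_t \mid C_i) \;=\;
\frac{1 + \sum_{j=1}^{|D|} B_{jt}\, P(C_i \mid d_j)}
     {2 + \sum_{j=1}^{|D|} P(C_i \mid d_j)}
\]
```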

(2) The multinomial model counts the number of times each feature occurs in the text. The formula is:
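The equation is not reproduced in this text. A common multinomial form consistent with the count $N_{xt}$ named in the surrounding passage is sketched below; it assumes that factors which do not depend on the category (document length and factorial terms) are dropped, which is why it is written as a proportionality.

```latex
\[
P(X \mid C_i) \;\propto\; \prod_{t=1}^{|V|} P(w_t \mid C_i)^{\,N_{xt}}
\]
```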

where $N_{xt}$ is the number of times feature $t$ occurs in text $X$. $P(w_t \mid C_i)$ is computed as follows:
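A Laplace-smoothed estimate that uses exactly the quantities $N_{jt}$, $N_{js}$, $|D|$, and $|V|$ named in the surrounding passage is given below as a reconstruction. It assumes, beyond what the text states, that $P(C_i \mid d_j)$ is 1 when training text $d_j$ is labeled $C_i$ and 0 otherwise.

```latex
\[
P(w_t \mid C_i) \;=\;
\frac{1 + \sum_{j=1}^{|D|} N_{jt}\, P(C_i \mid d_j)}
     {|V| + \sum_{s=1}^{|V|} \sum_{j=1}^{|D|} N_{js}\, P(C_i \mid d_j)}
\]
```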

In the formula for $P(w_t \mid C_i)$, $N_{jt}$ is the number of times feature $t$ occurs in text $d_j$, $|D|$ is the total number of training texts, $|V|$ is the total number of features, and $N_{js}$ is the number of times feature $s$ occurs in text $d_j$. In essence, this method gathers statistics on all feature values of a text and maps them to a probability for each existing category.
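The multinomial model can be illustrated with a minimal sketch. It assumes hard training labels, the prior $(1 + |C_i|)/(m + |D|)$ described earlier, and Laplace smoothing with $|V|$ in the denominator; the function names and the toy keyword lists are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs):
    """docs: list of (tokens, label). Returns (priors, cond, vocab)."""
    vocab = {t for tokens, _ in docs for t in tokens}
    labels = sorted({label for _, label in docs})
    class_sizes = Counter(label for _, label in docs)
    counts = defaultdict(Counter)  # counts[label][token]: occurrences in that class
    for tokens, label in docs:
        counts[label].update(tokens)
    m, n_docs, v = len(labels), len(docs), len(vocab)
    # Prior with Laplace estimate: (1 + |C_i|) / (m + |D|)
    priors = {c: (1 + class_sizes[c]) / (m + n_docs) for c in labels}
    # Conditional: (1 + count of token in class) / (|V| + total tokens in class)
    cond = {}
    for c in labels:
        total = sum(counts[c].values())
        cond[c] = {t: (1 + counts[c][t]) / (v + total) for t in vocab}
    return priors, cond, vocab


def classify(tokens, priors, cond, vocab):
    """Return the label with the highest log posterior (up to a constant)."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for t in tokens:          # each occurrence adds log P(w_t | C_i)
            if t in vocab:        # features unseen in training are ignored
                score += math.log(cond[c][t])
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set of title keywords.
train = [
    (["budget", "ledger", "audit"], "accounting"),
    (["ledger", "invoice"], "accounting"),
    (["appointment", "transfer", "staff"], "personnel"),
    (["staff", "evaluation"], "personnel"),
]
priors, cond, vocab = train_multinomial_nb(train)
predicted = classify(["audit", "invoice"], priors, cond, vocab)
print(predicted)
```

Working in log space avoids underflow when a title contributes many features; the ranking of categories is unchanged.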

In the archive entry and management module, the invention improves the naive Bayes algorithm to obtain an automatic hierarchical classification algorithm based on coarse classification of case-file catalog titles and keywords. The keyword set is extracted directly from the titles in the case-file catalog and the in-file catalog, and a hierarchical classification model is built on it. After suitable dimensionality reduction, classification is achieved with a low feature dimension; this replaces the Chinese word segmentation used by traditional text classification algorithms and improves both classification accuracy and running efficiency for archival documents. The algorithm proceeds as follows:

(1) Enter the archive information locally or online: following the unified formats given by the case-file catalog, in-file catalog, and expert registration card interfaces displayed by the system, enter the archive title, category, archive number, year, and the basic information about experts;

(2) The system automatically extracts the text feature parameter set, consisting of the archive title and the keywords in the archive text, and saves it in the corresponding database;

(3) If the extracted feature parameter set exceeds a threshold, reduce its dimensionality, since too many features tend to cause the curse of dimensionality and lower classification efficiency;

(4) Run naive Bayes classification on the extracted feature parameters or keywords to perform the coarse classification;

(5) On the coarse classification result of step (4), extract features separately for each subclass;

(6) Run naive Bayes classification on the feature parameters of each subclass to complete the fine classification automatically;

(7) Output the classification result and save it in the corresponding database.
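Steps (3) to (6) can be sketched as a two-level classifier: one naive Bayes model chooses the coarse category, and a second model trained only on that category's texts chooses the subclass. Everything named here is an assumption for illustration: the `TinyNB` class, the frequency-based feature cut-off used for dimensionality reduction (the source does not say which reduction method is used), and the toy labels.

```python
import math
from collections import Counter


class TinyNB:
    """Multinomial naive Bayes with Laplace smoothing over keyword lists."""

    def __init__(self, max_features=None):
        self.max_features = max_features

    def fit(self, docs):  # docs: list of (keywords, label)
        freq = Counter(t for kws, _ in docs for t in kws)
        # Step (3): above the threshold, keep only the most frequent features.
        if self.max_features is not None and len(freq) > self.max_features:
            self.vocab = {t for t, _ in freq.most_common(self.max_features)}
        else:
            self.vocab = set(freq)
        self.labels = sorted({lab for _, lab in docs})
        sizes = Counter(lab for _, lab in docs)
        self.prior = {c: (1 + sizes[c]) / (len(self.labels) + len(docs))
                      for c in self.labels}
        self.counts = {c: Counter() for c in self.labels}
        for kws, lab in docs:
            self.counts[lab].update(t for t in kws if t in self.vocab)
        return self

    def predict(self, kws):
        def score(c):
            total = sum(self.counts[c].values())
            s = math.log(self.prior[c])
            for t in kws:
                if t in self.vocab:
                    s += math.log((1 + self.counts[c][t]) / (len(self.vocab) + total))
            return s
        return max(self.labels, key=score)


def fit_hierarchy(docs, max_features=None):
    """docs: list of (keywords, coarse_label, fine_label)."""
    coarse = TinyNB(max_features).fit([(k, c) for k, c, _ in docs])  # step (4)
    fine = {}
    for c in coarse.labels:
        subset = [(k, f) for k, cc, f in docs if cc == c]
        fine[c] = TinyNB(max_features).fit(subset)  # steps (5) and (6)
    return coarse, fine


def classify(kws, coarse, fine):
    c = coarse.predict(kws)
    return c, fine[c].predict(kws)


# Hypothetical toy catalog titles, already reduced to keywords.
docs = [
    (["payroll", "ledger"], "accounting", "ledgers"),
    (["ledger", "annual"], "accounting", "ledgers"),
    (["invoice", "voucher"], "accounting", "vouchers"),
    (["voucher", "payment"], "accounting", "vouchers"),
    (["staff", "appointment"], "personnel", "appointments"),
    (["staff", "evaluation"], "personnel", "evaluations"),
]
coarse, fine = fit_hierarchy(docs)
result = classify(["voucher", "invoice"], coarse, fine)
print(result)
```

Training each fine model only on its own coarse category keeps the vocabulary of every model small, which is the point of the coarse-then-fine design described in the steps.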

The archive compilation and research module performs basic compilation on the case-file catalog, in-file catalog, and expert registration card catalog maintained by the archive information entry and management module. Six databases are created in the entry and management module: the case-file catalog database, the in-file catalog database, the archive classification database, the expert basic information registration database, the expert paper details database, and the expert project details database. The compilation and research module consists of three submodules: archive classification compilation, document number index compilation, and expert information compilation. A base compilation database associated with the six databases above is created in the compilation and research module. The archive classification compilation submodule, according to the conditions entered by the user, automatically compiles by archive category, archive title, and filing time, and displays the results as a list. In the document number index compilation submodule, the user enters a combination of conditions (official document number, year, archive number, and retention period) according to the compilation need and clicks Query; the data is filtered by those conditions and the document number index list is generated and displayed automatically. The expert information compilation submodule compiles statistics from the expert registration card information in archive management. It runs fuzzy queries on the entered expert name, research direction, and achievements to compile by expert category, research direction, paper information, and project information, and displays the aggregated results on screen. Compilation by expert category automatically generates a list of all experts in a given research field; the results can be exported to an Excel or Word document for saving or printing.

The archive classification compilation submodule compiles archives in the following steps:

(1) Create the classification compilation view. The view takes the in-file catalog or the case-file catalog as its main table and joins the classification information table to obtain the category name. Because in-file catalog and case-file catalog records are stored in different tables, classification compilation must combine the two so they can be queried and retrieved together;

(2) Data access layer. The data access layer retrieves the archive records to be compiled from the view of step (1). Its function takes the query conditions as parameters, retrieves the matching records, extracts the archive category, archive title, filing time, number of copies, number of pages, and archive number, and sorts the results by category name, title, and filing time;

(3) Application layer. Set the classification compilation conditions first: fuzzy search on category name, fuzzy search on archive title, and search by filing time;

(4) Export the compilation results to an Excel or Word document so that users can save, view, print, and bind them.
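Steps (1) to (3) can be sketched with an in-memory SQLite database. The table names, column names, and sample rows are hypothetical (the patent does not give a schema), and SQLite stands in for whatever database server the system actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE case_file (title TEXT, category_id INTEGER, filed_on TEXT,
                        copies INTEGER, pages INTEGER, archive_no TEXT);
CREATE TABLE in_file   (title TEXT, category_id INTEGER, filed_on TEXT,
                        copies INTEGER, pages INTEGER, archive_no TEXT);
-- Step (1): one view over both catalogs, joined to the category table.
CREATE VIEW classification_view AS
SELECT c.name AS category, u.title, u.filed_on, u.copies, u.pages, u.archive_no
FROM (SELECT * FROM case_file UNION ALL SELECT * FROM in_file) AS u
JOIN category AS c ON c.id = u.category_id;
""")
conn.executemany("INSERT INTO category VALUES (?, ?)",
                 [(1, "Accounting"), (2, "Personnel")])
conn.execute("INSERT INTO case_file VALUES (?, ?, ?, ?, ?, ?)",
             ("2013 budget files", 1, "2013-12-31", 1, 120, "KJ-2013-001"))
conn.executemany("INSERT INTO in_file VALUES (?, ?, ?, ?, ?, ?)", [
    ("Annual budget report", 1, "2013-06-30", 2, 15, "KJ-2013-001-03"),
    ("Staff appointment notice", 2, "2013-03-01", 1, 2, "RS-2013-002-01"),
])


def compile_by_classification(conn, category=None, title=None,
                              filed_from=None, filed_to=None):
    """Step (2): fetch matching records sorted by category, title, filing time."""
    where, params = [], []
    if category:                       # fuzzy search on category name
        where.append("category LIKE ?")
        params.append(f"%{category}%")
    if title:                          # fuzzy search on archive title
        where.append("title LIKE ?")
        params.append(f"%{title}%")
    if filed_from:                     # ISO dates compare correctly as text
        where.append("filed_on >= ?")
        params.append(filed_from)
    if filed_to:
        where.append("filed_on <= ?")
        params.append(filed_to)
    sql = ("SELECT category, title, filed_on, copies, pages, archive_no "
           "FROM classification_view")
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " ORDER BY category, title, filed_on"
    return conn.execute(sql, params).fetchall()


# Step (3): the application layer passes whichever conditions the user filled in.
rows = compile_by_classification(conn, title="budget")
for row in rows:
    print(row)
```

User input reaches the query only through `?` placeholders, so the fuzzy conditions cannot alter the SQL text itself.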

The document number index compilation submodule compiles archives in the following steps:

(1) Create the document number index view. The view takes the in-file catalog as its main table, joins the classification information table, and extracts the official document number, year, document serial number, archive number, page number, number of pages, and retention period;

(2) Data access layer. The data access layer extracts records from the document number index view and sorts them by official document number, document serial number, and year. Its function takes the query conditions as a parameter; the conditions are built dynamically by the application layer;

(3) Application layer. The system provides the compilation conditions (official document number, year, archive number, and retention period), which the user enters in any combination as needed; after the user clicks Query, the data is filtered by those conditions;

(4) Export the compilation results to an Excel or Word document so that users can save, view, print, and bind them.
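The dynamic construction of the query conditions in steps (2) and (3) might look like the sketch below. The column names, the `doc_number_index` table (standing in for the view), and the sample rows are all hypothetical, and exact matching is an assumption since the source does not say how each condition is compared.

```python
import sqlite3

# Conditions the user may combine; hypothetical column names.
ALLOWED = ("doc_number", "year", "archive_no", "retention")


def build_query(conditions):
    """Build SQL and parameters from whichever conditions were filled in."""
    clauses, params = [], []
    for column in ALLOWED:            # only whitelisted columns reach the SQL text
        value = conditions.get(column)
        if value is None or value == "":
            continue
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = ("SELECT doc_number, year, serial_no, archive_no, page_no, pages, retention "
           "FROM doc_number_index")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY doc_number, serial_no, year"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE doc_number_index (
    doc_number TEXT, year INTEGER, serial_no INTEGER, archive_no TEXT,
    page_no INTEGER, pages INTEGER, retention TEXT)""")
conn.executemany("INSERT INTO doc_number_index VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("QU[2013]5", 2013, 2, "WS-2013-001", 7, 3, "permanent"),
    ("QU[2013]2", 2013, 1, "WS-2013-001", 1, 6, "permanent"),
    ("QU[2012]9", 2012, 4, "WS-2012-003", 12, 2, "30 years"),
])

sql, params = build_query({"year": 2013, "retention": "permanent"})
rows = conn.execute(sql, params).fetchall()
for row in rows:
    print(row)
```

Empty conditions are skipped, so any combination of the four fields produces a valid query, including no conditions at all.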

The expert information compilation submodule compiles archives in the following steps:

(1) Create the expert information compilation view. The view takes the expert basic information registration table as its main table, joins the expert paper details table to extract paper information, and joins the expert project details table to extract project and award information. Because basic expert information and expert achievements are stored in different tables, compilation must combine them so they can be queried together;

(2) Data access layer. The data access layer obtains the expert records to be compiled from the view of step (1). Its function takes the query conditions as parameters, queries the matching expert records, and extracts the expert name, expert category, research direction, paper information, and project information;

(3) Application layer. Set the compilation conditions: fuzzy search on expert name, research direction, paper title and paper summary, project name, and project summary, and search by paper publication date, project start and end dates, and project awards. According to the conditions entered, expert information compilation produces compilations by expert category, research direction, paper information, and project information;

(4) Export the compilation results to an Excel or Word document so that users can save, view, print, and bind them.
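A minimal sketch of the expert view and one fuzzy query follows, again with SQLite and a hypothetical schema and placeholder data. Joining both detail tables repeats each paper once per project, so the sketch folds the rows back into one record per expert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expert  (id INTEGER PRIMARY KEY, name TEXT, category TEXT, direction TEXT);
CREATE TABLE paper   (expert_id INTEGER, title TEXT, published TEXT);
CREATE TABLE project (expert_id INTEGER, name TEXT, award TEXT);
-- Step (1): the registration table is the main table; details are joined in.
CREATE VIEW expert_view AS
SELECT e.id, e.name, e.category, e.direction,
       p.title AS paper_title, j.name AS project_name, j.award
FROM expert AS e
LEFT JOIN paper   AS p ON p.expert_id = e.id
LEFT JOIN project AS j ON j.expert_id = e.id;
INSERT INTO expert VALUES (1, 'Expert A', 'professor', 'text classification');
INSERT INTO expert VALUES (2, 'Expert B', 'engineer', 'signal processing');
INSERT INTO paper VALUES (1, 'Paper on hierarchical classification', '2013');
INSERT INTO paper VALUES (1, 'Paper on feature selection', '2012');
INSERT INTO project VALUES (1, 'Archive system project', 'provincial award');
""")


def compile_experts(conn, direction):
    """Steps (2)-(3): fuzzy search on research direction, one record per expert."""
    rows = conn.execute(
        "SELECT id, name, direction, paper_title, project_name "
        "FROM expert_view WHERE direction LIKE ? ORDER BY name",
        (f"%{direction}%",),
    )
    experts = {}
    for expert_id, name, direc, paper, project in rows:
        entry = experts.setdefault(
            expert_id,
            {"name": name, "direction": direc, "papers": set(), "projects": set()},
        )
        if paper:                 # LEFT JOIN yields NULL when there is no paper
            entry["papers"].add(paper)
        if project:
            entry["projects"].add(project)
    return list(experts.values())


found = compile_experts(conn, "classification")
print(found)
```

Using `LEFT JOIN` keeps experts who have no papers or projects yet in the compilation, which an inner join would silently drop.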

The archive management information system runs as follows:

(1) The user opens a browser on the client and enters the system's web address in the address bar, which sends a service request to the Web server. When the login page appears, the user fills in the user name, password, and verification code, which are sent to the Web server. After verifying the user's identity, the Web server sends the home page of the archive management information system to the client over HTTP; the client browser receives the page and displays it on screen;

(2) Entry of basic archive information. Following the unified formats given by the case-file catalog, in-file catalog, and expert registration card shown on the home page, the user enters and adds the archive category, archive number, archive title, year, and the basic information about experts. The business logic layer of the Web server runs the corresponding extension application to connect to the database server. Before the entered information is stored via SQL in the corresponding database connected to the Web server, the system automatically classifies and files the catalog entries and archive titles, and then attaches the original document, which may be a scanned copy or an electronic original;

(3) When a certain kind of archive information is to be compiled, the system opens the compilation interface for the compilation item the user selects, and the user enters the compilation conditions there. For example, to compile information on experts in a particular field, the user enters the name of the research field or the research direction, together with parameters such as papers and projects, and clicks the Query button to search and query the corresponding database;

(4) Using the parameters entered in step (3), once the connection through the Web server is established, a data processing request is sent in SQL to the corresponding database server: the base compilation database and the other associated databases are searched and queried, and the records that satisfy the compilation conditions are counted, analyzed, and aggregated to generate the compilation result;

(5) The database server returns the generated compilation result to the Web server, which sends it to the client for display on screen;

(6) Export the compilation result to a Word document or Excel report for saving or printing.

Compared with the prior art, the invention has a sound and reliable design principle, requires little labor, works efficiently, omits little information, safeguards the quality and value of the compilation, and provides a user-friendly compilation environment.

Description of the drawings:

Fig. 1 is a schematic block diagram of the hardware structure of the apparatus of the invention.

Fig. 2 is a schematic block diagram of the logical functional structure of the archive compilation and research module and the archive management module.

Fig. 3 is a flow chart of automatic compilation in the archive management information system of the invention.

Fig. 4 is a flow chart of the coarse-classification-based hierarchical classification algorithm of the invention.

具体实施方式:detailed description:

下面通过实施例并结合附图做进一步描述。Further description will be made below through embodiments and in conjunction with accompanying drawings.

实施例1:Example 1:

This embodiment tests and evaluates the classification algorithm proposed by the present invention. From the 1000 archive texts collected, 40 texts are randomly selected from each category to train the classifier, and the remaining 960 archive texts form the set of texts to be classified, on which the classification results are tested and evaluated. Of these, 222 are document archives, 216 science and technology archives, 162 accounting archives, 95 personnel archives, 43 audio-visual archives, 86 general photographs, 35 physical-object archives, 40 archived documents and 61 periodical archives. The classification results are evaluated with three indicators: precision, recall and the F1 value (the harmonic mean of recall and precision). The results are shown in Table 1;

Table 1. Test evaluation of classification results

Category | Recall | Precision | F1 value
Document archives | 95.08% | 91.80% | 93.44%
Science and technology archives | 85.34% | 87.93% | 86.64%
Accounting archives | 90.32% | 93.55% | 91.94%
Personnel archives | 92.00% | 94.67% | 93.33%
Audio-visual archives | 93.02% | 95.35% | 94.19%
General photographs | 97.67% | 94.19% | 95.93%
Physical-object archives | 91.43% | 94.29% | 92.86%
Archived documents | 87.50% | 85.00% | 86.25%
Periodical archives | 90.16% | 83.61% | 86.89%

As the table above shows, the recall, precision and F1 values of the classification results in this embodiment are all good, with F1 values between 86.25% and 95.93%. In the coarse classification stage, the feature dimension derived from the document title and keywords stays below 50 in every case, which improves the running efficiency of the system.
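The F1 value is described above as the harmonic mean of recall and precision. A minimal sketch of that calculation follows; the function name `f1_score` is an illustrative assumption and is not taken from the system itself. For the first row of Table 1 (recall 95.08%, precision 91.80%) it gives roughly 93.4%.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision, both given as fractions in [0, 1]."""
    if recall + precision == 0:
        return 0.0  # avoid division by zero when nothing was classified correctly
    return 2 * recall * precision / (recall + precision)


# First row of Table 1: document archives.
first_row_f1 = f1_score(0.9508, 0.9180)
```

The harmonic mean is dominated by the smaller of the two inputs, so a classifier cannot reach a high F1 value by trading away recall for precision or the reverse.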

Operating environment for this embodiment: a networked PC or compatible machine with a dual-core or better processor and more than 2 GB of memory; the server operating system is Windows XP or later; the required system software is .NET Framework 3.5 and SQL Server 2005; the development software is Microsoft Visual Studio 2008; the system uses a three-tier B/S (browser/server) architecture, with the presentation layer, business logic layer and data layer each implemented in ASP.NET.

This embodiment requires a Microsoft SQL Server database server to be installed and configured. After a user name and password are added for the server, the system database is imported, and the website (that is, the B/S-architecture archive management information system) is published. Once the website has been published, open a browser on any networked PC, enter the website address in the address bar to reach the login page, enter the account, password and verification code, and click Login to reach the main system management interface. In the tree menu on the left side of the system administrator interface, click [Archive Compilation and Research]; the screen lists the available compilation entries, including classification compilation, document-number index compilation and expert information compilation.

Select and click [Classification Compilation] to open the classification compilation interface and enter the compilation conditions, including archive classification name, archive title and filing time (after a given time, before a given time, or within a given period). The system automatically generates the classification compilation results from the conditions entered, displays them on screen as a list, and can export them to a Word document or Excel to be saved or printed.

Select and click [Document-Number Index Compilation] to open the document-number index compilation interface and enter a combination of conditions as required, such as official document number, year, archive number and retention period. After Query is clicked, the system searches and filters the data according to the conditions entered, automatically generates and displays the compilation results, and can export them to a Word document or Excel to be saved or printed.

Select and click [Expert Information Compilation] to open the expert information compilation interface and enter the compilation conditions. The system supports fuzzy search on expert name, fuzzy search on research direction, fuzzy search on paper title and paper abstract, search on paper publication time, fuzzy search on project name, fuzzy search on project summary, search on project start and end time, and search on project awards (whether an award was won, and the award name). Based on the expert name, research direction and achievement conditions entered, the system performs fuzzy queries to carry out expert classification compilation, expert research direction compilation, expert paper information compilation and expert project information compilation. The results are summarized and displayed on screen, or exported to an Excel or Word document to be saved or printed.
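The classification compilation screen described above combines optional conditions: a fuzzy match on classification name, a fuzzy match on title, and a filing-time range. One way to turn such optional inputs into a query is sketched below. This is an assumption-laden illustration, not the system's code: the table `archive`, its columns, and the function names are hypothetical, and `sqlite3` stands in for SQL Server.

```python
import sqlite3


def build_filter(category=None, title=None, filed_from=None, filed_to=None):
    """Turn optional compilation conditions into a WHERE clause plus parameters."""
    clauses, params = [], []
    if category:
        clauses.append("category LIKE ?")   # fuzzy match on classification name
        params.append(f"%{category}%")
    if title:
        clauses.append("title LIKE ?")      # fuzzy match on archive title
        params.append(f"%{title}%")
    if filed_from:
        clauses.append("filed_on >= ?")     # filed on or after this date
        params.append(filed_from)
    if filed_to:
        clauses.append("filed_on <= ?")     # filed on or before this date
        params.append(filed_to)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return where, params


def classified_compilation(conn, **conditions):
    """Run the filtered query and sort by classification, title and filing time."""
    where, params = build_filter(**conditions)
    sql = ("SELECT category, title, filed_on FROM archive" + where +
           " ORDER BY category, title, filed_on")
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive (category TEXT, title TEXT, filed_on TEXT)")
conn.executemany("INSERT INTO archive VALUES (?, ?, ?)", [
    ("administrative approval", "Permit No. 1", "2013-05-01"),
    ("administrative approval", "Permit No. 2", "2014-03-15"),
    ("accounting", "Ledger 2014", "2014-01-10"),
])
result = classified_compilation(conn, category="approval", filed_from="2014-01-01")
```

Only the clause skeleton is assembled as text; every user-supplied value travels as a bound parameter. Dates are stored as ISO strings here so that plain string comparison orders them correctly.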

In this embodiment, archive classification management configures classification compilation according to the actual needs of compilation work. If, for example, an "administrative approval" archive classification needs to be compiled, the system administrator only has to add an "administrative approval" entry to the archive classification table. Archive information under that classification can then be maintained in the in-volume directory and the volume directory, and compilation can be run on the classification. By entering all types of archive information, the system aggregates it, so an archive can be retrieved quickly by classification, archive title or other related information. By checking the storage location recorded for the archive, the archive can be fetched quickly from the corresponding physical location, which shortens search time and improves work efficiency.

Example 2:

The device that implements the automatic archive compilation and research method of this embodiment consists of a client browser 1, an archive information entry and management module 2, an archive compilation and research module 3, and seven databases linked to them, all in data communication with one another. The archive information entry and management module 2 comprises three functional units: an in-volume directory management submodule 4, a volume directory management submodule 5 and an expert registration card information submodule 6. The archive compilation and research module 3 comprises three functional units in data communication: an archive classification compilation submodule 7, a document-number index compilation submodule 8 and an expert information compilation submodule 9.

The archive entry and management module 2 enters and maintains archive information, and classifies, summarizes and maintains the archive information of the in-volume directory, the volume directory and the expert registration cards. The archive compilation and research module 3, according to the compilation conditions entered by the user, automatically performs compilation by archive classification, archive title and filing time, and displays the results as a list. The document-number index compilation submodule 8 accepts compilation conditions that combine official document number, year, archive number and retention period according to the user's needs; after Query is clicked it filters the data by those conditions and automatically generates and displays the document-number index compilation list. The expert information compilation submodule 9 compiles statistics on the expert registration card information held in archive management. It performs fuzzy queries according to the expert name, research direction and achievement conditions entered, and so carries out expert classification compilation, expert research direction compilation, expert paper information compilation and expert project information compilation, summarizing the results and displaying them on screen. Expert classification compilation automatically generates a result list of all expert information for a given research field. All compilation results can be exported to an Excel or Word document to be saved or printed.

The seven databases linked to the archive information entry and management module 2 and the archive compilation and research module 3 are the volume directory database, the in-volume directory database, the archive classification database, the expert basic information registration database, the expert paper detail database, the expert project detail database and the archive compilation and research base database. The client browser 1 is any browser software running on any networked computer or terminal device.

Claims (6)

1. An automatic archive compilation and research method, characterized in that automatic compilation and research of archives is realized in a B/S-architecture archive management information system by combining an archive entry and management module with an archive compilation and research module, with the following specific steps:
(1) archive information is entered first: using the unified forms provided by the volume directory, in-volume directory and expert registration card interfaces displayed by the system, the archive title, the classification to which the archive belongs, the archive number, the year and the various items of basic expert information are entered;
(2) the archive entry and management module then uses the automatic hierarchical classification algorithm proposed by the invention to classify and summarize automatically the archive information entered in step (1), and stores it in the corresponding volume directory database, in-volume directory database, expert basic information registration directory database and expert database;
(3) the archive compilation and research module then, according to the compilation conditions entered by the user and the stored information, retrieves, queries and summarizes data in the corresponding volume directory database, in-volume directory database, expert basic information registration directory database and expert database, and generates the compilation and research result;
(4) the compilation and research result is displayed on screen, or exported as a Word document or an Excel sheet and printed to form a paper document for preservation, thereby achieving automatic compilation and research of archives;
The automatic hierarchical classification algorithm is carried out as follows:
(1) archive information is first entered into the system locally or online: using the unified forms provided by the volume directory, in-volume directory and expert registration card interfaces displayed by the system, the archive title, classification, archive number, year and the various items of basic expert information are entered;
(2) the system automatically extracts the text feature parameter set from the archive title and from the keywords in the archive text, and stores it in the corresponding database;
(3) when the extracted text feature parameter set exceeds a threshold, dimensionality reduction is applied, because too many features often lead to the curse of dimensionality and reduce classification efficiency;
(4) coarse classification is performed with the naive Bayes classification algorithm on the extracted text feature parameters or keywords;
(5) within the coarse classification result of step (4), feature extraction is performed again for each subclass;
(6) the naive Bayes classification algorithm is run again on the text feature parameters of each subclass to complete fine classification automatically;
(7) the classification result is output and saved to the corresponding database;
The naive Bayes algorithm classifies a text by considering all of its features; at classification time, the sample to be predicted is assigned, according to the prediction result, to the document class with the highest probability. The specific classification model is as follows: given an archive text X of unknown class and m classes, denoted C_1, C_2, ..., C_m, then by the naive Bayes rule the class with the highest posterior probability P(C_i | X) given X is found from:
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$
In the formula for P(C_i | X), P(X) is constant, so only the numerator P(X | C_i) P(C_i) needs to be maximized. P(C_i) is the class distribution probability in the training set, calculated as:
$$P(C_i) = \frac{1 + |C_i|}{m + |D|}$$
where the numerator is the number of texts |C_i| contained in class C_i plus 1, and the denominator is the sum of the number of classes m and the total number of texts |D| in the training set. To simplify the calculation of P(X | C_i), the attributes of a text are assumed to be mutually independent, so calculating P(X | C_i) amounts to calculating the probability that the feature attributes occur in class C_i. P(X | C_i) is calculated with Laplace estimation under two computation models:
(1) the multivariate model records whether each feature attribute occurs in the text, marking 1 if it occurs and 0 otherwise; the formula is:
$$P(X \mid C_i) = \prod_{t=1}^{|V|} \Bigl( B_{xt}\,P(w_t \mid C_i) + (1 - B_{xt})\bigl(1 - P(w_t \mid C_i)\bigr) \Bigr)$$
where |V| is the total number of features, B_{xt} indicates whether w_t occurs in text X (1 if it occurs, 0 otherwise), and w_t denotes the t-th feature, that is, the t-th component of the vector; the P(w_t | C_i) in this formula is calculated as follows:
(2) the multinomial model counts the number of times each feature attribute occurs in the text; the formula is:
$$P(X \mid C_i) = \prod_{t=1}^{|V|} \frac{P(w_t \mid C_i)^{N_{xt}}}{N_{xt}!}$$
where N_{xt} is the number of times feature t occurs in text X; P(w_t | C_i) is calculated as follows:
$$P(w_t \mid C_i) = \frac{1 + \sum_{j=1}^{|D|} N_{jt}\,P(C_i \mid d_j)}{|V| + \sum_{s=1}^{|V|} \sum_{j=1}^{|D|} N_{js}\,P(C_i \mid d_j)};$$
In the formula for P(w_t | C_i), N_{jt} is the number of times feature t occurs in text d_j, |D| is the total number of training texts, |V| is the total number of features, and N_{js} is the number of times feature s occurs in text d_j; in essence, the classification method counts all the feature values in a text object and maps them to a probability for each existing class.
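The multinomial model and the coarse-then-fine procedure of claim 1 can be illustrated with a short sketch. It is written under several assumptions that are not part of the claim: training labels are hard, so P(C_i | d_j) is either 1 or 0; texts arrive already tokenized; feature extraction and dimensionality reduction are omitted; and all function names and sample data are hypothetical. The factorial term N_xt! is dropped because it is the same for every class and does not affect which class scores highest.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Fit a multinomial naive Bayes model from (tokens, label) pairs."""
    vocab = {w for tokens, _ in docs for w in tokens}
    class_sizes = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    m, total = len(class_sizes), len(docs)
    # Prior: (1 + |C_i|) / (m + |D|)
    log_prior = {c: math.log((1 + n) / (m + total)) for c, n in class_sizes.items()}
    # Likelihood: (1 + count of w in class) / (|V| + all feature counts in class)
    log_like = {}
    for c in class_sizes:
        denom = len(vocab) + sum(word_counts[c].values())
        log_like[c] = {w: math.log((1 + word_counts[c][w]) / denom) for w in vocab}
    return log_prior, log_like, vocab


def classify(tokens, model):
    """Return the class with the highest log posterior score."""
    log_prior, log_like, vocab = model
    counts = Counter(w for w in tokens if w in vocab)  # unseen words are ignored
    scores = {c: log_prior[c] + sum(n * log_like[c][w] for w, n in counts.items())
              for c in log_prior}
    return max(scores, key=scores.get)


def classify_hierarchical(tokens, coarse_model, fine_models):
    """Coarse classification first, then fine classification inside that class."""
    coarse = classify(tokens, coarse_model)
    fine_model = fine_models.get(coarse)
    fine = classify(tokens, fine_model) if fine_model else None
    return coarse, fine


coarse_model = train([
    (["budget", "invoice", "ledger"], "accounting"),
    (["invoice", "payment"], "accounting"),
    (["appointment", "salary", "personnel"], "personnel"),
    (["personnel", "transfer"], "personnel"),
])
fine_models = {"accounting": train([
    (["invoice", "payment"], "vouchers"),
    (["ledger", "budget"], "books"),
])}
text = ["invoice", "payment", "ledger"]
predicted = classify(text, coarse_model)
labels = classify_hierarchical(text, coarse_model, fine_models)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and training one small model per coarse class keeps each fine classifier's vocabulary short.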
2. The automatic archive compilation and research method according to claim 1, characterized in that the data processed by the archive compilation and research module is the basic compilation performed on the volume directory, in-volume directory and expert registration card directory established by the archive information entry and management module; six databases are created in the archive information entry and management module, namely a volume directory database, an in-volume directory database, an archive classification database, an expert basic information registration database, an expert paper detail database and an expert project detail database; the archive compilation and research module consists of three submodules, namely an archive classification compilation submodule, a document-number index compilation submodule and an expert information compilation submodule, and an archive compilation and research base database associated with the above six databases is created in the archive compilation and research module; the archive classification compilation submodule, according to the compilation conditions entered by the user, automatically performs compilation by archive classification, archive title and filing time and displays the results as a list; the document-number index compilation submodule accepts compilation conditions that combine official document number, year, archive number and retention period according to the user's needs, filters the data by those conditions after Query is clicked, and automatically generates and displays the document-number index compilation list; the expert information compilation submodule compiles statistics on the expert registration card information in archive management, performs fuzzy queries according to the expert name, research direction and achievement conditions entered, and thereby carries out expert classification compilation, expert research direction compilation, expert paper information compilation and expert project information compilation, summarizing the results and displaying them on screen; expert classification compilation means automatically generating a result list of all expert information for a given research field, and the compilation results can be exported to an Excel or Word document to be saved or printed.
3. The automatic archive compilation and research method according to claim 1, characterized in that archive compilation by the archive classification compilation submodule comprises the following steps:
(1) create the classification compilation view: the view uses the in-volume directory or the volume directory as the main table and joins the classification information table to obtain the classification name; in-volume directory information and volume directory information are stored in different data tables, so the two parts must be brought together for a unified query and retrieval when classification compilation is performed;
(2) data access layer code: the data access layer retrieves from the view provided in step (1) the archive information to be compiled; the function takes the query conditions as parameters and retrieves the archive information that satisfies them; classification compilation extracts the archive classification, archive title, filing time, number, page count and archive number, and sorts the results by classification name, archive title and filing time;
(3) application layer implementation of classification compilation: the classification compilation conditions are set first, and compilation is then performed by fuzzy search on archive classification name, fuzzy search on archive title, and search on filing time;
(4) the compilation results are exported to an Excel or Word document so that users can save, view, print and bind them.
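The view described in step (1) of claim 3 merges two separately stored directories and resolves the classification name through a join. The sketch below shows one possible shape for such a view. It is illustrative only: the table names `category`, `volume_dir` and `in_volume_dir`, the view name `classified_view`, and all columns are hypothetical, and `sqlite3` is used in place of SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE volume_dir (title TEXT, category_id INTEGER, filed_on TEXT);
CREATE TABLE in_volume_dir (title TEXT, category_id INTEGER, filed_on TEXT);

-- One view that merges both directories and resolves the category name.
CREATE VIEW classified_view AS
SELECT c.name AS category, d.title, d.filed_on, 'volume' AS source
FROM volume_dir AS d JOIN category AS c ON c.id = d.category_id
UNION ALL
SELECT c.name, d.title, d.filed_on, 'in-volume'
FROM in_volume_dir AS d JOIN category AS c ON c.id = d.category_id;

INSERT INTO category VALUES (1, 'administrative approval'), (2, 'accounting');
INSERT INTO volume_dir VALUES ('Approvals 2014', 1, '2014-06-30');
INSERT INTO in_volume_dir VALUES ('Permit No. 2', 1, '2014-03-15'),
                                 ('Ledger 2014', 2, '2014-01-10');
""")

# Step (2): the data access layer reads from the view and sorts the result
# by classification name, archive title and filing time.
rows = conn.execute(
    "SELECT category, title, filed_on, source FROM classified_view "
    "ORDER BY category, title, filed_on"
).fetchall()
```

Putting the join and the union inside a view lets the data access layer issue a single query, so the application layer does not need to know that the two directories live in different tables.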
4. The automatic archive compilation and research method according to claim 1, characterized in that archive compilation by the document-number index compilation submodule comprises the following steps:
(1) create the document-number index view: the view uses the in-volume directory as the main table, joins the classification information table, and extracts the official document number, year, document sequence number, archive number, page number, page count and retention period;
(2) data access layer code: the data access layer extracts information from the document-number index view and sorts it by official document number, document sequence number and year; the function takes the query conditions as parameters, which are built dynamically by the application layer;
(3) application layer main code: preset compilation conditions, including official document number, year, archive number and retention period, are combined and entered as the compilation requires, and the data are filtered by those conditions after Query is clicked;
(4) the compilation results are exported to an Excel or Word document so that users can save, view, print and bind them.
5. The automatic archive compilation and research method according to claim 1, characterized in that archive compilation by the expert information compilation submodule comprises the following steps:
(1) create the expert information compilation view: the view uses the expert basic information registration table as the main table, joins the expert paper detail table to extract paper information, and joins the expert project detail table to extract project and award information; expert basic registration information and expert achievement information are stored in different data tables, so the parts must be brought together for a unified query when information compilation is performed;
(2) data access layer code: the data access layer obtains from the view in step (1) the expert information to be compiled; the function takes the query conditions as parameters and queries the expert archive information that satisfies them; compilation extracts the expert name, expert category, research direction, paper information and project information;
(3) application layer implementation of expert information compilation: the compilation conditions are set, and the following searches are performed: fuzzy search on expert name, fuzzy search on expert research direction, fuzzy search on paper title and paper abstract, search on paper publication time, fuzzy search on project name, fuzzy search on project summary, search on project start and end time, and search on project awards; according to the compilation conditions entered, expert information compilation carries out expert classification compilation, expert research direction compilation, expert paper information compilation and expert project information compilation;
(4) the compilation results are exported to an Excel or Word document so that users can save, view, print and bind them.
6. The automatic archive compilation and research method according to claim 1, characterized in that the execution flow of the archive management information system is:
(1) a browser is opened on the client and the URL of the system is entered in the address bar to send a service request to the Web server; when the login page of the system is displayed, the user name, password and verification code are filled in and sent to the Web server; after verifying the user's identity, the Web server sends the home page of the archive management information system to the client over HTTP, and the client browser receives the home page file and displays it on screen;
(2) entry of basic archive information: using the unified forms provided by the volume directory, in-volume directory and expert registration card shown on the home page, the classification to which the archive belongs, the archive number, the archive title, the year and the various items of basic expert information are entered and added; the system runs the corresponding extension application in the business logic layer of the Web server and connects to the database server; before the basic information entered or added by the user is stored via SQL in the corresponding database connected to the Web server, the system automatically classifies and files the directory and archive title, and then attaches the original document; the original may be an electronic scan or an electronic original;
(3) when archive information of a given type needs to be compiled and researched, the system opens the compilation and research interface that corresponds to the entry selected by the user; the compilation conditions are entered in that interface and the query button is clicked to retrieve and query information from the corresponding database;
(4) based on the compilation parameters entered in step (3), once the connection to the Web server is established, a data processing request is sent to the corresponding database server through SQL statements; that is, the archive compilation and research base database and the other associated databases are searched and queried, and the retrieved data items that satisfy the compilation conditions are counted, analyzed and summarized to generate the compilation and research results;
(5) the database server submits the generated compilation and research results to the Web server, which sends them to the client for display on the screen;
(6) the compilation and research results are exported to a Word document or an Excel sheet to be saved or printed.
CN201410714594.2A 2014-11-28 2014-11-28 A kind of archives are compiled and grind method automatically Expired - Fee Related CN104361111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410714594.2A CN104361111B (en) 2014-11-28 2014-11-28 A kind of archives are compiled and grind method automatically

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410714594.2A CN104361111B (en) 2014-11-28 2014-11-28 A kind of archives are compiled and grind method automatically

Publications (2)

Publication Number Publication Date
CN104361111A CN104361111A (en) 2015-02-18
CN104361111B true CN104361111B (en) 2017-10-27

Family

ID=52528371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410714594.2A Expired - Fee Related CN104361111B (en) 2014-11-28 2014-11-28 A kind of archives are compiled and grind method automatically

Country Status (1)

Country Link
CN (1) CN104361111B (en)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105303321A (en) * 2015-11-04 2016-02-03 广州赛莱拉干细胞科技股份有限公司 Archive management method and apparatus
CN105808770A (en) * 2016-03-22 2016-07-27 北京北方微电子基地设备工艺研究中心有限责任公司 File management method and device
CN106021355B (en) * 2016-05-10 2020-07-28 重庆大学 Statistical method and custom rule establishing method, device and system among multiple tables
CN106227749A (en) * 2016-07-14 2016-12-14 上海超橙科技有限公司 A kind of information-pushing method and equipment
CN106227748A (en) * 2016-07-14 2016-12-14 上海超橙科技有限公司 A kind of information generating method and equipment
CN106776695B (en) * 2016-11-11 2020-12-04 上海信联信息发展股份有限公司 Method for automatically identifying value of document and file
CN107463651A (en) * 2017-07-27 2017-12-12 合肥泓泉档案信息科技有限公司 A kind of electronic record is filed management method
CN107491498A (en) * 2017-07-27 2017-12-19 合肥泓泉档案信息科技有限公司 A kind of automatic adjusting method of dossier table
CN109684608A (en) * 2017-10-19 2019-04-26 航天信息股份有限公司 It is a kind of that the method and system of generation EXCEL document are passed through based on database
CN107894999A (en) * 2017-10-27 2018-04-10 成都准星云学科技有限公司 Towards the topic type automatic classification method and system based on thinking of solving a problem of elementary mathematics
CN107943957A (en) * 2017-11-27 2018-04-20 广西简约科技有限公司 A kind of software design approach for collecting meeting summary
CN108763467B (en) * 2018-05-29 2023-07-11 甘肃集优品网络科技有限公司 An electronic document intelligent processing management system suitable for the archives industry
CN109189730A (en) * 2018-09-21 2019-01-11 郑州云海信息技术有限公司 A kind of archives visual management method, system, device and readable storage medium storing program for executing
CN109766439A (en) * 2018-12-15 2019-05-17 内蒙航天动力机械测试所 The unlimited tree-shaped class definition and assigning method of statistical query software
CN112182138A (en) * 2019-07-03 2021-01-05 北京京东尚科信息技术有限公司 Method and device for cataloging
CN111597150B (en) * 2020-05-09 2023-09-12 云南驰宏锌锗股份有限公司 Automatic change and file arrangement information system
CN111858499A (en) * 2020-08-03 2020-10-30 王洋 File identification method, system and device based on black and white list
CN112463896B (en) * 2020-12-08 2024-02-23 常兰会 Archive catalogue data processing method, archive catalogue data processing device, computing equipment and storage medium
CN112861473B (en) * 2021-03-12 2024-02-02 国网浙江省电力有限公司物资分公司 Directory examination result summarizing system and method based on openpyl
CN113204610A (en) * 2021-05-06 2021-08-03 广东博维创远科技有限公司 Automatic cataloguing method based on criminal case electronic file and computer readable storage device
CN113407645B (en) * 2021-05-19 2024-06-11 福建福清核电有限公司 Intelligent sound image archive compiling and researching method based on knowledge graph
CN113220842B (en) * 2021-05-20 2022-04-19 广州中海云科技有限公司 Processing method, device and equipment for maritime affair administration punishment cutting template
CN113590903B (en) * 2021-09-27 2022-01-25 广东电网有限责任公司 Management method and device of information data
CN114706980A (en) * 2022-03-23 2022-07-05 胡美玲 A method for compiling data required for cumulative summary assessment
CN114947402A (en) * 2022-06-20 2022-08-30 国网山东省电力公司冠县供电公司 A file screening and classification processing device
CN115329086B (en) * 2022-08-29 2024-04-16 中铁四局集团电气化工程有限公司 Track traffic document retrieval system and method based on classification coding
CN115713179A (en) * 2022-11-17 2023-02-24 四川启睿克科技有限公司 Method for automatically registering financial archives
CN115730119A (en) * 2022-12-02 2023-03-03 深圳市雁联计算系统有限公司 A method, system, and related equipment for intelligent auxiliary compilation and research of archives
CN116757172B (en) * 2023-06-21 2024-07-23 山东浪潮科学研究院有限公司 File investigation method, device, equipment and storage medium
CN116501862B (en) * 2023-06-25 2023-09-12 桂林电子科技大学 Automatic text extraction system based on dynamic distributed collection
CN116595238B (en) * 2023-07-17 2023-09-19 三土电子有限公司 User archive data analysis processing method based on RFID technology
CN118673196B (en) * 2024-06-26 2025-07-08 河北元英信息技术有限公司 Digital archive intelligent management method and system based on big data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102368273A (en) * 2011-11-29 2012-03-07 神华集团有限责任公司 Archive management system and method
CN103745302A (en) * 2013-12-19 2014-04-23 镇江锐捷信息科技有限公司 Digitalized archival data management system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090132496A1 (en) * 2007-11-16 2009-05-21 Chen-Kun Chen System And Method For Technique Document Analysis, And Patent Analysis System

Also Published As

Publication number Publication date
CN104361111A (en) 2015-02-18

Similar Documents

Publication Publication Date Title
CN104361111B (en) A kind of archives are compiled and grind method automatically
CN109992645B (en) Data management system and method based on text data
US9069853B2 (en) System and method of goal-oriented searching
US8176440B2 (en) System and method of presenting search results
US9977827B2 (en) System and methods of automatic query generation
US8583592B2 (en) System and methods of searching data sources
US9606970B2 (en) Web browser device for structured data extraction and sharing via a social network
US8965915B2 (en) Assisted query formation, validation, and result previewing in a database having a complex schema
CN112632405B (en) Recommendation method, recommendation device, recommendation equipment and storage medium
US8161030B2 (en) Method and system for aggregating reviews and searching within reviews for a product
CN104462306B (en) A kind of archives compile grinding device automatically
US20080243787A1 (en) System and method of presenting search results
US20060143158A1 (en) Method, system and graphical user interface for providing reviews for a product
CN110532309B (en) Generation method of college library user portrait system
CN110597981A (en) A Network News Summary System Using Multiple Strategies to Automatically Generate Summary
CN111192176B (en) An online data acquisition method and device supporting educational informatization evaluation
Irudeen et al. Big data solution for Sri Lankan development: A case study from travel and tourism
CN104199938B (en) Agricultural land method for sending information and system based on RSS
CN112800755A (en) Data management method and system
Spangler et al. A smarter process for sensing the information space
JP2008515061A (en) A method for searching data elements on the web using conceptual and contextual metadata search engines
CN110347922B (en) Recommendation method, device, equipment and storage medium based on similarity
CN113535966A (en) Knowledge graph creating method, information obtaining method, device and equipment
Rana et al. Analysis of web mining technology and their impact on semantic web
Pérez et al. Towards a data warehouse contextualized with web opinions

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171027