CN115309885A - A knowledge graph construction, retrieval and visualization method and system for scientific and technological services - Google Patents
- Publication number
- CN115309885A (application CN202211030854.5A)
- Authority
- CN
- China
- Prior art keywords
- scientific
- technological
- knowledge
- retrieval
- entity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/335: Filtering based on additional data, e.g. user or group profiles
- G06F16/34: Browsing; Visualisation therefor
- G06F16/35: Clustering; Classification
- G06F16/367: Ontology
- G06F40/295: Named entity recognition
- G06F40/30: Semantic analysis
- G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08: Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical Field
The present invention relates to the modern service industry, in particular to scientific and technological services, and specifically to a method and system for constructing, retrieving and visualizing a knowledge graph for scientific and technological services.
Background Art
The science and technology service industry is an important component of the modern service industry and an emerging industry that serves the entire chain of scientific and technological innovation. Compared with its counterparts abroad, the industry in China started late but has developed quickly, and it is currently at a stage where its overall scale is small but its growth is fast.
The term "knowledge graph" originated with the Google Knowledge Graph. A knowledge graph describes entities, events, attributes and their relationships in the objective world in a structured form, expressing information in a way that is closer to human cognition. On the storage side, the graph databases derived from it model and manage data in a simple, intuitive way, which makes it easier to keep data units small and normalized, to build rich relationship links, and to describe complex relationships among data explicitly. This technology offers a better way to organize, manage and understand massive amounts of information.
Applying knowledge graphs to scientific and technological services makes it possible to fuse and supplement knowledge, behavior, data and other kinds of information, to model science and technology service resources as a knowledge graph, and to make intelligent retrieval precise, scenario-based and personalized. This can effectively improve the conversion rate of science and technology service resources on a science and technology service platform.
Summary of the Invention
The purpose of the present invention is to overcome the shortcomings of the prior art described above and to provide a knowledge graph construction, retrieval and visualization method and system for scientific and technological services that offers high accuracy, good flexibility and a high resource conversion rate.
To achieve this purpose, the knowledge graph construction, retrieval and visualization method and system for scientific and technological services of the present invention are as follows:
The knowledge graph construction, retrieval and visualization method for scientific and technological services is mainly characterized in that it comprises the following steps:
(1) Collect the resource metadata of the science and technology service platform and preprocess it by word segmentation and part-of-speech tagging;
(2) Perform data cleaning and knowledge extraction on the preprocessed metadata to obtain a list of scientific and technological resource entities, a list of entity attributes and a list of inter-entity relationships;
(3) Perform knowledge correction on the science and technology service resource entities to determine which resources can be imported into the knowledge graph;
(4) Calculate the resource membership degree of the science and technology service resources selected for import into the knowledge graph;
(5) Import the processed science and technology service resources into the knowledge graph database;
(6) Determine the categories of scientific and technological resources and design precise filter conditions for user retrieval;
(7) Respond to the user's personalized retrieval request and complete the query in the knowledge graph database;
(8) Present the retrieved content as a visual graph to improve the user's retrieval experience.
Preferably, the science and technology service platform is a resource sharing platform with a science and technology service transaction function. It supports role-based user registration and login, and the roles include science and technology service suppliers and science and technology service demanders.
Preferably, the resource metadata of the science and technology service platform covers service providers, service products, instruments and equipment, park services, intellectual property, investment, experts and so on.
A science and technology service knowledge graph construction, retrieval and visualization system that applies the above method comprises a data collection and preprocessing module, a knowledge extraction module, a knowledge correction module, a membership degree calculation module, a graph import module, a retrieval resource classification module, a resource retrieval module and a visualization module;
The data collection and preprocessing module collects the resource information of the science and technology service platform and divides it into structured, semi-structured and unstructured data. It also preprocesses the semi-structured and unstructured data by word segmentation and part-of-speech tagging;
The knowledge extraction module performs data cleaning and traversal on the structured data and performs knowledge extraction on the semi-structured and unstructured data, specifically entity recognition, relation extraction and attribute extraction;
The knowledge correction module performs knowledge correction on the cleaned scientific and technological resources and determines the complete resource information that can be imported into the knowledge graph database;
The membership degree calculation module calculates the resource membership degree of the science and technology service resources that knowledge correction has cleared for import into the knowledge graph;
The graph import module imports the processed scientific and technological resources into the knowledge graph database;
The retrieval resource classification module classifies the scientific and technological resources on the platform and determines the filter conditions for user retrieval;
The resource retrieval module responds to user retrieval requests, generates the corresponding query statements and completes the query in the knowledge graph database;
The visualization module presents the user's retrieval results in the front-end interface as a knowledge graph visualization.
With the knowledge graph construction, retrieval and visualization method and system of the present invention, the science and technology service resources on the platform are stored as a knowledge graph. Compared with storage in a traditional relational database, the structured data is stored as a network that forms a graph rather than in tables, which greatly improves the storage efficiency of scientific and technological resources. The invention applies the node-and-relationship structure of the knowledge graph to the retrieval of science and technology service resources. This retrieval approach can accurately capture users' personalized needs, improve the match between supply and demand on the platform, and raise the success rate of science and technology service resource transactions on the platform.
Brief Description of the Drawings
Fig. 1 is a flow chart of the knowledge graph construction, retrieval and visualization method for scientific and technological services of the present invention.
Fig. 2 is a schematic structural diagram of the knowledge graph construction, retrieval and visualization system for scientific and technological services of the present invention.
Fig. 3 is a simplified flow chart of the knowledge graph construction, retrieval and visualization method for scientific and technological services of the present invention.
Detailed Description
To describe the technical content of the present invention more clearly, it is further described below with reference to specific embodiments.
A specific embodiment of the present invention is a knowledge graph construction, retrieval and visualization method for scientific and technological services. Referring to Fig. 1 and Fig. 3, the method is carried out in the following steps:
The first step is to collect the resource metadata of the science and technology service platform and preprocess it by word segmentation and part-of-speech tagging.
Scientific and technological resource metadata is collected from the platform's resource library and divided into structured, semi-structured and unstructured data. Structured data includes tabular data, relational databases and the like; semi-structured data includes log files, XML documents, JSON documents and the like; unstructured data includes office documents in all formats, reports of various kinds, plain text and the like.
The semi-structured and unstructured data is then preprocessed by word segmentation and part-of-speech tagging, which are important preparation for complex NLP tasks such as knowledge extraction.
Languages written in the Latin alphabet, such as English, build words from letters and separate words with spaces. Chinese text has no such separators, so Chinese semantic recognition must first segment the character sequence of each sentence into individual words before a machine can interpret the text correctly.
Part-of-speech tagging assigns a part of speech to every word in every sentence of the text. The categories include noun (/n), verb (/v), gerund (/vn), adjective (/a), adverb (/d) and so on.
The specific word segmentation and part-of-speech tagging steps are as follows:
Step 1.1: Determine the industry corpus dictionary that corresponds to the industry category of each scientific and technological resource, and match the text to be segmented against the entries in that dictionary;
Step 1.2: Set the starting length for forward maximum matching, partition the dictionary by word length (number of Chinese characters), take candidate words from front to back, and for each match search the sub-dictionary that corresponds to the candidate's character count;
Step 1.3: Set the starting length for backward maximum matching, partition the dictionary by word length (number of Chinese characters), take candidate words from back to front, and for each match search the sub-dictionary that corresponds to the candidate's character count;
Step 1.4: Determine the selection rule: if forward and backward maximum matching produce the same number of words, take the result with the fewest non-dictionary words and single-character words as the segmentation result;
Step 1.5: Perform part-of-speech tagging based on the selected industry corpus dictionary and design the feature templates;
Step 1.6: Create a blank part-of-speech tagger, configure the parameters of a conditional random field (CRF) model and train the model;
Step 1.7: Feed in the segmented text to carry out the part-of-speech tagging task.
Specifically, take the introduction of one project on the platform as an example of data preprocessing: “项目在深圳,针对学校、社区开发的互联网APP项目,现在处于研发阶段,股权融资300万,资金主要用于产品的研发,完善,跟后期当地市场的小范围推广,试行,未来再向全国发展。” (In English: "The project is based in Shenzhen and is an Internet app developed for schools and communities. It is currently in the R&D stage, with 3 million in equity financing. The funds will mainly be used for product development and refinement, followed by small-scale promotion and trials in the local market, before expanding nationwide in the future.")
After word segmentation and part-of-speech tagging, the result is: “[项目/n,在/p,深圳/ns,,/w,针对/p,学校/n,、/w,社区/n,开发/v,的/uj,互联网/n,APP/nx,项目/n,,/w,现在/t,处于/v,研发/j,阶段/n,,/w,股权/n,融资/vn,300万/m,,/w,资金/n,主要/b,用于/v,产品/n,的/uj,研发/j,,/w,完善/v,,/w,跟/p,后期/f,当地/s,市场/n,的/uj,小/a,范围/n,推广/v,,/w,试行/v,,/w,未来/t,再/d,向/p,全国/n,发展/vn,。/w]”.
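The bidirectional matching in steps 1.2 to 1.4 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the toy dictionary, the use of the longest dictionary entry as the starting length, and the handling of the cases the text leaves open (different word counts, equal penalties) are all assumptions.

```python
def split_by_length(dictionary):
    """Partition the dictionary into sub-dictionaries keyed by word length."""
    buckets = {}
    for word in dictionary:
        buckets.setdefault(len(word), set()).add(word)
    return buckets


def forward_max_match(text, buckets, max_len):
    """Scan front to back, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in buckets.get(size, ()):
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, buckets, max_len):
    """Scan back to front, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in buckets.get(size, ()):
                words.append(piece)
                j -= size
                break
    return words[::-1]


def penalty(words, dictionary):
    """Count non-dictionary words plus single-character words (step 1.4)."""
    non_dict = sum(1 for w in words if w not in dictionary)
    single = sum(1 for w in words if len(w) == 1)
    return non_dict + single


def bidirectional_segment(text, dictionary):
    buckets = split_by_length(dictionary)
    max_len = max(buckets, default=1)  # assumption: start from the longest entry
    fwd = forward_max_match(text, buckets, max_len)
    bwd = backward_max_match(text, buckets, max_len)
    if len(fwd) != len(bwd):
        # Assumption: the text only covers equal counts; prefer fewer words here.
        return fwd if len(fwd) < len(bwd) else bwd
    # Equal counts: fewest non-dictionary and single-character words wins.
    # On a tie the backward result is returned (an arbitrary choice).
    return fwd if penalty(fwd, dictionary) < penalty(bwd, dictionary) else bwd


toy_dictionary = {"研究", "研究生", "生命", "起源"}  # hypothetical entries
print(bidirectional_segment("研究生命起源", toy_dictionary))
```

In this toy case forward matching greedily takes "研究生" and leaves a stray single character, while backward matching does not, so the selection rule picks the backward result.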
The second step is to perform data cleaning and knowledge extraction on the preprocessed metadata to obtain the list of scientific and technological resource entities, the list of entity attributes and the list of inter-entity relationships.
Data cleaning and traversal are performed on the structured data obtained in the first step to obtain the list of scientific and technological resource entities, the list of entity attributes and the list of inter-entity relationships.
Knowledge extraction is performed on the semi-structured and unstructured data obtained in the first step to determine named entities, entity attributes and inter-entity relationships, which further extends the list of scientific and technological resource entities, the list of entity attributes and the list of inter-entity relationships.
The entities to be extracted in the present invention are named entities related to science and technology service resources, such as service providers, service products, instruments and equipment, parks, patents, investment institutions and experts. Every extracted named entity may eventually be added to the knowledge graph as an entity related to science and technology services.
The entity attributes to be extracted in the present invention are the attributes of the different science and technology service resources, such as the inventor, applicant and agency of a patent, the institution that owns an instrument, the service products offered by a service provider, and the organization an expert belongs to. Every extracted attribute may eventually be added to the knowledge graph as an entity attribute related to science and technology services.
The inter-entity relationships to be extracted in the present invention are the relationships among different science and technology service resources, such as a research institution owning a patent, an expert owning a technology, an investor cooperating with a university, or a service provider offering a science and technology service product. Every extracted relationship may eventually be added to the knowledge graph as an entity relationship related to science and technology services.
Optionally, an entity attribute is treated as a nominal relationship between the entity and the attribute value, which turns the attribute extraction task into a relation extraction task.
Optionally, a supervised Lattice-LSTM Chinese named entity recognition model is used to extract the named entities of science and technology services:
Step 2.1: Manually annotate the scientific and technological resource data to be extracted, and apply an embedding operation to the data to extract contextual features of the text;
Step 2.2: Train the word vector tool Word2Vec on the preprocessed text to build the word vectors that correspond to the text characters;
Step 2.3: Use the LSTM neural network layer to compute contextual representation vectors, then feed these vectors as features to the CRF label inference layer to predict the label classification, where "B" denotes a starting word, "E" an ending word and "S" a single word;
Step 2.4: Use 70% of the annotated corpus as the training set and 30% as the test set, set the word vector dimension to 8logN, where N is the vocabulary size, set the number of hidden-layer neurons of the LSTM model to 150, and perform named entity recognition with the Lattice-LSTM model;
Step 2.5: Extract the relationships between different scientific and technological resource entities: use cosine similarity to obtain the semantic similarity between a relationship and the candidate relation words, and select the relation word with high semantic similarity as the relationship between the entities;
Step 2.6: On top of the similarity-based relation extraction, verify the relationships manually, refine and correct their wording, and standardize features such as relationship names and attributes.
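The similarity-based selection in step 2.5 can be illustrated with a short sketch. The vectors, the relation names and the `pick_relation` helper below are hypothetical; in practice the vectors would come from the trained Word2Vec model of step 2.2, and the text does not specify how the vector for an extracted relationship is built.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_relation(relation_vec, relation_word_vecs):
    """Return the candidate relation word whose vector is most similar to relation_vec."""
    return max(
        relation_word_vecs,
        key=lambda name: cosine_similarity(relation_vec, relation_word_vecs[name]),
    )


# Toy 3-dimensional vectors, made up for illustration only.
candidates = {
    "owns": [1.0, 0.0, 0.0],
    "provides": [0.0, 1.0, 0.0],
    "cooperates_with": [0.0, 0.0, 1.0],
}
extracted = [0.9, 0.1, 0.0]  # hypothetical vector for an extracted relationship phrase
print(pick_relation(extracted, candidates))
```

A similarity threshold below which no relation word is assigned would be a natural addition, but the text does not state one, so the sketch simply takes the best-scoring candidate and leaves the check to the manual verification of step 2.6.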
The third step is to perform knowledge correction on the science and technology service resource entities and determine which resources can be imported into the knowledge graph.
Scientific and technological resource data extracted from the platform's large-scale, multi-source, heterogeneous databases inevitably contains duplicates and ambiguous references, so knowledge fusion and reference disambiguation must be carried out on the resources before the knowledge graph is built.
After the scientific and technological resource entities are obtained, the different representations of the same entity in data from different sources are merged, completing entity recognition, relation linking and ontology generation. For example, two patents with the same patent number and the same publication number are duplicates, and only one copy of a duplicated patent is kept.
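The patent example above amounts to deduplication on a composite key. A minimal sketch, assuming each record is a dictionary with hypothetical `patent_no` and `publication_no` fields (the text does not name the fields) and using placeholder values:

```python
def deduplicate_patents(records):
    """Keep the first record for each (patent number, publication number) pair."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["patent_no"], rec["publication_no"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Placeholder records from two hypothetical sources.
patents = [
    {"patent_no": "P-001", "publication_no": "PUB-001", "source": "db_a"},
    {"patent_no": "P-001", "publication_no": "PUB-001", "source": "db_b"},
    {"patent_no": "P-002", "publication_no": "PUB-002", "source": "db_a"},
]
print(len(deduplicate_patents(patents)))
```

Keeping the first occurrence is an arbitrary choice here; a real pipeline might instead merge the attributes of the duplicated records before discarding one.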
Optionally, the Limes tool is used for knowledge correction. Limes is an entity matching (link discovery) framework based on metric spaces; it is written in Java and suited to large-scale data linking. The steps are as follows:
Step 3.1: Write a configuration file specifying the data sources, fusion algorithm, fusion conditions and related settings;
Step 3.2: Take as given a source dataset S, a target dataset T and a threshold θ;
Step 3.3: Compute the distance m(s,e) between each s∈S and each e∈E, where E is the set of exemplars (reference points) used by Limes, and filter with the triangle inequality: because m(s,t) ≥ m(s,e)-m(e,t), every entity pair (s,t) with m(s,e)-m(e,t)>θ is discarded;
Step 3.4: Compute the exact distance m(s,t) for the remaining entity pairs (s,t) and store the results in a user-specified format;
Step 3.5: Complete knowledge fusion according to entity similarity, and determine the scientific and technological service resource data that can be imported into the knowledge graph database.
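The filtering idea in steps 3.3 and 3.4 can be illustrated with a short Python sketch. It is not the Limes implementation (Limes is a Java framework with its own configuration format); the function name, the use of plain numbers as entities and the choice of exemplars are all assumptions made for illustration. Only the one-sided bound stated in step 3.3 is applied.

```python
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[float, float], float]


def link_with_exemplars(
    source: Sequence[float],
    target: Sequence[float],
    exemplars: Sequence[float],
    metric: Metric,
    theta: float,
) -> Tuple[List[Tuple[float, float, float]], int]:
    """Return (links, exact_count); links holds (s, t, m(s, t)) with m(s, t) <= theta."""
    # Distances m(e, t) from each exemplar to each target point, computed once.
    exemplar_to_target = [[metric(e, t) for t in target] for e in exemplars]
    links: List[Tuple[float, float, float]] = []
    exact_count = 0
    for s in source:
        source_to_exemplar = [metric(s, e) for e in exemplars]
        for j, t in enumerate(target):
            # Step 3.3: m(s, t) >= m(s, e) - m(e, t), so a bound above theta
            # proves the pair cannot match and the exact distance is skipped.
            if any(
                source_to_exemplar[k] - exemplar_to_target[k][j] > theta
                for k in range(len(exemplars))
            ):
                continue
            # Step 3.4: exact distance only for pairs that survive the filter.
            distance = metric(s, t)
            exact_count += 1
            if distance <= theta:
                links.append((s, t, distance))
    return links, exact_count
```

The filter never drops a true match, because it only discards pairs whose lower bound already exceeds θ; how many exact computations it saves depends on how the exemplars are chosen.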
The fourth step is to compute resource membership degrees for the scientific and technological service resources that have been selected for import into the knowledge graph.
The system is built on a scientific and technological resource sharing service platform. Beyond what a traditional platform offers, this platform aggregates the various kinds of scientific and technological resource information that small and medium-sized enterprises need, and it is characterized by wide-ranging data sources, inconsistent data standards and a large number of domains. To meet the platform's business needs, and to support platform operation and unified resource allocation, membership degrees must be computed across all of its scientific and technological resources;
Taking the Hainan Kechuang Island platform (海南科创岛) as an example: the platform covers four major industries, namely modern services, marine economy, international medical care and cultural tourism. Its resource entities are classified and their membership degree in each industry is computed, for use in offline resource classification and online classified display;
To precisely serve the construction of Hainan's open, ecological, service-oriented industrial system led by modern services, marine economy, medical care and cultural tourism, and the high-quality development of the free trade zone's modern economic system, the platform promotes comprehensive scientific and technological services in the Hainan Free Trade Zone through five development modes: concentrated services for Hainan science and technology parks, distributed services for Hainan townships, customized services for Hainan enterprises above designated size, specialized services for Hainan's characteristic industries, and import of service resources from outside the island. The platform's resource entities are likewise classified and their membership degree in each development mode is computed, for use in offline resource classification and online classified display;
The calculation is described in detail below, using the four-industry membership degree as the example:
Step 4.1: Obtain the word segmentation and part-of-speech tagging results for each entity in the scientific and technological resource entity list produced in the first step;
Step 4.2: Filter and split the segmentation results of the entity list using a stop-word dictionary;
Step 4.3: For each entity, accumulate weights using the classification weights of resource vocabulary stored in the Kechuang Island platform's four-industry database;
Step 4.4: Compute the mean and standard deviation of the raw data, and normalize each value by subtracting the mean and dividing by the standard deviation;
Step 4.5: Attach the results to the original entity as its four-industry membership degree attributes;
Step 4.6: Rank each resource's membership degrees across the four industries to guide back-end data scheduling and front-end operations management on the Kechuang Island platform.
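Steps 4.2 to 4.6 can be sketched in Python as below. This is an illustration under stated assumptions rather than the platform's implementation: the industry keys, the shape of the weight table (`term_weights[token][industry]`) and the function names are hypothetical, and because the source does not say over which population the mean and standard deviation are taken, the sketch normalizes each industry's accumulated scores across all entities using the population standard deviation.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set

INDUSTRIES = ("modern_services", "marine_economy", "medical_care", "cultural_tourism")


def accumulate(tokens: List[str], stop_words: Set[str],
               term_weights: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Steps 4.2-4.3: drop stop words, then sum per-industry term weights."""
    totals = dict.fromkeys(INDUSTRIES, 0.0)
    for token in tokens:
        if token in stop_words:
            continue
        for industry, weight in term_weights.get(token, {}).items():
            totals[industry] += weight
    return totals


def membership_degrees(entity_tokens: Dict[str, List[str]], stop_words: Set[str],
                       term_weights: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Steps 4.4-4.5: z-score each industry's scores and attach them per entity."""
    raw = {name: accumulate(tokens, stop_words, term_weights)
           for name, tokens in entity_tokens.items()}
    if not raw:
        return {}
    degrees: Dict[str, Dict[str, float]] = {name: {} for name in raw}
    for industry in INDUSTRIES:
        values = [raw[name][industry] for name in raw]
        mu, sigma = mean(values), pstdev(values)
        for name in raw:
            # A zero standard deviation means every entity scored the same.
            degrees[name][industry] = (raw[name][industry] - mu) / sigma if sigma else 0.0
    return degrees


def ranked_industries(degrees: Dict[str, float]) -> List[str]:
    """Step 4.6: industries ordered from highest to lowest membership degree."""
    return sorted(degrees, key=degrees.get, reverse=True)
```

Z-scores can be negative, so the resulting values are relative rankings rather than probabilities.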
The fifth step is to import the processed scientific and technological service resources into the knowledge graph database.
Neo4j is a high-performance NoSQL graph database that stores structured data as a graph (a network) rather than in tables. It can also be regarded as a high-performance graph engine with all the features of a mature database.
A graph database has two basic data types: Nodes and Relationships. Both carry properties in key/value form, and Nodes are connected by the relations that Relationships define, forming a network structure.
Optionally, the Neo4j graph database is used to store the extracted resource entities and the relations between them: the processed resource names, attributes and relations are read in turn, and Neo4j scripts are run to create the resource entities and build the relations between them.
The procedure for importing into the Neo4j database is as follows:
Step 5.1: Use tools such as Excel or MySQL to export the processed scientific and technological resource data as csv files;
Step 5.2: Place the csv files in Neo4j's import folder and read the data in Neo4j Desktop with a Load statement;
Step 5.3: Use Create statements to create the resource entity nodes, entity properties and relations between entities;
Step 5.4: Depending on the data volume, create indexes on selected node properties to speed up retrieval over large datasets;
Step 5.5: Use Match statements to check that the nodes and relationships were imported successfully.
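As a rough illustration of steps 5.2 to 5.5, the Python helpers below assemble the Cypher statements such an import might use. The file names, the CSV column names (`id`, `name`, `start_id`, `end_id`) and the helper functions themselves are assumptions for illustration; the `CREATE INDEX FOR ... ON ...` form is the Neo4j 4.x and later syntax, so older servers would need the earlier form.

```python
from typing import List


def quoted(identifier: str) -> str:
    """Backtick-quote a label, type or property name, rejecting embedded backticks."""
    if not identifier or "`" in identifier:
        raise ValueError(f"unsafe identifier: {identifier!r}")
    return f"`{identifier}`"


def node_import(label: str, csv_file: str, properties: List[str]) -> str:
    """Steps 5.2-5.3: LOAD CSV from the import folder and CREATE one node per row."""
    props = ", ".join(f"{quoted(p)}: row.{quoted(p)}" for p in properties)
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{csv_file}' AS row "
        f"CREATE (:{quoted(label)} {{{props}}})"
    )


def relationship_import(rel_type: str, csv_file: str, start_label: str, end_label: str) -> str:
    """Step 5.3: connect existing nodes listed by start_id/end_id columns."""
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{csv_file}' AS row "
        f"MATCH (a:{quoted(start_label)} {{id: row.start_id}}), "
        f"(b:{quoted(end_label)} {{id: row.end_id}}) "
        f"CREATE (a)-[:{quoted(rel_type)}]->(b)"
    )


def index_statement(label: str, prop: str) -> str:
    """Step 5.4: index a node property that searches will filter on."""
    return f"CREATE INDEX FOR (n:{quoted(label)}) ON (n.{quoted(prop)})"


def check_statement(label: str) -> str:
    """Step 5.5: count imported nodes as a quick sanity check."""
    return f"MATCH (n:{quoted(label)}) RETURN count(n) AS nodes"
```

`LOAD CSV` reads every field as a string, so numeric properties would need an explicit conversion such as `toInteger(row.amount)` in the generated statement.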
The sixth step is to define the scientific and technological resource categories and design precise filter conditions for user retrieval.
To pin down user needs and present scientific and technological service resources precisely when responding to a search, the search categories are divided into two levels according to the types of resources on the platform: first-level search categories and second-level search categories;
Based on the resource entity names imported into the graph database, the first-level and second-level search category contents to which each resource belongs are determined. The specific search categories are shown in the following table:
        
To uncover and satisfy users' individual needs as fully as possible, filter conditions for retrieval are built on the node-relationship-node structure of the knowledge graph. They comprise the content of the search object, the content of the relation object associated with the search object, and the relation between the search object and the relation object, as shown in the following table:
Specifically, the search object content and the relation object content each contain the following sub-units: first-level search category, second-level search category, search (relation) object content and search (relation) object attributes, as shown in the following table:
        
The seventh step is to respond to the user's personalized search request and execute the query in the knowledge graph database.
For different users' personalized search needs, the complex relation network unique to a knowledge graph makes the relations between scientific and technological resources clearly visible, and treating relations themselves as search objects achieves search results that traditional relational databases cannot.
The main idea of the query procedure is as follows:
Step 7.1: The user specifies the content of the search object, its first-level and second-level search categories, and its attributes;
Step 7.2: The user specifies, according to actual needs, the other scientific and technological service resources associated with the search object, together with their attributes and first-level and second-level search categories;
Step 7.3: The back end takes the user's search conditions and generates the corresponding Cypher query statement;
Step 7.4: The generated Cypher query statement is run against the Neo4j database and the search results are returned.
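Step 7.3 can be sketched as a small template function in Python. The source does not show the template itself, so the `SearchObject` class, the `build_query` function and the decision to pass values as Cypher parameters instead of inlining them are assumptions; the sketch also assumes the search category is used directly as the node label.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SearchObject:
    label: str                                    # search category used as the node label
    attributes: Dict[str, str] = field(default_factory=dict)


def _quoted(name: str) -> str:
    if not name or "`" in name:
        raise ValueError(f"unsafe identifier: {name!r}")
    return f"`{name}`"


def build_query(target: SearchObject,
                related: List[Tuple[str, SearchObject]]) -> Tuple[str, Dict[str, str]]:
    """Build a node-relationship-node query; related holds (relation type, object) pairs."""
    patterns: List[str] = []
    conditions: List[str] = []
    params: Dict[str, str] = {}

    def add_conditions(var: str, obj: SearchObject) -> None:
        for key, value in obj.attributes.items():
            if not key.isidentifier():
                raise ValueError(f"unsafe property name: {key!r}")
            param = f"{var}_{key}"
            conditions.append(f"{var}.{key} = ${param}")
            params[param] = value

    add_conditions("n", target)
    for i, (rel_type, obj) in enumerate(related):
        var = f"r{i}"
        patterns.append(
            f"(n:{_quoted(target.label)})-[:{_quoted(rel_type)}]-({var}:{_quoted(obj.label)})"
        )
        add_conditions(var, obj)
    if not patterns:
        patterns.append(f"(n:{_quoted(target.label)})")

    query = "MATCH " + ", ".join(patterns)
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query + " RETURN n, count(n)", params
```

For step 7.4, the returned pair could be passed to a Neo4j client, for example `session.run(query, params)` with the official Python driver; keeping user input in parameters avoids building quoted literals by hand.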
The user retrieval process is illustrated below with two examples:
Example one:
The user is looking for a service provider in the Shanghai area that can supply a face-recognition access control system and that has previously provided services to Shanghai University. The search conditions derived from this requirement are shown in the following table:
After receiving the user's search request, the back end generates the corresponding Cypher statement from a template:
MATCH (n:服务商)-[:合作]-(p:高校)
WHERE n.name = '门禁系统' AND n.location = '上海' AND p.name = '上海大学'
RETURN n, count(n);
Here 服务商, 合作 and 高校 are the service provider label, the cooperation relation and the university label; 门禁系统, 上海 and 上海大学 are "access control system", "Shanghai" and "Shanghai University".
The system runs the generated Cypher statement against the Neo4j database and returns the results.
Example two:
The user owns a medical endoscope inspection technology and is seeking an investment of 15 million. The investor must be located in Beijing, must have had a cooperation project with a research institution of Shanghai University, and must have taken part in building a medical equipment industrial park. The search conditions derived from this requirement are shown in the following table:
After receiving the user's search request, the back end generates the corresponding Cypher statement from a template:
MATCH (p:科研机构)-[:合作]-(n:投资商)-[:提供]-(q:园区服务)
WHERE n.name = '医疗内窥镜' AND n.location = '北京' AND p.name = '上海大学' AND q.name = '医疗设备'
RETURN n, count(n);
Here 科研机构, 投资商 and 园区服务 are the research institution, investor and park service labels; 合作 and 提供 are the cooperation and provision relations; 医疗内窥镜, 北京, 上海大学 and 医疗设备 are "medical endoscope", "Beijing", "Shanghai University" and "medical equipment".
The system runs the generated Cypher statement against the Neo4j database and returns the results.
The eighth step is to present the user's search results as a visual graph to improve the search experience.
With the search filter conditions established in the sixth and seventh steps, once a user's search request has been executed, the results are displayed as a knowledge graph visualization, so that scientific and technological resources and services can be recommended to users more accurately and the success rate of resource transactions on the platform improves.
In graph form, any search can show, from all angles, the retrieved resources' external connections as well as the connections among different resources. This offers insight that plain text, tables and pictures cannot, and helps users make decisions more efficiently.
Preferably, Neovis.js is used to build the front-end knowledge graph visualization:
The component integrates JavaScript visualization with Neo4j and is an embeddable tool that connects to Neo4j directly.
Because the component is built on Neo4j's property graph model, the Neovis.js data format matches Neo4j's. Custom styling based on labels, properties, nodes and relationships is defined in a single configuration object, which lets developers style the visualization by node, relationship or specific property.
The visualization component is set up as follows:
Step 8.1: Configure the Neo4j server port, user name and password, and connect to the Neo4j database to fetch live data;
Step 8.2: Select the DOM element in which the visualization is rendered and set the styles of the visual elements (nodes and relationships);
Step 8.3: Specify the labels and the properties to display: the node property that holds the URL of the node's image, the edge property that determines edge thickness, and the node property that determines node size;
Step 8.4: In response to the user's search request, receive the search results returned by the database and build the front-end visualization component;
Step 8.5: Handle user commands interactively, responding to user actions in real time on the front end.
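The single configuration object described above can be assembled on the back end and handed to the page as JSON; the Python sketch below shows one possible shape. The key names (`container_id`, `server_url`, `server_user`, `server_password`, `labels`, `relationships`, `initial_cypher`, `caption`, `size`, `thickness`) follow the Neovis.js 1.x README; later major versions renamed them, so they should be checked against the installed version. The property names `membership` and `weight`, and the function names, are hypothetical.

```python
import json
from typing import Any, Dict


def neovis_config(container_id: str, server_url: str, user: str,
                  password: str, initial_cypher: str) -> Dict[str, Any]:
    """Build a Neovis.js 1.x-style configuration object (steps 8.1-8.3)."""
    return {
        # Step 8.2: id of the DOM element that hosts the visualization.
        "container_id": container_id,
        # Step 8.1: connection details for the Neo4j server.
        "server_url": server_url,
        "server_user": user,
        "server_password": password,
        # Step 8.3: per-label caption and size properties (names are assumptions).
        "labels": {
            "服务商": {"caption": "name", "size": "membership"},
            "高校": {"caption": "name"},
        },
        # Step 8.3: edge thickness taken from a relationship property.
        "relationships": {
            "合作": {"caption": False, "thickness": "weight"},
        },
        # Step 8.4: the query whose result is rendered.
        "initial_cypher": initial_cypher,
    }


def to_json(config: Dict[str, Any]) -> str:
    """Serialize for the front end, keeping Chinese labels readable."""
    return json.dumps(config, ensure_ascii=False)
```

Embedding database credentials in a page exposes them to every visitor, so a deployment would normally use a read-only account for this purpose.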
Referring to Figure 2, the knowledge graph construction, retrieval and visualization system for scientific and technological services according to the present invention comprises a data acquisition and preprocessing module, a knowledge extraction module, a knowledge correction module, a membership degree calculation module, a graph import module, a search resource classification module, a resource retrieval module and a visualization module.
The data acquisition and preprocessing module collects resource information from the scientific and technological service platform and divides it into structured, semi-structured and unstructured data; it also preprocesses the semi-structured and unstructured data with word segmentation and part-of-speech tagging;
The knowledge extraction module performs data cleaning and traversal on structured data, and knowledge extraction on semi-structured and unstructured data, specifically entity recognition, relation extraction and attribute extraction;
The knowledge correction module performs knowledge correction on the cleaned scientific and technological resources and determines the complete resource information that can be imported into the knowledge graph database;
The membership degree calculation module computes resource membership degrees for the scientific and technological service resources that, after knowledge correction, have been selected for import into the knowledge graph;
The graph import module imports the processed scientific and technological resources into the knowledge graph database;
The search resource classification module classifies the scientific and technological resources on the platform and determines the filter conditions for user retrieval;
The resource retrieval module responds to user search requests, generates the corresponding query statements and executes the queries in the knowledge graph database;
The visualization module presents the user's search results in the front-end interface as a knowledge graph visualization.
For the specific implementation of this embodiment, refer to the relevant description in the embodiments above; it is not repeated here.
It can be understood that identical or similar parts of the above embodiments may be cross-referenced, and content not described in detail in one embodiment may be found in the identical or similar content of other embodiments.
It should be noted that, in the description of the present invention, the terms "first", "second" and so on are used for descriptive purposes only and should not be understood as indicating or implying relative importance. In addition, unless otherwise stated, "a plurality of" means at least two.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing a specific logical function or step of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present invention belong.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware or a combination of them. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution device. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination of them.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, may each exist physically on their own, or two or more of them may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk or the like.
In this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that the specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such references do not necessarily point to the same embodiment or example, and the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
With the knowledge graph construction, retrieval and visualization method and system for scientific and technological services of the present invention, the scientific and technological service resources on a service platform are stored as a knowledge graph. Compared with traditional relational database storage, structured data is stored as a graph rather than in tables, which greatly improves the storage efficiency of scientific and technological resources. The invention applies the strengths of the node-and-relationship structure of a knowledge graph to the retrieval of scientific and technological service resources. This retrieval method captures users' individual needs accurately, improves the match between supply and demand on the platform, and raises the success rate of resource transactions on the platform.
In this specification, the invention has been described with reference to specific embodiments. It is nevertheless evident that various modifications and changes can be made without departing from the spirit and scope of the invention. The specification and drawings are therefore to be regarded as illustrative rather than restrictive.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN202211030854.5A CN115309885A (en) | 2022-08-26 | 2022-08-26 | A knowledge graph construction, retrieval and visualization method and system for scientific and technological services | 
Publications (1)
| Publication Number | Publication Date | 
|---|---|
| CN115309885A (en) | 2022-11-08 |
Family
ID=83864324
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN202211030854.5A Pending CN115309885A (en) | 2022-08-26 | 2022-08-26 | A knowledge graph construction, retrieval and visualization method and system for scientific and technological services | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN115309885A (en) | 
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN116127086A (en) * | 2022-11-23 | 2023-05-16 | 广东省国土资源测绘院 | Geographical science data demand analysis method and device based on scientific and technological literature resources | 
| CN116127086B (en) * | 2022-11-23 | 2023-09-19 | 广东省国土资源测绘院 | Geographical science data demand analysis method and device based on scientific and technological literature resources | 
| CN116150377A (en) * | 2023-03-07 | 2023-05-23 | 北京神舟航天软件技术股份有限公司 | Digital service heterogeneous resource integration method based on atlas | 
| CN118377913A (en) * | 2023-03-07 | 2024-07-23 | 数字扁担(浙江)科技有限公司 | Full-scene intelligent digital resource integration method and system | 
| CN117668304A (en) * | 2023-10-11 | 2024-03-08 | 中国科学院空间应用工程与技术中心 | A data processing method, data processing system and computer-readable medium | 
| CN117786179A (en) * | 2023-11-07 | 2024-03-29 | 河南省科技创新促进中心 | Scientific research result retrieval method based on high-level talent key attribute | 
| CN117786179B (en) * | 2023-11-07 | 2024-08-20 | 河南省科技创新促进中心 | Scientific research result retrieval method based on high-level talent key attribute | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||