
CN113360599A - Multi-source heterogeneous information convergence cooperative processing platform based on content identification - Google Patents


Info

Publication number
CN113360599A
Authority
CN
China
Prior art keywords
data
service
layer
module
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110541644.1A
Other languages
Chinese (zh)
Inventor
付睿智
田苗
张建斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fu Zhizhi
Original Assignee
Suzhou Haisai Artificial Intelligence Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Haisai Artificial Intelligence Co ltd
Priority to CN202110541644.1A
Publication of CN113360599A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-source heterogeneous information convergence cooperative processing platform based on content identification, comprising a basic environment layer, a data resource layer, a business processing layer and an application service layer. The basic environment layer includes a hardware support environment and a software support environment, and the hardware support environment comprises a distributed storage environment and a distributed computing environment. The data resource layer includes map data, business data, full-text retrieval data, unstructured data, business processing intermediate data and business processing result data, and provides a unified data source and support for the business processing layer. The business processing layer comprises a multi-source data collection module, a preprocessing module and an automatic monitoring and warehousing module. The platform makes large volumes of data searchable, analyzable and explorable, supports multidimensional information queries by unit, type, time, hot spot, keyword and the like, and achieves near-real-time full-text search of documents, effectively improving the efficiency of full-text data retrieval.

Description

Multi-source heterogeneous information convergence cooperative processing platform based on content identification
Technical Field
The invention relates to the technical field of information sharing and cooperation systems, in particular to a multi-source heterogeneous information convergence cooperative processing platform based on content identification.
Background
Internationally, as artificial intelligence, big data, cloud computing and other advanced technologies are integrated into the military domains of many countries, national defense science and technology information services are accelerating the digitization of traditional documents, the integration of heterogeneous data and the association of domain knowledge, and the form of future warfare is gradually shifting from informatization to intelligentization. The convergence of intelligent technology and military intelligence has brought major changes to the strategy, organization, priorities and resource allocation of developed countries such as the United States. Information work in the national defense field no longer relies on manual collection, processing and analysis; automation and intelligentization have become inevitable trends in information development.
At present, military information systems have been under construction for many years. An infrastructure for transmitting and processing information data has been preliminarily established, and massive amounts of information data have accumulated. The data types include formatted, semi-formatted and unformatted data, carried as text, data packets, pictures, videos, high-resolution images and the like, yet there is no unified platform that effectively integrates and processes data of such differing types, formats and structures. Centralized storage, efficient querying and associated application of intelligence data are important problems that urgently need to be solved. The main problems are as follows. First, the hardware environment is weak and cannot meet growing data demands. Second, the data is highly dispersed, and no capability for associated application has been formed. Third, data standards are not unified, and preprocessing methods and techniques are lacking. Fourth, deep mining is insufficient, so the intelligence value of massive data is not realized. Fifth, shared service capability is weak, and the capability to provide diversified on-demand support is insufficient.
Against this background, existing business systems have accumulated massive amounts of formatted, semi-formatted and unformatted information data, carried as text, data packets, pictures, videos, high-resolution images and the like, and data of these differing types, formats and structures is not yet effectively integrated and processed on a unified platform. Traditional integration remains largely manual: most of it depends on manual identification and judgment by information processing personnel, including manual uploading and manual classification and warehousing, and the scale of data handled and the timeliness, efficiency and accuracy of data processing urgently need improvement. Moreover, judging by how the accumulated information data is currently used, the highly dispersed data offers no capability for associated application, deep mining of the data is insufficient, and the information value of the massive data is not realized. Information data in different formats is also difficult to classify and store automatically, and data scale, quality and application level all need improvement, so information staff acquire useful data inefficiently.
Disclosure of Invention
The invention overcomes the defects of the prior art and provides a multi-source heterogeneous information convergence cooperative processing platform based on content identification.
To achieve the above purpose, the invention adopts the following technical solution: a multi-source heterogeneous information convergence cooperative processing platform based on content identification, comprising a basic environment layer, a data resource layer, a business processing layer and an application service layer. The basic environment layer includes a hardware support environment and a software support environment, and the hardware support environment comprises a distributed storage environment and a distributed computing environment.
The data resource layer includes map data, business data, full-text retrieval data, unstructured data, business processing intermediate data and business processing result data. The data resource layer manages and stores the intermediate data and result data generated while the business processing layer processes information, and provides a unified data source and support for the business processing layer.
The business processing layer comprises a plurality of base modules, including a multi-source data collection module, a preprocessing module and an automatic monitoring and warehousing module. The application service layer provides full-text retrieval and classified display of intelligence data on the basis of business processing.
In a preferred embodiment of the present invention, the business processing layer further includes a backup module, a file moving module, a preprocessing module, a system core module and an extraction module.
In a preferred embodiment of the present invention, the software support environment comprises a MySQL database, the Elasticsearch search engine, a Java/Python development environment and the Docker application container engine.
In a preferred embodiment of the present invention, the platform comprises a server side and a Web client that are connected by signal,
the server side comprises an access server, which is connected to a file storage server and to a database server, both of which are connected to an application server,
the application server is connected to a map server and to a Web server, the application server is also connected to a full-text retrieval server, and the map server provides a map engine, map data and map network configuration;
the Web client is used for information display, browsing and auditing, and system management.
In a preferred embodiment of the invention, the access server provides a multi-source data access service, adapts to different data sources, and converts and extracts data information.
In a preferred embodiment of the present invention, the file storage server provides a distributed storage service for storing files and pictures.
In a preferred embodiment of the present invention, the database server manages core service data, and implements data backup and data recovery.
In a preferred embodiment of the present invention, the application server is configured to provide core service management and control services, configure service plug-ins and service modules, and provide interfaces.
A second technical solution of the invention is a method comprising the following steps:
Step S1: the automatic monitoring and warehousing module detects a new file arriving from a message data source, and the backup module provides a disaster-recovery backup of the data so that the original data and the finished data are not lost even in an extreme environment;
Step S2: the file moving module copies or moves the new file to the corresponding working directory, the preprocessing module preprocesses the new file according to its file format, and the processed data is passed to the extraction module;
Step S3: the system core module analyzes the data in the new file and stores the extracted specific information in the database so that it can be called and displayed conveniently; at the same time, the file is moved to a file storage directory for use by the front end and the back end.
In a preferred embodiment of the present invention, the preprocessing module in step S2 performs data modeling and knowledge generation, and constructs a knowledge base oriented to business, so as to form data processing rules.
The invention solves the defects in the background technology, and has the following beneficial effects:
(1) The invention makes large volumes of data searchable, analyzable and explorable. It supports multidimensional information queries by unit, type, time, hot spot, keyword and the like, supports both title and full-text retrieval, performs content-based intelligent analysis of the unstructured data that has been warehoused, and provides full-text retrieval over all data on the platform, so that items can be located quickly and accurately by title, body text, originating unit, receipt time and other conditions. Elasticsearch stores data as JSON documents. Its inverted index lists every unique word that appears in any document and identifies all documents containing each word, so documents can be searched in full text in near real time, which effectively improves the efficiency of full-text data retrieval.
(2) The invention preprocesses documents according to predefined document types and provides strong character recognition preprocessing for picture documents: characters in a picture document are recognized first, and the recognized document content is then processed automatically. This refines the system's overall document calibration flow and ensures the uniformity and integrity of documents at the document layer, so that system processes are not run repeatedly. The effect is similar to document formatting: at the data layer, without changing any important configuration, the data extraction module continues to work normally, extracting important information and warehousing it as usual.
(3) The invention supports automatic classification of heterogeneous unstructured information data such as text, pictures, video and voice. It combines deep-learning-based picture character recognition, image recognition, voice recognition, natural language processing, a semi-supervised multi-modal deep learning classification algorithm and the like to classify and grade information automatically, improving the efficiency of reading information.
(4) The invention uses the Elasticsearch search engine as middleware to create indexes for the data stored in the database, which ensures efficient full-text retrieval. Elasticsearch is a distributed, highly scalable, near-real-time search and data analysis engine. It makes large volumes of data searchable, analyzable and explorable, supports multidimensional information queries by unit, type, time, hot spot, keyword and the like, and supports both title and full-text retrieval. Elasticsearch stores data as JSON documents. Its inverted index lists every unique word that appears in any document and identifies all documents containing each word, so documents can be searched in full text in near real time, which effectively improves the efficiency of full-text data retrieval.
(5) The invention applies data mining and big data analysis to perform correlation analysis on information, automatically extracting, summarizing and analyzing useful information data, and presents information and situation data visually through a rich set of front-end charts, thereby achieving correlation analysis and visual analysis of information. Knowledge-graph-based information extraction, big data analysis, data mining and related technologies are used together for this correlation and visual analysis, extracting useful information such as entities, relations and events.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a general architecture diagram of a system in accordance with a preferred embodiment of the present invention;
FIG. 2 is a diagram of a system network topology structure in accordance with a preferred embodiment of the present invention;
FIG. 3 is a system component diagram of the preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of the document identification operation of the preferred embodiment of the present invention;
FIG. 5 is a schematic diagram of a full text search process according to a preferred embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
As shown in fig. 1, a multi-source heterogeneous information convergence cooperative processing platform based on content identification comprises a basic environment layer, a data resource layer, a business processing layer and an application service layer. The basic environment layer includes a hardware support environment and a software support environment, and the hardware support environment comprises a distributed storage environment and a distributed computing environment.
The data resource layer includes map data, business data, full-text retrieval data, unstructured data, business processing intermediate data and business processing result data. The data resource layer manages and stores the intermediate data and result data generated while the business processing layer processes information, and provides a unified data source and support for the business processing layer.
In a preferred embodiment of the present invention, the business processing layer includes a plurality of base modules, including a multi-source data collection module, a preprocessing module and an automatic monitoring and warehousing module. The multi-source data collection module accesses dynamic information data in real time and integrates the various information products analyzed and researched in recent years with the information reported by troops at all levels. The preprocessing module performs extraction, cleaning, comparison, analysis and auditing, and screens and intelligently analyzes the information data by content to produce standardized data.
In a preferred embodiment of the invention, documents are preprocessed according to predefined document types. This refines the system's overall document calibration flow and ensures the uniformity and integrity of documents at the document layer, so that system processes are not run repeatedly. The effect is similar to document formatting: at the data layer, without changing any important configuration, the data extraction module continues to work normally, extracting important information and warehousing it as usual. In addition, the system provides strong character recognition preprocessing for picture documents: characters in a picture document are recognized first, and the recognized document content is then processed automatically.
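The type-based preprocessing described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `NormalizedDocument` record, the extension sets and the injected `ocr` callable are all hypothetical names chosen for illustration. The point it shows is that every document type is reduced to one uniform record, so a downstream extraction step does not need to change when a new type is added.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical groupings of predefined document types (assumptions).
TEXT_TYPES = {"txt", "csv"}
PICTURE_TYPES = {"png", "jpg", "jpeg"}


@dataclass
class NormalizedDocument:
    """Uniform record handed to the extraction step, whatever the source type."""
    source_name: str
    doc_type: str
    text: str


def preprocess(name: str, raw: bytes, ocr: Callable[[bytes], str]) -> NormalizedDocument:
    """Normalize one document according to its predefined type.

    `ocr` stands in for a character recognition engine; it is injected so
    the sketch does not depend on any particular OCR library.
    """
    doc_type = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    if doc_type in TEXT_TYPES:
        text = raw.decode("utf-8", errors="replace")
    elif doc_type in PICTURE_TYPES:
        # Picture documents go through character recognition first.
        text = ocr(raw)
    else:
        raise ValueError(f"unsupported document type: {doc_type!r}")
    # Collapse runs of whitespace so all types share one text layout.
    return NormalizedDocument(name, doc_type, " ".join(text.split()))
```

Injecting the recognition engine as a parameter keeps the dispatch logic testable on its own; a real deployment would pass in whatever OCR component the platform actually uses.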
In a preferred embodiment of the present invention, the business processing layer further includes a backup module, a file moving module, a system core module and an extraction module.
In a preferred embodiment of the present invention, the application service layer provides full-text retrieval and classified display of intelligence data on the basis of business processing. The automatic monitoring and warehousing module monitors and warehouses data automatically: it watches a designated folder for unstructured data in real time and stores documents automatically in real time through steps such as "content analysis" and "automatic warehousing". It also provides a classification algorithm that classifies message data in detail based on content analysis and displays the data by class automatically. The platform supports automatic classification of heterogeneous unstructured information data such as text, pictures, video and voice, combining deep-learning-based picture character recognition, image recognition, voice recognition, natural language processing, a semi-supervised multi-modal deep learning classification algorithm and the like to classify and grade information automatically.
The software support environment includes a MySQL database, the Elasticsearch search engine, a Java/Python development environment and the Docker application container engine. The Elasticsearch search engine is used as middleware to create indexes for the data stored in the database, which ensures efficient full-text retrieval. Elasticsearch is a distributed, highly scalable, near-real-time search and data analysis engine.
In addition, the invention makes large volumes of data searchable, analyzable and explorable. It supports multidimensional information queries by unit, type, time, hot spot, keyword and the like, supports both title and full-text retrieval, performs content-based intelligent analysis of the stored unstructured data, and provides full-text retrieval over all data on the platform, so that items can be located quickly and accurately by title, body text, originating unit, receipt time and other conditions. Elasticsearch stores data as JSON documents. Its inverted index lists every unique word that appears in any document and identifies all documents containing each word, so documents can be searched in full text in near real time, which improves the efficiency of full-text data retrieval.
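As an illustration of a multidimensional query of this kind, the following Python sketch builds a request body in the Elasticsearch query DSL, combining a full-text match on title and body with exact filters on unit, type and receipt time. The `bool`, `multi_match`, `term` and `range` clauses are standard query DSL constructs; the field names (`title`, `body`, `sending_unit`, `doc_type`, `received_at`) are assumptions, since the patent does not disclose an index mapping.

```python
from typing import Any, Dict, List, Optional


def build_query(keyword: str,
                unit: Optional[str] = None,
                doc_type: Optional[str] = None,
                received_from: Optional[str] = None,
                received_to: Optional[str] = None) -> Dict[str, Any]:
    """Build a search request body; all field names are hypothetical."""
    filters: List[Dict[str, Any]] = []
    if unit is not None:
        filters.append({"term": {"sending_unit": unit}})
    if doc_type is not None:
        filters.append({"term": {"doc_type": doc_type}})
    date_range: Dict[str, str] = {}
    if received_from is not None:
        date_range["gte"] = received_from
    if received_to is not None:
        date_range["lte"] = received_to
    if date_range:
        filters.append({"range": {"received_at": date_range}})
    return {
        "query": {
            "bool": {
                # Scored full-text match over title and body text.
                "must": [{"multi_match": {"query": keyword,
                                          "fields": ["title", "body"]}}],
                # Exact-match and range conditions that do not affect scoring.
                "filter": filters,
            }
        }
    }
```

The function only constructs a dictionary, so it can be inspected or unit-tested without a running cluster; sending it to a server is left to whichever client the deployment uses.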
As shown in fig. 2, in a preferred embodiment of the present invention, the network topology includes a server side and a Web client that are connected by signal. The server side comprises an access server, which provides a multi-source data access service, adapts to different data sources, and converts and extracts data information. The access server is connected to a file storage server and to a database server. The file storage server provides a distributed storage service for storing files and pictures; the database server manages core service data and implements data backup and data recovery. The file storage server and the database server are both connected to an application server. The application server is connected to a map server and to a Web server, provides core business management and control services, configures service plug-ins and business modules, and provides interfaces. The application server is also connected to a full-text retrieval server, and the map server provides a map engine, map data and map network configuration. The Web client is used for information display, browsing and auditing, and system management.
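The connections listed for fig. 2 can be written down as a small undirected graph, which makes it easy to check which components a request has to pass through. The sketch below is only an illustration: the node names are invented identifiers, and the link between the Web client and the Web server is an assumption, since the text states only that the client is connected to the server side.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Links taken from the description of fig. 2; identifiers are hypothetical.
LINKS: List[Tuple[str, str]] = [
    ("access_server", "file_storage_server"),
    ("access_server", "database_server"),
    ("file_storage_server", "application_server"),
    ("database_server", "application_server"),
    ("application_server", "map_server"),
    ("application_server", "web_server"),
    ("application_server", "full_text_retrieval_server"),
    ("web_server", "web_client"),  # assumption: client reaches the server side here
]


def build_graph(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn a list of undirected links into an adjacency map."""
    graph: Dict[str, Set[str]] = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph: Dict[str, Set[str]], start: str, goal: str) -> List[str]:
    """Breadth-first search; returns [] when goal is unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return []
```

Under these assumptions, a search issued from the Web client reaches the full-text retrieval server through the Web server and the application server.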
In a preferred embodiment of the present invention, the OCR picture character recognition method includes the following steps:
Step S1: picture preprocessing, namely denoising the picture, and detecting and correcting pictures that need to be rotated.
Step S2: text localization, namely locating the regions of the picture that contain characters and finding the bounding box of each word or text line.
Step S3: character recognition, namely recognizing the located characters; combining step S1 with step S2 yields end-to-end detection of the characters.
The method uses contextual information and recognizes Chinese text with the aid of a vocabulary. It takes a Chinese character sequence with deterministic boundaries as the processing unit, uses statistically obtained word co-occurrence probabilities, applies dynamic programming, and invokes a number of predefined processing sets across the processing units to process the target information.
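To make the vocabulary-plus-dynamic-programming idea concrete, here is a minimal Python sketch of dictionary-based segmentation of one character sequence. It is a simplification and not the patented method: it scores candidate words by individual word probabilities rather than by the co-occurrence probabilities mentioned above, and the vocabulary, the `max_len` limit and the `unknown_prob` fallback are illustrative assumptions.

```python
import math
from typing import Dict, List


def segment(text: str, word_prob: Dict[str, float],
            max_len: int = 4, unknown_prob: float = 1e-8) -> List[str]:
    """Split text into the most probable word sequence.

    best[i] holds the highest log-probability of any segmentation of
    text[:i]; back[i] remembers where the last word of that
    segmentation starts.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_prob:
                p = word_prob[word]
            elif len(word) == 1:
                # Unknown single characters stay possible, so every
                # position is reachable.
                p = unknown_prob
            else:
                continue
            score = best[start] + math.log(p)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end to recover the words.
    words: List[str] = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

With a toy vocabulary that contains 情报, 数据 and 处理 at higher probability than their individual characters, the sequence 情报数据处理 is split into those three words. Replacing the per-word score with a score conditioned on the previous word would bring the sketch closer to a co-occurrence model.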
As shown in fig. 5, in a preferred embodiment of the present invention, the full-text search method comprises the following steps:
Step S1: the indexing process. Source data is collected from relational databases, the Internet and file systems and gathered in one place, and an index is created in an index database. Key information is extracted from the source data, and words are extracted from the key information, each word being associated with its source data. That is, when the index is created, each word is associated with the source data and the association is recorded in the index database, so that finding the word finds the source data;
Step S2: the search process. The user submits a search, a query is built from the search keywords, the index database is searched for the words matching the query keywords, and the search results are finally displayed.
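The two processes can be sketched with a toy in-memory inverted index in Python. This is an illustration of the data structure only, not of Elasticsearch internals or of the patented system: the class and method names are invented, and whitespace tokenization stands in for a real analyzer (Chinese text would need a word segmenter).

```python
from collections import defaultdict
from typing import Dict, List, Set


class InvertedIndex:
    """Maps each word to the ids of the source documents that contain it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self.sources: Dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        """Indexing process: record the word-to-source association."""
        self.sources[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        """Search process: return ids of documents containing every query word."""
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


index = InvertedIndex()
index.add(1, "patrol report north sector")
index.add(2, "weather report")
index.add(3, "north sector patrol")
matches = index.search("north patrol")  # documents 1 and 3 contain both words
```

Because the lookup goes from word to documents rather than scanning each document, query cost depends on the length of the posting lists involved and not on the total volume of stored text.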
The full-text retrieval method effectively reduces manual labor and improves processing efficiency. The invention also uses distributed real-time search, so that a search over TB-scale data can return results within milliseconds and the query scope is effectively narrowed; by using a Chinese word segmentation plug-in for Elasticsearch, accurate word segmentation is achieved and search efficiency is improved.
In a preferred embodiment of the invention, data mining and big data analysis are applied to perform correlation analysis on information: message information is extracted and analyzed statistically in charts, providing statistical chart displays for patrol analysis, cross-line reconnaissance analysis and aircraft patrol analysis. Useful information data is extracted, summarized and analyzed automatically, and information and situation data are presented visually through a rich set of front-end charts, achieving correlation analysis and visual analysis of information. Knowledge-graph-based information extraction, big data analysis, data mining and related technologies are used together for this correlation and visual analysis, extracting useful information such as entities, relations and events.
As shown in fig. 3, in a preferred embodiment of the present invention, a CPU + GPU computing architecture is adopted, with a character recognition engine, distributed file storage, a relational database and the like as basic components. Based on technologies such as a distributed aggregation search engine, intelligent analysis of massive data and machine learning, functions such as multi-source data acquisition, data conversion processing, data sorting, data reporting and data query are realized. Meanwhile, the core business scenarios and functions of the platform are implemented through a hierarchical structure of infrastructure management, data storage, application components and service interfaces, and the platform provides an application system environment that runs across operating systems and platforms.
In a preferred embodiment of the present invention, the comprehensive test results are compared with the third party commercial document management system as follows:
[Table 1 image not reproduced]
TABLE 1 comparison of the results of the comprehensive testing with the third-party commercial document management System
As shown in the table above, the invention comprehensively utilizes the technologies of picture character recognition technology based on deep learning, image recognition technology, voice recognition, natural language processing, multi-mode deep learning classification algorithm based on semi-supervision and the like to realize the automatic classification and grading of the heterogeneous unstructured information data such as texts, pictures, videos, voices and the like. The method can standardize the method in the data processing and obtaining process in the service system, and realize synchronous promotion of data scale, data processing efficiency, data quality and application level, thereby promoting the efficiency and capability of obtaining effective data by intelligence personnel and well meeting the current and future requirements of organizations.
In a preferred embodiment of the invention, the data mining and big data analysis technology is used for carrying out correlation analysis on the information, effective information data is automatically extracted, the effective information data is automatically collected and analyzed, information and situation information are visually displayed in a rich front-end visual chart mode, and accurate data service and intelligent decision support in a data explosion environment are provided for a user.
The invention is designed for the massive volumes of formatted, semi-formatted, and unformatted information data in existing business systems. It adopts technologies such as content analysis, OCR picture character recognition, data mining, big data analysis, and full-text retrieval to solve the main problems of highly dispersed storage of information data, low retrieval efficiency, loosely associated application, and low support effectiveness. The system and method automatically grade, classify, and warehouse formatted, semi-formatted, and unformatted information data and automatically create and store indexes. They support information query and retrieval in various modes, information summarization and correlation analysis, and visual display of statistical and analysis results, and effectively improve the level of information service support.
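Automatic index creation and full-text retrieval can be illustrated with a minimal in-memory inverted index. This is a sketch under stated assumptions, not the platform's implementation: the specification names Elasticsearch as the search engine, whereas the class `InvertedIndex`, its methods, and the word-level tokenizer below are hypothetical stand-ins.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy full-text index: maps each token to the set of documents containing it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        # Lowercase word tokens; a production engine would use a language analyzer.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id: str, text: str) -> None:
        """Index a document at warehousing time."""
        for token in self._tokenize(text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return the ids of documents that contain every query token."""
        tokens = self._tokenize(query)
        if not tokens:
            return set()
        result = None
        for token in tokens:
            docs = self._postings.get(token, set())
            result = set(docs) if result is None else result & docs
        return result
```

For example, after `index.add("a", "Port inspection report")` and `index.add("b", "Weather report")`, the query `"port report"` matches only document `"a"`, because both tokens must be present.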
The invention can be integrated with existing service systems and makes full use of existing information data resources. It standardizes the methods used for processing and obtaining data, mines and releases the potential value of data resources, and improves data scale, data processing efficiency, data quality, and application level together, thereby improving the efficiency and capability with which intelligence personnel obtain effective data. It has high practical value and application prospects in the army.
As shown in Fig. 4, when the present invention works, the automatic monitoring and warehousing module monitors for new files input by a message data source, and the backup module provides disaster recovery backup of the data to ensure that original and finished data are not lost in an extreme environment. The new file is copied or moved to the corresponding working directory by the file moving module, and the preprocessing module preprocesses it according to its file format; the preprocessing module also performs data modeling and knowledge generation, constructs a service-oriented knowledge base, forms data processing rules, and transmits the processed data to the extraction module. The system core module then analyzes the data in the new file and warehouses the extracted specific information so that it can be called and displayed conveniently; at the same time, the file is moved to a file storage directory for use by the front end and the back end.
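The workflow above can be sketched as a single polling pass over an input directory. This is an illustrative sketch only: the directory layout, the function names `preprocess`, `extract`, and `ingest`, the plain-text-only preprocessing, and the use of a Python list as the warehouse are all assumptions, and continuous monitoring is reduced to one scan.

```python
import shutil
from pathlib import Path


def preprocess(path: Path) -> str:
    """Preprocess by file format; only plain text is handled in this sketch."""
    if path.suffix.lower() == ".txt":
        return path.read_text(encoding="utf-8").strip()
    return ""  # other formats would need OCR, voice recognition, and so on


def extract(text: str) -> dict:
    """Stand-in for the extraction module: keep the first line as a title."""
    lines = text.splitlines()
    return {"title": lines[0] if lines else "", "length": len(text)}


def ingest(inbox: Path, backup: Path, work: Path, store: Path,
           warehouse: list) -> None:
    """Run one monitoring pass: back up, move, preprocess, extract, warehouse."""
    for directory in (backup, work, store):
        directory.mkdir(parents=True, exist_ok=True)
    for new_file in sorted(p for p in inbox.iterdir() if p.is_file()):
        # Backup module: keep a copy of the original before any processing.
        shutil.copy2(new_file, backup / new_file.name)
        # File moving module: move the new file into the working directory.
        working = Path(shutil.move(str(new_file), str(work / new_file.name)))
        # Preprocessing and extraction modules.
        record = extract(preprocess(working))
        # Move the file to the storage directory and warehouse the record.
        stored = Path(shutil.move(str(working), str(store / working.name)))
        record["path"] = str(stored)
        warehouse.append(record)
```

A real deployment would replace the list with the database server and would trigger `ingest` from a file-system watcher or scheduler rather than a manual call.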
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution; the description is written this way for clarity only. Those skilled in the art should treat the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.

Claims (10)

1. A multi-source heterogeneous information convergence cooperative processing platform based on content identification, comprising: a basic environment layer, a data resource layer, a service processing layer, and an application service layer, characterized in that,
the basic environment layer includes: a hardware support environment and a software support environment, the hardware support environment comprising: a distributed storage environment and a distributed computing environment;
the data resource layer includes: map data, service data, full-text retrieval data, unstructured data, service processing intermediate data, and service processing result data, all of which are stored in the data resource layer; the data resource layer is used for managing and storing the intermediate data and result data generated during information processing by the service processing layer, and provides a uniform data source and support for the service processing layer;
the service processing layer comprises a plurality of base modules, the plurality of base modules comprising: a multi-source data collection module, a preprocessing module, and an automatic monitoring and warehousing module;
the application service layer is used for providing full-text retrieval and classified display of intelligence data on the basis of service processing.
2. The multi-source heterogeneous information convergence cooperative processing platform based on content identification as claimed in claim 1, wherein: the service processing layer further comprises: a backup module, a file moving module, a preprocessing module, a system core module, and an extraction module.
3. The multi-source heterogeneous information convergence cooperative processing platform based on content identification as claimed in claim 1, wherein: the software support environment includes: a MySQL database, an Elasticsearch search engine, a Java/Python development environment, and a Docker application container engine.
4. The network topology of the multisource heterogeneous intelligence convergence cooperative processing platform based on content identification as claimed in claim 1, comprising: a server side and a Web client side which are connected through signals,
the server side comprises: an access server, which is respectively connected with a file storage server and a database server, both of which are connected with an application server,
the application server is respectively connected with the map server and the Web server, the application server is also connected with the full text retrieval server, and the map server provides a map engine, map data and map network configuration;
the Web client is used for information display, browsing and auditing, and system management.
5. The multi-source heterogeneous intelligence convergence collaborative processing platform based on content identification as claimed in claim 4, wherein: the access server provides multi-source data access service, adapts to different data sources, and converts and extracts data information.
6. The multi-source heterogeneous intelligence convergence collaborative processing platform based on content identification as claimed in claim 4, wherein: the file storage server provides distributed storage service for storing files and pictures.
7. The multi-source heterogeneous intelligence convergence collaborative processing platform based on content identification as claimed in claim 4, wherein: the database server manages the core service data, and realizes data backup and data recovery.
8. The multi-source heterogeneous intelligence convergence collaborative processing platform based on content identification as claimed in claim 4, wherein: the application server is used for providing core service management and control service, configuring service plug-in and service module and providing interface.
9. The working method of the multi-source heterogeneous intelligence convergence cooperative processing platform based on the content identification as claimed in claim 1, comprising the following steps:
step S1: the automatic monitoring and warehousing module monitors a new file input by a message data source, and provides disaster-tolerant backup of data by using the backup module so as to ensure that original and finished data cannot be lost under an extreme environment;
step S2: copying or moving the new file to a corresponding working directory through a file moving module, preprocessing the new file by a preprocessing module according to a file format, and transmitting the processed data to an extracting module;
step S3: analyzing the data in the new file through a system core module, and performing warehousing operation on the extracted specific information so as to be called and displayed conveniently; and simultaneously, moving the file to a file storage directory for calling by the front end and the back end.
10. The multi-source heterogeneous intelligence convergence collaborative processing platform based on content identification according to claim 9, wherein: in step S2, the preprocessing module performs data modeling and knowledge generation, constructs a service-oriented knowledge base, and forms data processing rules.
CN202110541644.1A 2021-05-18 2021-05-18 Multi-source heterogeneous information convergence cooperative processing platform based on content identification Pending CN113360599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110541644.1A CN113360599A (en) 2021-05-18 2021-05-18 Multi-source heterogeneous information convergence cooperative processing platform based on content identification

Publications (1)

Publication Number Publication Date
CN113360599A true CN113360599A (en) 2021-09-07

Family

ID=77526879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110541644.1A Pending CN113360599A (en) 2021-05-18 2021-05-18 Multi-source heterogeneous information convergence cooperative processing platform based on content identification

Country Status (1)

Country Link
CN (1) CN113360599A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104506632A (en) * 2014-12-25 2015-04-08 中国科学院电子学研究所 Resource sharing system and method based on distributed multi-center
CN112115198A (en) * 2020-09-14 2020-12-22 宁波市测绘和遥感技术研究院 Urban remote sensing intelligent service platform
CN112687097A (en) * 2020-11-16 2021-04-20 招商新智科技有限公司 Highway highway section level data center platform system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114238731A (en) * 2021-11-23 2022-03-25 浪潮软件集团有限公司 Domestic CPU retrieval method, system, device and computer readable medium
CN114398442A (en) * 2022-01-25 2022-04-26 中国电子科技集团公司第十研究所 Data-driven information processing system
CN114398442B (en) * 2022-01-25 2023-09-19 中国电子科技集团公司第十研究所 Information processing system based on data driving
CN114418038A (en) * 2022-03-29 2022-04-29 北京道达天际科技有限公司 Space-based information classification method and device based on multi-mode fusion and electronic equipment
CN114860875A (en) * 2022-04-26 2022-08-05 深圳市生态环境智能管控中心 Data integration system and method for fixed pollution source
CN114860875B (en) * 2022-04-26 2023-06-20 深圳市生态环境智能管控中心 Data integration system and method for fixed pollution source
CN115174427A (en) * 2022-06-01 2022-10-11 中国电子科技集团公司第十研究所 Message monitoring system and method for aerospace ground equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211029

Address after: 830000 Qingnian Road, Tianshan District, Urumqi City, Xinjiang Uygur Autonomous Region

Applicant after: Fu Zhizhi

Address before: Room 501, North building, Huihu building, No. 10, Yueliangwan Road, Suzhou Industrial Park, Suzhou, Jiangsu 215000

Applicant before: Suzhou Haisai Artificial Intelligence Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20210907