
CN108549659B - A data warehouse management system and management method - Google Patents


Info

Publication number
CN108549659B
Authority
CN
China
Prior art keywords
data
query
file
warehousing
storage
Legal status
Expired - Fee Related
Application number
CN201810201836.6A
Other languages
Chinese (zh)
Other versions
CN108549659A
Inventor
郁建林
Current Assignee
Zhongcheng Taixin Suzhou Technology Development Co ltd
Original Assignee
Zhongcheng Taixin Suzhou Technology Development Co ltd
Application filed by Zhongcheng Taixin Suzhou Technology Development Co ltd
Priority to CN201810201836.6A
Publication of CN108549659A
Application granted
Publication of CN108549659B
Expired - Fee Related (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Automation & Control Theory (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to the technical field of data warehouses, and in particular to a data warehouse management system comprising a data warehousing module, a data storage module, a data browsing module, a data query and positioning module, and a data downloading module. The data warehousing module comprises an automatic scanning warehousing unit, which writes data into the data warehouse by automatic scanning. The automatic scanning warehousing modes include: scanning and warehousing according to the original tree folder structure, undifferentiated data scanning and warehousing, file-screening warehousing, and warehousing of data at specific locations. By combining software and hardware for file management, the present invention replaces the traditional manual approach to data management and effectively improves the management efficiency of the data warehouse.

Description

Data warehouse management system and management method
Technical Field
The invention relates to the technical field of data warehouses, in particular to a data warehouse management system and a data warehouse management method.
Background
A data warehouse (which may be abbreviated as DW or DWH) is a structured data environment. A data warehouse provides data support for applications such as data analysis, data reporting, and data mining.
Data warehouse management is a core part of data warehouse operation. Existing methods for uploading and maintaining data in a data warehouse generally work as follows: the data warehouse administrator uploads data manually, at regular or irregular intervals, and analyzes the metadata of the data warehouse; a list of invalid data tables is then compiled from the analysis result and handed to the responsible technical staff, who confirm the invalidity of each table in the list and process each confirmed table accordingly, for example by deleting it. In other words, existing methods for uploading and maintaining data in a data warehouse still rely on manual management, so the management workload is heavy and the management efficiency is low. As the volume and variety of big data grow across industries, conventional data management systems can no longer meet the requirements for storing and managing such data.
Therefore, it is desirable to provide a new data warehouse management system and method.
Disclosure of Invention
In view of the above problems in the prior art, an object of the present invention is to provide a data warehouse management system and a management method that implement automatic warehousing and management of data.
In a first aspect, the present invention provides a data warehouse management system, which comprises a data warehousing module, a data storage module, a data browsing module, a data query positioning module, and a data downloading module, wherein,
the data warehousing module is used for writing data into the data warehouse; the data include raster image data, vector data, and result document data. The data warehousing module performs warehousing and cataloguing of raster image data (remote sensing satellite imagery, intermediate processing data, general-format data such as GeoTIFF, etc.), vector data, result documents, and other data, which makes the data easy to browse, query, and use.
Warehousing cataloguing means that the user catalogues the data to be warehoused in a chosen way according to business needs; cataloguing is a process of building a tree. Data may be catalogued by year, region, category, and so on.
The data warehousing module comprises an automatic scanning warehousing unit, which writes data into the data warehouse by automatic scanning; the automatic scanning warehousing modes include: scanning and warehousing according to the original tree folder structure, undifferentiated data scanning and warehousing, file-screening warehousing, and warehousing of data at specific locations;
the data storage module is used for storing the warehoused data;
the data browsing module is used for browsing the data in the data warehouse;
the data query positioning module is used for performing query operation on data in the data warehouse and positioning the storage position of the data;
and the data downloading module is used for selectively downloading, as required, data from the list of query results produced by the data query positioning module.
Preferably, the system further comprises a data storage medium module and a communication controller module, wherein the communication controller module is used for scheduling and controlling the data storage medium module.
Preferably, the data storage medium module is a hard disk storage cube, the hard disk storage cube is formed by stacking a plurality of hard disk cabinets, and the communication controller module is used for scheduling and controlling each hard disk cabinet.
Preferably, the data warehousing module further comprises a manual warehousing unit, a data processing unit, and a data storage medium monitoring unit;
the manual warehousing unit is used for writing data into the data warehouse by manual input;
the data processing unit is used for performing warehousing processing on the data being warehoused;
the data storage medium monitoring unit is used for monitoring the state of the data storage medium module in real time.
Preferably, the warehousing processing includes: labeling, rule-set screening, context-aware file association, and file information retrieval.
Preferably, scanning and warehousing according to the original tree folder structure means that files are scanned and warehoused following the tree structure of their folders; undifferentiated scanning and warehousing means that the folder structure is not preserved and all files are placed in a single list to be scanned and warehoused; file-screening warehousing means warehousing only specific files; and specific-location warehousing means warehousing only data located in specific storage locations.
Preferably, the data storage mode of the data storage module includes hard disk cabinet data storage and offline hard disk storage, the data browsing mode includes enlargement, reduction, full map display, full map enlargement, full map reduction, roaming, pointer, and map refreshing, and the data query operation of the data query positioning module includes yard/container query, tag query, query in container, general file query, offline/online data query, and same file query.
Preferably, the downloading modes of the data downloading module include: single-file download, multi-selection file download, packaged data file download, queued multi-file download, delayed download of offline data, and resumable (breakpoint-continuation) download.
Preferably, the yard/container query queries data belonging to a certain rule set; the tag query queries data carrying a specific tag; the in-container query uses the query modes defined specifically for each rule set; the general file query queries by file attributes; the offline/online data query treats offline and online data alike; and the same-file query finds duplicate files by comparing their MD5 hashes so that the duplicates can be removed.
Preferably, the system further comprises a user permission management module for assigning and managing user permissions.
In a second aspect, the present invention provides a data warehouse management method, including the following steps:
S1. Write data into the data warehouse by automatically scanning data storage media. The data warehouse supports warehousing of general files, including but not limited to warehousing and cataloguing of raster image data (remote sensing satellite imagery, intermediate processing data, general-format data such as GeoTIFF, etc.), vector data, result documents, and other data, which makes the data easy to browse, query, and use.
Warehousing cataloguing means that the user catalogues the data to be warehoused in a chosen way according to business needs; cataloguing is a process of building a tree. Data may be catalogued by year, region, category, and so on.
The automatic scanning warehousing modes include: scanning and warehousing according to the original tree folder structure, undifferentiated data scanning and warehousing, file-screening warehousing, and warehousing of data at specific locations;
S2. Store the warehoused data;
S3. Browse the data in the data warehouse;
S4. Query data in the data warehouse and locate the storage position of the data;
S5. Based on the operation result of the data query positioning module, selectively download data from the list of query results as required.
Preferably, step S1 further includes: performing warehousing processing on the data being warehoused, and monitoring the state of the data storage media in real time.
Preferably, the warehousing processing includes: labeling, rule-set screening, context-aware file association, and file information retrieval.
Preferably, scanning and warehousing according to the original tree folder structure means that files are scanned and warehoused following the tree structure of their folders; undifferentiated scanning and warehousing means that the folder structure is not preserved and all files are placed in a single list to be scanned and warehoused; file-screening warehousing means warehousing only specific files; and specific-location warehousing means warehousing only data located in specific storage locations.
Preferably, the data storage manner in step S2 includes hard disk cabinet data storage and offline hard disk storage, the data browsing manner in step S3 includes zooming in, zooming out, full-map display, full-map zooming in, full-map zooming out, roaming, pointer, and map refreshing, and the data query operation in step S4 includes yard/container query, tag query, query in container, general file query, offline/online data query, and same file query.
Preferably, the data downloading modes in step S5 include: single-file download, multi-selection file download, packaged data file download, queued multi-file download, delayed download of offline data, and resumable (breakpoint-continuation) download.
Preferably, the yard/container query queries data belonging to a certain rule set; the tag query queries data carrying a specific tag; the in-container query uses the query modes defined specifically for each rule set; the general file query queries by file attributes; the offline/online data query treats offline and online data alike; and the same-file query finds duplicate files by comparing their MD5 hashes so that the duplicates can be removed.
Preferably, step S1 is preceded by:
S0. Assign and manage user permissions.
The data warehouse management system disclosed by the invention has a wide range of applications: depending on scale, it is suitable both for data management in large data centers and for personal information management.
Large data center management mainly refers to managing national big data, and personal information management mainly refers to managing the files on a personal computer.
The data warehouse management system of the invention has the following characteristics:
(1) software and hardware are combined for file management, so that even historical data can easily be warehoused;
(2) the data warehouse imposes no upper limit on data storage capacity;
(3) the data warehouse supports unified management of online and offline data;
(4) data warehouse users dynamically build data catalogues according to their actual business needs, forming a data tree;
(5) multiple data browsing modes are supported, including zooming in, zooming out, full-image display, full-image zooming in, full-image zooming out, roaming, pointer, and map refreshing;
(6) for users with different usage permissions, an administrator can assign permissions such as query-area permissions and file download permissions according to business characteristics;
(7) creating a data warehouse and building its tree structure are simple and convenient, and the system is efficient to use;
(8) the data warehouse supports warehousing of general files and multiple satellite data sources, including general-format data in formats such as GeoTIFF, HDF, and H5.
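Item (6) leaves the permission model open. As a minimal sketch, assuming permissions are stored per user as a set of queryable region names plus a single download flag (the classes and methods below are hypothetical, not part of the patent), an administrator-side permission check could look like this:

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class UserPermissions:
    # Region names the user may query: an assumed representation of the
    # "query-area permission" mentioned in the text.
    query_regions: Set[str] = field(default_factory=set)
    # Whether the user may download files ("file download permission").
    can_download: bool = False


class PermissionManager:
    """Hypothetical administrator-side store of per-user permissions."""

    def __init__(self) -> None:
        self._perms: Dict[str, UserPermissions] = {}

    def grant(self, user: str, regions: Iterable[str] = (), can_download: bool = False) -> None:
        # Replace whatever the user had before with the newly assigned permissions.
        self._perms[user] = UserPermissions(set(regions), can_download)

    def revoke(self, user: str) -> None:
        self._perms.pop(user, None)

    def may_query(self, user: str, region: str) -> bool:
        perms = self._perms.get(user)
        return perms is not None and region in perms.query_regions

    def may_download(self, user: str) -> bool:
        perms = self._perms.get(user)
        return perms is not None and perms.can_download
```

Unknown users get no permissions at all, so the check fails closed; a real deployment would likely model regions spatially rather than by name.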
The invention has the following beneficial effects:
the invention implements the business functions of warehousing, browsing, querying, positioning, and downloading general files, while a user administrator can assign usage permissions to other users according to business requirements. By combining software and hardware for file management, the invention replaces the traditional manual approach to data management and effectively improves the management efficiency of the data warehouse.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic block diagram of a data warehouse management system according to the present invention;
FIG. 2 is a flow chart of a data warehouse management method of the present invention;
FIG. 3 is a data warehousing flow diagram of the data warehouse management system of the present invention;
FIG. 4 is a flow chart of warehousing a file using the data warehouse management system of the present invention;
FIG. 5 is a flow chart of opening or downloading a file using the data warehouse management system of the present invention;
FIG. 6 is a schematic structural diagram of a single hard disk enclosure module;
FIG. 7 is a block diagram of a hard disk memory cube and a communication controller.
In the figures, the reference numerals correspond to: 1-hard disk dock, 2-hard disk controller, 3-hard disk cabinet, 4-communication controller module and 5-hard disk storage cube.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
It is noted that the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The data warehouse of the present invention has a plurality of hard disk cabinets, each having a plurality of hard disk bays, which are referred to herein as data docks (by analogy with the docks of a wharf).
The terms referred to in the present invention are explained below:
Data container (container for short): incoming data are first classified, and each container carries a label, for example "Taihu Lake blue-green algae", "Jiuli riparian zone", or "Yangcheng Lake waterweeds"; in monitoring fields such as bridges and reservoirs, a KEY (KEYs must be managed and vary in length) is used as the container label. Data with some inherent logical relationship form a data container; a folder, for example, can be regarded as a data container.
The data container has the following characteristics:
1) The data container must have a name.
2) Data containers cannot span physical hard disks.
3) The data container cannot cross logical hard disks.
4) Packages, folders, and files may appear below the data container.
5) The same files, packages, and folders may be located in different data containers.
6) One data container may be located in multiple yards.
7) The data container cannot contain a data container.
Data yard (yard for short): multiple containers are grouped together so that they can be managed collectively; yards can be divided by theme, such as land use, ecological resources, and marine resources. A data yard represents concepts such as a "topic" or a "domain". The data yard has the following characteristics:
1) The data yard must have a name.
2) A data yard may span logical hard disks but not physical hard disks.
3) Only data containers can appear below a data yard.
4) Multiple data yards may contain the same container.
5) Data yards, data containers, and packages are maintained in a database.
6) Data yards, data containers, and packages may each have multiple associated key-value pairs.
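The container and yard rules above amount to a handful of invariants. The following Python sketch checks some of them: a name is required, a container holds no containers, a yard holds only containers and stays on one physical disk. The classes and field names are hypothetical, and disks are modelled simply as string identifiers.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataContainer:
    """Hypothetical container record: a named set of logically related data."""
    name: str
    physical_disk: str  # a container lives on exactly one physical disk...
    logical_disk: str   # ...and on exactly one logical disk (volume)
    items: List[Any] = field(default_factory=list)      # packages, folders, files
    kv: Dict[str, str] = field(default_factory=dict)    # associated key-value pairs

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("a data container must have a name")

    def add(self, item: Any) -> None:
        # A container may hold packages, folders, and files, but not containers.
        if isinstance(item, DataContainer):
            raise TypeError("a data container cannot contain a data container")
        self.items.append(item)


@dataclass
class DataYard:
    """Hypothetical yard record: a topic/domain grouping of containers."""
    name: str
    containers: List[DataContainer] = field(default_factory=list)
    kv: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("a data yard must have a name")

    def add(self, container: DataContainer) -> None:
        # Only data containers may appear directly below a yard.
        if not isinstance(container, DataContainer):
            raise TypeError("only data containers can be placed in a yard")
        # A yard may span logical disks, but all its containers share one physical disk.
        disks = {c.physical_disk for c in self.containers}
        if disks and container.physical_disk not in disks:
            raise ValueError("a data yard cannot span physical hard disks")
        self.containers.append(container)
```

Because a yard only stores references, the same container object can be added to several yards, which matches the rule that one container may be located in multiple yards.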
Data package (package for short): a package is a physical concept representing a combined file, such as a compressed archive that has been split into parts, or a shapefile (.shp) together with its companion files. A package has the following characteristics:
1) The package must have a name.
2) A data package may contain folders and files.
3) A file or folder can be located in at most one package.
4) A package may be located in different containers.
5) A data package cannot be added directly to a yard.
6) A data package does not correspond to an actual file on the hard disk; its information is maintained by the data warehouse.
7) Data packages can only be contained in data containers.
8) A package may not contain another package.
9) Because a data container cannot span logical hard disks, a package cannot span logical hard disks either.
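Since a package exists only as a record maintained by the warehouse, its rules can be enforced by a small registry. The sketch below illustrates two of them, that packages cannot nest and that a file or folder belongs to at most one package, under the assumption that members are plain path strings; `DataPackage` and `PackageRegistry` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DataPackage:
    """Hypothetical package record. It has no file of its own on disk; the
    warehouse keeps its name and member list."""
    name: str
    members: List[str] = field(default_factory=list)  # file or folder paths

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("a data package must have a name")


class PackageRegistry:
    """Hypothetical warehouse-side registry enforcing the package rules."""

    def __init__(self) -> None:
        self._owner: Dict[str, DataPackage] = {}  # member path -> owning package

    def add_member(self, package: DataPackage, member: Any) -> None:
        # A package may contain files and folders, but never another package.
        if isinstance(member, DataPackage):
            raise TypeError("a package cannot contain a package")
        # A file or folder can belong to at most one package.
        owner = self._owner.get(member)
        if owner is not None and owner is not package:
            raise ValueError(f"{member!r} already belongs to package {owner.name!r}")
        if member not in package.members:
            package.members.append(member)
        self._owner[member] = package

    def owner_of(self, member: str) -> Optional[DataPackage]:
        return self._owner.get(member)
```

Adding the same member to the same package twice is a no-op, so re-scanning a folder does not duplicate package contents.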
Data dock: multiple yards are placed together so that they can be managed centrally.
Label (TAB): labels are attached to data so that queries can be performed more quickly;
Keyword (KEY): for example, location, field, year, date, satellite, or sensor.
Rule set: for all files selected by the user, a rule set determines the following:
1) Which of the selected files are warehoused.
2) The rule set carries some preset data yards.
3) The rule set adds special key-value pairs to the files that belong to it.
4) The rule set extracts special information from the files.
The rule set has the following characteristics: rule sets can be selected and deselected, multiple rule sets can be combined, and a rule set supports reverse selection, i.e., warehousing the data that do not belong to the rule set.
Rule sets include a general file system rule set, a spatial data rule set, an office document rule set, and so on.
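One way to read the rule set definition is as a filter predicate plus extra metadata. The sketch below assumes that combining rule sets means taking their union and that a reverse-selected rule set carries no yards or key-value pairs; `RuleSet`, `by_extension`, and `apply_rule_sets` are hypothetical names, and the extension lists are taken from the remote sensing and office examples in Example 1. Selecting and deselecting rule sets corresponds to choosing which ones are passed in the `active` list.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Callable, Dict, Iterable, List


def by_extension(*exts: str) -> Callable[[str], bool]:
    """Predicate accepting paths whose extension (case-insensitive) is in exts."""
    wanted = {e.lower().lstrip(".") for e in exts}
    return lambda path: PurePath(path).suffix.lower().lstrip(".") in wanted


@dataclass
class RuleSet:
    """Hypothetical rule set: which files to warehouse, preset yards, extra key-value pairs."""
    name: str
    accepts: Callable[[str], bool]
    preset_yards: List[str] = field(default_factory=list)
    extra_kv: Dict[str, str] = field(default_factory=dict)

    def inverted(self) -> "RuleSet":
        # Reverse selection: accept exactly the files this rule set rejects.
        # Assumption: the inverted set carries no yards or key-value pairs.
        return RuleSet(f"not {self.name}", lambda p: not self.accepts(p))


def apply_rule_sets(paths: Iterable[str], active: List[RuleSet]) -> Dict[str, dict]:
    """Combine the active rule sets as a union (an assumption): a file is warehoused
    if any active rule set accepts it, and it collects the preset yards and
    key-value pairs of every rule set that accepts it."""
    selected: Dict[str, dict] = {}
    for path in paths:
        matches = [r for r in active if r.accepts(path)]
        if not matches:
            continue
        yards: List[str] = []
        kv: Dict[str, str] = {}
        for r in matches:
            for y in r.preset_yards:
                if y not in yards:
                    yards.append(y)
            kv.update(r.extra_kv)
        selected[path] = {"yards": yards, "kv": kv}
    return selected


# Example rule sets mirroring the remote sensing and office rule sets in the text.
remote_sensing = RuleSet("remote sensing", by_extension("hdf", "tif", "tiff", "shp"),
                         preset_yards=["Remote sensing"], extra_kv={"category": "remote sensing"})
office = RuleSet("office", by_extension("docx", "xlsx", "doc", "vsdx"),
                 preset_yards=["Office"], extra_kv={"category": "office"})
```

For example, `apply_rule_sets(paths, [remote_sensing.inverted()])` selects everything the remote sensing rule set would skip.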
Example 1
As shown in fig. 1, the present invention discloses a data warehouse management system, which comprises a data warehousing module, a data storage module, a data browsing module, a data query positioning module and a data downloading module, wherein,
the data warehousing module comprises a manual warehousing unit, an automatic scanning warehousing unit, a data processing unit, and a disk monitoring unit, and writes data into the data warehouse by manual input or automatic scanning. The data include, but are not limited to, raster image data, vector data, and result document data. The data warehousing module performs warehousing and cataloguing of raster image data (remote sensing satellite imagery, intermediate processing data, general-format data such as GeoTIFF, etc.), vector data, result documents, and other data, which makes the data easy to browse, query, and use.
Warehousing cataloguing means that the user catalogues the data to be warehoused in a chosen way according to business needs; cataloguing is a process of building a tree. Data may be catalogued by year, region, category, and so on.
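Cataloguing as tree building can be illustrated with a short sketch. It assumes each file has already been described by a dictionary of attributes (here `path`, `year`, `region`, and `category`, all hypothetical keys) and that the user chooses the order of the cataloguing levels; `build_catalogue` is a hypothetical name.

```python
from typing import Any, Dict, List, Sequence


def build_catalogue(records: Sequence[Dict[str, Any]], levels: Sequence[str]) -> dict:
    """Build a catalogue tree whose levels are chosen by the user.

    Each inner level maps a value (e.g. a year) to a sub-tree; the last level
    maps a value to the list of file paths catalogued under it.
    """
    if not levels:
        raise ValueError("at least one cataloguing level is required")
    root: dict = {}
    for rec in records:
        node = root
        for level in levels[:-1]:
            node = node.setdefault(str(rec[level]), {})
        leaf: List[str] = node.setdefault(str(rec[levels[-1]]), [])
        leaf.append(rec["path"])
    return root
```

Changing the level order, for example to `["category", "year"]`, yields a different tree over the same files, which is what lets users catalogue data according to their business needs.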
the manual warehousing unit writes data into the data warehouse by manual input;
the automatic scanning warehousing unit writes data into the data warehouse by automatically scanning storage media such as hard disks, flash memory, USB flash drives, CF cards, and SD cards; the automatic scanning modes are:
(1) Scanning and warehousing according to the original tree folder structure: files are scanned and warehoused following the tree structure of their folders;
(2) Undifferentiated scanning and warehousing: the folder structure is not preserved, and all files are placed in a single list;
(3) File-screening warehousing: conditions can be set on which files are warehoused, for example only files in docx and xlsx format, only files whose names contain certain characters, only files modified within a certain date range, or only files whose size falls within a certain range;
(4) Specific-location warehousing: for example, only data on drive C, or only data in certain folders, are warehoused;
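The four automatic scanning modes can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: `FileFilter` and the `scan_*` functions are hypothetical names, modification times are compared as POSIX timestamps, and symbolic links to folders are not followed.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set


def scan_tree(root: str) -> Dict[str, Optional[dict]]:
    """Mode (1): keep the original folder tree. Folders map to sub-trees, files to None."""
    tree: Dict[str, Optional[dict]] = {}
    with os.scandir(root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                tree[entry.name] = scan_tree(entry.path)
            else:
                tree[entry.name] = None
    return tree


def scan_flat(root: str) -> List[str]:
    """Mode (2): undifferentiated scan. Folder structure is dropped; all files go in one list."""
    return sorted(os.path.join(d, f) for d, _dirs, files in os.walk(root) for f in files)


@dataclass
class FileFilter:
    """Mode (3): hypothetical screening conditions. None means 'no constraint'."""
    extensions: Optional[Set[str]] = None   # e.g. {"docx", "xlsx"}
    name_contains: Optional[str] = None     # substring required in the file name
    modified_after: Optional[float] = None  # POSIX timestamps, inclusive bounds
    modified_before: Optional[float] = None
    min_size: Optional[int] = None          # bytes, inclusive bounds
    max_size: Optional[int] = None

    def accepts(self, path: str) -> bool:
        name = os.path.basename(path)
        ext = os.path.splitext(name)[1].lower().lstrip(".")
        st = os.stat(path)
        checks = [
            self.extensions is None or ext in self.extensions,
            self.name_contains is None or self.name_contains in name,
            self.modified_after is None or st.st_mtime >= self.modified_after,
            self.modified_before is None or st.st_mtime <= self.modified_before,
            self.min_size is None or st.st_size >= self.min_size,
            self.max_size is None or st.st_size <= self.max_size,
        ]
        return all(checks)


def scan_filtered(locations: Sequence[str], flt: FileFilter) -> List[str]:
    """Modes (3) and (4): scan only the given locations and keep files that pass the filter."""
    return [p for loc in locations for p in scan_flat(loc) if flt.accepts(p)]
```

Passing a single folder (or drive root) as `locations` gives specific-location warehousing; combining it with a `FileFilter` gives screened warehousing of that location.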
the data processing unit performs warehousing processing on the data being warehoused; the warehousing processing includes:
(1) Labeling: for example, when the data in a certain folder are selected for warehousing, they can be marked with a "Performance Assessment" label and a "Wuxi City" label; labels are used in many places, such as data search, and data can be searched and queried by label;
(2) Rule-set screening: for example, a remote sensing data rule set warehouses only the scanned remote sensing data, such as HDF, GeoTIFF, and .shp files, and GeoTIFF and HDF files can be automatically tagged with a "raster data" label. Compressed packages with specific names are also warehoused: if a compressed package named "GF1_WFV1_E119.6_N31.3_20150520_L1A0000817197" is found, labels such as "GF1_WFV1" and "original raster data" are applied automatically. Likewise, the office rule set scans and warehouses only files in formats such as docx, xlsx, doc, and vsdx. Multiple rule sets can be combined;
(3) Context-aware file association: Table 1 below lists the data sources of the data warehouse. For example, if a TIFF file of Gaofen-1 (GF1) satellite data is retrieved, the system automatically checks whether a matching XML metadata file exists; if a ".shp" file is retrieved, the associated .dbf, .prj, .sbn, .sbx, and .shx files can be found automatically. These related files are then grouped into one data package. Different rule sets have different context-aware rules;
TABLE 1
Satellite abbreviation | Description | Resolution
GF1 | Gaofen-1 satellite | 2 m/8 m/16 m
GF2 | Gaofen-2 satellite | 0.8 m/3.2 m
ZY3 | Ziyuan-3 (Resource-3) satellite | 2.1 m/5.8 m
ZY02C | Ziyuan (Resource) 02C satellite | 2.36 m/5 m/10 m
HJ1 | Huanjing-1 (Environment-1) satellite | 30 m/100 m/150 m/300 m
TH | Tianhui-1 satellite | 5-10 m
RE | RapidEye satellite | 5 m
SPOT6 | SPOT6 satellite | 1.5 m/6 m
MODIS | MODIS morning (Terra) and afternoon (Aqua) satellites | 250 m/500 m/1000 m
NPP | NPP satellite | 375 m/375 m
FY-3 | Fengyun-3 satellite | 250 m/1000 m
LANDSAT | Landsat satellite data | 15 m/30 m/100 m
BJ2 | Beijing-2 satellite | 0.8 m/3.2 m
(4) File information retrieval: different rule sets extract different information from different files. For example, the image data rule set can extract the width, height, resolution, and similar properties of an image file; the remote sensing data rule set can extract latitude/longitude, number of bands, projection, resolution, and other information from remote sensing data; and the text of some text files can be read directly and stored in the database for later queries. Extracting this information during warehousing or in the background speeds up subsequent queries.
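The context-aware association in item (3) can be approximated by grouping files that share a base name. The sketch assumes that companion files (the shapefile parts listed above, and an XML metadata file next to a TIFF image) use the same base name as the main file; real products may follow other naming rules, and `associate` is a hypothetical function.

```python
import os
from collections import defaultdict
from typing import Dict, List

# Companion files of a shapefile, as listed in the text.
SHAPEFILE_SIDECARS = ("dbf", "prj", "sbn", "sbx", "shx")


def associate(paths: List[str]) -> List[List[str]]:
    """Group related files into packages by shared base name (an assumed matching rule).

    - a .shp file is packaged with any same-named .dbf/.prj/.sbn/.sbx/.shx files;
    - a .tif/.tiff image is packaged with a same-named .xml metadata file, if present.
    Files without companions are not packaged.
    """
    by_stem: Dict[str, Dict[str, str]] = defaultdict(dict)
    for p in paths:
        stem, ext = os.path.splitext(p)
        # Case-insensitive matching, as on Windows; adjust for case-sensitive stores.
        by_stem[stem.lower()][ext.lower().lstrip(".")] = p

    packages: List[List[str]] = []
    for files in by_stem.values():
        if "shp" in files:
            members = [files["shp"]] + [files[e] for e in SHAPEFILE_SIDECARS if e in files]
        elif "tif" in files or "tiff" in files:
            image = files.get("tif") or files.get("tiff")
            members = [image] + ([files["xml"]] if "xml" in files else [])
        else:
            continue
        if len(members) > 1:
            packages.append(members)
    return packages
```

Each returned list would become one data package; different rule sets could swap in their own matching rules.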
the data storage medium monitoring unit monitors the state of the data storage media in real time. Specifically, when a disk is inserted, all of its files are automatically retrieved, the corresponding data entries in the database are updated, and the data are set to the online state; when the disk is removed, the data state is automatically changed to offline. A disk may be in one of several states, such as online, scanning/warehousing, querying, offline, unavailable, or disabled;
the data storage module stores the warehoused data; its storage modes include hard disk cabinet storage and offline hard disk storage.
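The insert/remove behaviour of the monitoring unit can be sketched as a small state machine. The states follow the list above; the in-memory `catalog` dictionary stands in for the warehouse database, and the handling of disabled disks and of files that have disappeared from a disk are assumptions, as are all class and method names.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class State(Enum):
    ONLINE = "online"
    SCANNING = "scanning/warehousing"
    QUERYING = "querying"
    OFFLINE = "offline"
    UNAVAILABLE = "unavailable"
    DISABLED = "disabled"


class DiskMonitor:
    """Hypothetical monitor; `catalog` stands in for the warehouse database."""

    def __init__(self) -> None:
        self.disk_state: Dict[str, State] = {}
        # (disk id, path on disk) -> State.ONLINE or State.OFFLINE
        self.catalog: Dict[Tuple[str, str], State] = {}

    def on_insert(self, disk_id: str, files: Iterable[str]) -> None:
        if self.disk_state.get(disk_id) is State.DISABLED:
            return  # assumption: a disabled disk is not scanned
        self.disk_state[disk_id] = State.SCANNING
        present = set(files)
        # Assumption: updating the database also drops entries for files
        # that are no longer on the disk.
        for key in [k for k in self.catalog if k[0] == disk_id and k[1] not in present]:
            del self.catalog[key]
        for path in present:
            self.catalog[(disk_id, path)] = State.ONLINE
        self.disk_state[disk_id] = State.ONLINE

    def on_remove(self, disk_id: str) -> None:
        self.disk_state[disk_id] = State.OFFLINE
        for key in self.catalog:
            if key[0] == disk_id:
                # Entries stay in the catalogue so offline data remain queryable.
                self.catalog[key] = State.OFFLINE
```

Keeping offline entries in the catalogue rather than deleting them is what allows the offline/online data query to treat both kinds of data alike.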
The data browsing module is used for browsing the data in the data warehouse; the data browsing modes include but are not limited to zooming in, zooming out, full-image display, full-image zooming in, full-image zooming out, roaming, pointer and map refreshing;
the data query positioning module performs query operations on data in the data warehouse and locates the storage position of the data; specifically, warehoused raster image data, vector data, result documents, and other data organized in the tree-structured catalogue can be queried by data type, imaging time, satellite, and sensor, and double-clicking a query result locates the file that contains the data.
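A query by data type, satellite, sensor, and imaging time followed by locating the file could look like the following minimal sketch. It assumes each warehoused file is represented by a hypothetical `Record` carrying the disk identifier and path needed to locate it; `query` and `locate` are likewise hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical catalogue entry for one warehoused file."""
    disk: str                         # which disk (cabinet slot or offline disk) holds it
    path: str                         # path on that disk
    data_type: str                    # e.g. "raster", "vector", "document"
    satellite: Optional[str] = None   # e.g. "GF1"
    sensor: Optional[str] = None      # e.g. "WFV1"
    imaged_on: Optional[date] = None  # imaging date


def query(records: List[Record], data_type: Optional[str] = None,
          satellite: Optional[str] = None, sensor: Optional[str] = None,
          start: Optional[date] = None, end: Optional[date] = None) -> List[Record]:
    """Return records matching every given criterion; None means 'any'.
    The imaging-date range [start, end] is inclusive."""
    hits = []
    for r in records:
        if data_type is not None and r.data_type != data_type:
            continue
        if satellite is not None and r.satellite != satellite:
            continue
        if sensor is not None and r.sensor != sensor:
            continue
        if start is not None and (r.imaged_on is None or r.imaged_on < start):
            continue
        if end is not None and (r.imaged_on is None or r.imaged_on > end):
            continue
        hits.append(r)
    return hits


def locate(record: Record) -> Tuple[str, str]:
    """What 'double-clicking' a hit would resolve to: the disk and path of the file."""
    return record.disk, record.path
```

A linear scan is enough to show the semantics; a real catalogue would answer the same criteria with indexed database queries.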
The data query operation of the data query positioning module includes but is not limited to storage yard/container query, label query, container query, file general query, off-line/on-line data query, and same file query.
1) Yard/container query: queries the data belonging to a given rule set. For example, querying all data in the remote sensing container may return .shp vector files, GF1 satellite data and so on, while querying the Office container may return files in formats such as .docx, .doc, .xlsx and .vsdx;
2) Tag query: queries data carrying given tags. For example, querying the "performance assessment" tag lists all data with that tag, and querying the "GF1" and "raster data" tags together displays the raster data of all GF1 satellites;
3) In-container query: each rule set has its own query modes. For example, "spatial query", "vector file query" and "projection type query" belong to the remote sensing data rule set, so that all data with a UTM (Universal Transverse Mercator) projection, or all data within the extent of Zhejiang Province, can be queried; word queries and the like belong to the Office rule set. The program supports multiple query modes within each rule set;
4) General file query: all files share common attributes such as file name, modification date, extension, creation date and size; the program supports queries on these attributes, for example all files with the .shp extension, or a search by MD5 hash;
5) Offline/online data query: offline and online data are queried in the same way, without distinction;
6) Duplicate file query: the MD5 hash of each file is calculated during warehousing, and files with identical MD5 hashes are treated as having identical content; this function can be used to remove redundant data files.
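To make several of these query types concrete, here is a minimal in-memory sketch. It assumes a hypothetical `Record` with `container` (rule set), `tags` and `bbox` fields; a real implementation would run equivalent queries against its database. None of the functions filters on `online`, which is one simple way to realize the indistinguishable offline/online querying of item 5. The bounding-box test stands in for one possible in-container spatial query.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Record:
    path: str
    container: str                          # rule set that warehoused the file
    tags: set = field(default_factory=set)
    bbox: tuple = None                      # (lon_min, lat_min, lon_max, lat_max)
    online: bool = True                     # stored, but never used as a filter


def by_container(records, container):
    # 1) Yard/container query: everything warehoused under one rule set.
    return [r for r in records if r.container == container]


def by_tags(records, *tags):
    # 2) Tag query: records that carry ALL of the requested tags.
    wanted = set(tags)
    return [r for r in records if wanted <= r.tags]


def by_bbox(records, box):
    # 3) One possible in-container spatial query: footprints intersecting a lon/lat box.
    x0, y0, x1, y1 = box
    return [
        r for r in records
        if r.bbox is not None
        and r.bbox[0] <= x1 and r.bbox[2] >= x0
        and r.bbox[1] <= y1 and r.bbox[3] >= y0
    ]


def by_extension(records, ext):
    # 4) General file query on a shared attribute, here the extension.
    ext = "." + ext.lower().lstrip(".")
    return [r for r in records if os.path.splitext(r.path)[1].lower() == ext]
```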
The data downloading module selectively downloads items from the query result list as required, based on the output of the data query positioning module. Download modes include, but are not limited to, single-file download, multi-file download, packaged download of data files, queued multi-file download, delayed download of offline data and resumable download (resuming from a breakpoint).
The data warehouse management system further comprises a data storage medium module and a communication controller module, which are electrically connected. The data storage medium module is a hard disk storage cube formed by stacking multiple hard disk cabinets, and the communication controller module schedules and controls each hard disk cabinet.
The invention organizes the hard disks in a two-level cascade. As shown in fig. 6, the first level is the hard disk cabinet: one hard disk cabinet 3 is formed by stacking multiple hard disk docks 1 and is controlled by a hard disk controller 2 (DCU for short), so that each hard disk dock 1 can be powered on and off by DCU commands and its state can be read. The hard disk controller 2 is built around a single-chip microcontroller.
As shown in fig. 7, at the second level, multiple hard disk cabinets 3 are stacked to form a hard disk storage cube 5, and a communication controller module 4 (CCU for short) schedules and controls each hard disk cabinet. This gives unified control over every hard disk dock 1 in the hard disk storage cube 5: each dock can be powered on and off, and its current state can be read.
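The two-level cascade can be modeled as an addressing hierarchy in which the CCU routes a command to one cabinet's DCU, which then switches an individual dock. This is a sketch under assumptions: the source does not specify the command protocol, so the `(cabinet, dock)` addressing and the class and method names are hypothetical, and the serial link to each microcontroller is replaced by plain method calls.

```python
class DiskControlUnit:
    """Stand-in for one cabinet's DCU: powers docks on/off and reports their state."""

    def __init__(self, n_docks):
        self.powered = [False] * n_docks

    def set_power(self, dock, on):
        self.powered[dock] = on

    def read_state(self, dock):
        return "on" if self.powered[dock] else "off"


class CommunicationControlUnit:
    """Stand-in for the CCU: schedules commands across every cabinet in the cube."""

    def __init__(self, n_cabinets, docks_per_cabinet):
        self.cabinets = [DiskControlUnit(docks_per_cabinet) for _ in range(n_cabinets)]

    def set_power(self, cabinet, dock, on):
        # Level 2 selects the cabinet; level 1 (its DCU) switches the dock.
        self.cabinets[cabinet].set_power(dock, on)

    def read_state(self, cabinet, dock):
        return self.cabinets[cabinet].read_state(dock)

    def snapshot(self):
        # Current state of every dock in the storage cube, keyed by (cabinet, dock).
        return {
            (c, d): dcu.read_state(d)
            for c, dcu in enumerate(self.cabinets)
            for d in range(len(dcu.powered))
        }
```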
Example 2
The invention discloses a data warehouse management system comprising a data warehousing module, a data storage module, a data browsing module, a data query positioning module, a data downloading module and a user permission management module, wherein:
the data warehousing module comprises a manual warehousing unit, an automatic scanning warehousing unit, a data processing unit and a disk monitoring unit, and writes data into the data warehouse either by manual input or by automatic scanning. The data include, but are not limited to, raster image data, vector data and result document data.
The manual warehousing unit writes data into the data warehouse by manual input;
the automatic scanning warehousing unit writes data into the data warehouse by automatically scanning storage media such as hard disks, flash memory, USB flash drives, CF cards and SD cards. The automatic scanning modes are:
(1) Tree-structured scanning: files are scanned and warehoused following the tree structure of the original folders;
(2) Undifferentiated (flat) scanning: the folder structure is not retained and all files are placed in a single list;
(3) Filtered warehousing: filters can be set on which files are warehoused, for example only .docx and .xlsx files, only files whose names contain certain characters, only files modified within a given date range, or only files within a given size range;
(4) Location-specific warehousing: for example, only the data on drive C, or only the data in selected folders, are warehoused;
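The four automatic scanning modes can be combined in one scanner. In this minimal sketch, which is an assumption rather than the source's implementation, the `tree` flag chooses between keeping each file's folder path relative to the scan root (mode 1) and a flat list (mode 2), an optional `ScanFilter` implements filtered warehousing (mode 3), and the `roots` argument restricts scanning to specific locations (mode 4). The `ScanFilter` fields are hypothetical.

```python
import os
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ScanFilter:
    """Hypothetical filter settings for filtered warehousing (mode 3)."""
    extensions: tuple = ()                  # lowercase, e.g. (".docx", ".xlsx"); empty = any
    name_contains: str = ""                 # required substring of the file name
    modified_after: Optional[datetime] = None
    modified_before: Optional[datetime] = None
    min_size: int = 0                       # bytes
    max_size: Optional[int] = None          # bytes

    def accepts(self, path):
        name = os.path.basename(path)
        st = os.stat(path)
        mtime = datetime.fromtimestamp(st.st_mtime)
        if self.extensions and os.path.splitext(name)[1].lower() not in self.extensions:
            return False
        if self.name_contains and self.name_contains not in name:
            return False
        if self.modified_after is not None and mtime < self.modified_after:
            return False
        if self.modified_before is not None and mtime > self.modified_before:
            return False
        if st.st_size < self.min_size:
            return False
        if self.max_size is not None and st.st_size > self.max_size:
            return False
        return True


def scan(roots, tree=True, flt=None):
    """Scan only the given roots (mode 4); keep folder paths (mode 1) or flatten (mode 2)."""
    entries = []
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for fname in sorted(files):
                full = os.path.join(dirpath, fname)
                if flt is not None and not flt.accepts(full):
                    continue
                folder = os.path.relpath(dirpath, root) if tree else ""
                entries.append({"path": full, "folder": folder, "name": fname})
    return entries
```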
the data processing unit performs warehousing processing on the warehoused data, which comprises:
(1) Tagging: for example, when the data in a folder are selected for warehousing, they can be given a "performance assessment" tag and a "Wuxi City" tag; tags are used in many places, such as data search, and data can be searched and queried by tag;
(2) Rule set screening: for example, the remote sensing data rule set warehouses only scanned remote sensing data such as HDF, GeoTIFF and shapefile (.shp) files, and GeoTIFF and HDF files are automatically tagged "raster data". Compressed packages with specific names are also warehoused: if a package named "GF1_WFV1_E119.6_N31.3_20150520_L1A0000817197" is found, it is automatically tagged "GF1_WFV1", "original raster data" and so on. Similarly, the Office rule set scans and warehouses only files in formats such as .docx, .xlsx, .doc and .vsdx. Multiple rule sets can be applied together;
(3) Context-aware file association: if a TIFF file of Gaofen-1 (GF-1) satellite data is found, the system automatically checks whether a matching XML metadata file exists; likewise, if a .shp file is found, its associated .dbf, .prj, .sbn, .sbx and .shx files are located automatically. The related files are then bundled into one data package. Each rule set has its own context-aware rules;
(4) File information retrieval: different rule sets retrieve different information from different files. For example, the picture data rule set retrieves image properties such as width, height and resolution; the remote sensing data rule set retrieves latitude and longitude, number of bands, projection, resolution and similar information; and for some text files, the text itself is read and stored in the database to make later queries easier. Retrieving this information during warehousing or in the background speeds up the related queries.
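File information retrieval can be sketched as a per-type extractor whose results are stored in a database for later queries. The sketch assumes a hypothetical `file_info` table in SQLite and uses two illustrative extractors only: image width and height read from a PNG header, and the text content of plain-text files. Remote sensing attributes such as bands and projection would normally come from a geospatial library and are omitted.

```python
import json
import os
import sqlite3
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_info(path):
    # IHDR is the first chunk; width and height are big-endian uint32 at bytes 16-23.
    with open(path, "rb") as f:
        head = f.read(24)
    if head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        return {}
    width, height = struct.unpack(">II", head[16:24])
    return {"width": width, "height": height}


def text_info(path):
    # Store the text itself so later queries can search it.
    with open(path, encoding="utf-8", errors="replace") as f:
        return {"content": f.read()}


EXTRACTORS = {".png": png_info, ".txt": text_info}  # hypothetical rule-set mapping


def open_db(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS file_info (path TEXT PRIMARY KEY, info TEXT)")
    return conn


def index_file(conn, path):
    ext = os.path.splitext(path)[1].lower()
    info = EXTRACTORS.get(ext, lambda p: {})(path)
    info["size"] = os.path.getsize(path)
    conn.execute(
        "INSERT OR REPLACE INTO file_info (path, info) VALUES (?, ?)",
        (path, json.dumps(info)),
    )
    return info
```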
The data storage medium monitoring unit monitors the state of the data storage media in real time. When a disk is inserted, all of its files are automatically retrieved, the corresponding entries in the database are updated, and the data are set to the online state; when the disk is removed, the data state is automatically changed to offline. A disk can be in one of several states, such as online, scanning/warehousing, querying, offline, unavailable or disabled.
The data storage module stores the warehoused data; its storage modes include hard disk cabinet storage and offline hard disk storage.
The data browsing module is used to browse the data in the data warehouse; browsing modes include, but are not limited to, zoom in, zoom out, full-extent display, full-extent zoom in, full-extent zoom out, panning (roaming), pointer and map refresh;
The data query positioning module queries the data in the data warehouse and locates where the data are stored. Specifically, raster image data, vector data, result documents and other warehoused data organized in the tree-structured catalog can be queried by data type, imaging time, satellite and sensor; double-clicking a query result locates the file that contains the data.
The query operations of the data query positioning module include, but are not limited to, yard/container query, tag query, in-container query, general file query, offline/online data query and duplicate file query.
1) Yard/container query: queries the data belonging to a given rule set. For example, querying all data in the remote sensing container may return .shp vector files, GF1 satellite data and so on, while querying the Office container may return files in formats such as .docx, .doc, .xlsx and .vsdx;
2) Tag query: queries data carrying given tags. For example, querying the "performance assessment" tag lists all data with that tag, and querying the "GF1" and "raster data" tags together displays the raster data of all GF1 satellites;
3) In-container query: each rule set has its own query modes. For example, "spatial query", "vector file query" and "projection type query" belong to the remote sensing data rule set, so that all data with a UTM (Universal Transverse Mercator) projection, or all data within the extent of Zhejiang Province, can be queried; word queries and the like belong to the Office rule set. The program supports multiple query modes within each rule set;
4) General file query: all files share common attributes such as file name, modification date, extension, creation date and size; the program supports queries on these attributes, for example all files with the .shp extension, or a search by MD5 hash;
5) Offline/online data query: offline and online data are queried in the same way, without distinction;
6) Duplicate file query: the MD5 hash of each file is calculated during warehousing, and files with identical MD5 hashes are treated as having identical content; this function can be used to remove redundant data files.
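The duplicate file query (item 6 above) can be sketched with Python's standard `hashlib`. This version computes hashes on demand rather than at warehousing time, and it groups files by size first, an optimization assumed here and not described in the source, so that only files that could possibly match are hashed. Because an MD5 match is strong evidence but not proof of identical content, a byte-by-byte comparison before deleting anything would be a reasonable extra safeguard.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted groups of two or more paths under root whose MD5 hashes match."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[md5_of(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```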
The data downloading module selectively downloads items from the query result list as required, based on the output of the data query positioning module;
the download modes of the data downloading module include:
1) single-file download;
2) multi-file download of several selected files;
3) data package download, i.e. the data files are packaged and downloaded together;
4) queued multi-file download, with an adjustable download order and download control (restart, pause, stop, etc.);
5) delayed download of offline data: if a file is offline, its download is marked as a "scheduled task", and a background service downloads it automatically once the data come online, provided the client is running (i.e. not shut down);
6) resumable download: for example, if the disk is removed while data are being transferred, the download is paused. After the disk is reconnected, the program verifies file consistency using the MD5 hash or similar information; if the file is unchanged, the download resumes from the breakpoint; if it has been modified, the user can choose to re-download or abandon the download; if it has been deleted, the download fails and an error message is returned.
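Resumable download with a consistency check can be sketched as follows. The sketch assumes, hypothetically, that the source file's MD5 is recorded when the download starts and that the partially written destination file is kept; on reconnection the source is re-hashed, and copying resumes from the destination's current length only if the hash is unchanged. A source that is still missing after reconnection is treated as deleted. The status strings are illustrative.

```python
import hashlib
import os


def md5_of(path, chunk=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def resume_download(src, dst, expected_md5, chunk=1 << 20):
    """Continue copying src to dst from the breakpoint.

    Returns "completed", "modified" (the caller offers re-download or abandon)
    or "missing" (source deleted: the download fails with an error).
    """
    if not os.path.exists(src):
        return "missing"
    if md5_of(src) != expected_md5:
        return "modified"
    offset = os.path.getsize(dst) if os.path.exists(dst) else 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(offset)  # skip the bytes already transferred
        for block in iter(lambda: fin.read(chunk), b""):
            fout.write(block)
    return "completed"
```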
The user permission management module allocates and manages user permissions, specifically:
1) permission inheritance: when permissions are set, child files can optionally inherit them;
2) batch permission setting: the same permissions are applied to several selected files at once;
3) built-in permission roles such as Guest, Administrator and User: a user can belong to one or more roles and holds all permissions of each role they belong to;
4) a built-in super administrator user: it belongs to the Administrator role, which has all permissions, including the permission to configure other users' permissions;
5) login account and password management.
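The role and inheritance rules above can be sketched as a small access-control model. This is one reasonable design under assumptions, not the patent's implementation: role permissions apply everywhere, per-path grants are stored per user, a path without its own entry is decided by its nearest ancestor entry (only if that entry allows inheritance), and the `Administrator` role holds every permission. All names and permission strings are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class AccessControl:
    # Permissions every member of a role holds on all paths (illustrative values).
    role_perms: dict = field(default_factory=lambda: {
        "Guest": {"browse"},
        "User": {"browse", "query"},
        "Administrator": {"*"},  # all permissions, incl. configuring other users
    })
    user_roles: dict = field(default_factory=dict)  # user -> set of role names
    acl: dict = field(default_factory=dict)         # (path, user) -> set of permissions
    inherit: dict = field(default_factory=dict)     # (path, user) -> children inherit?

    def grant_batch(self, user, paths, perms, inherit=True):
        # Batch setting: apply the same permissions to every selected path.
        for p in paths:
            key = (str(PurePosixPath(p)), user)
            self.acl[key] = set(perms)
            self.inherit[key] = inherit

    def allowed(self, user, path, perm):
        roles = self.user_roles.get(user, set())
        granted = set().union(*(self.role_perms.get(r, set()) for r in roles))
        if "*" in granted or perm in granted:
            return True
        node = PurePosixPath(path)
        key = (str(node), user)
        if key in self.acl:
            return perm in self.acl[key]
        # Otherwise the nearest ancestor with an entry decides, if it allows inheritance.
        for parent in node.parents:
            key = (str(parent), user)
            if key in self.acl:
                return self.inherit[key] and perm in self.acl[key]
        return False
```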
The data warehouse management system further comprises a data storage medium module and a communication controller module, which are electrically connected. The data storage medium module is a hard disk storage cube formed by stacking multiple hard disk cabinets, and the communication controller module schedules and controls each hard disk cabinet.
The data warehouse management system of the invention has the following characteristics:
(1) software and hardware are combined for file management, so that even legacy data can be warehoused easily;
(2) the data warehouse imposes no upper limit on data storage capacity;
(3) the data warehouse supports unified management of online and offline data;
(4) users of the data warehouse dynamically build data catalogs according to their actual business, forming a data tree;
(5) browsing modes are diverse, including zoom in, zoom out, full-extent display, full-extent zoom in, full-extent zoom out, panning (roaming), pointer and map refresh;
(6) for users with different usage rights, an administrator can assign permissions, such as query-area permissions and file download permissions, according to business characteristics;
(7) creating the data warehouse and building its tree structure is simple and convenient, making the system efficient to use;
(8) the data warehouse supports warehousing of general files and of data from a wide range of satellite sources, including data in general formats such as GeoTIFF, HDF and H5.
Example 3
As shown in fig. 2, the present invention further provides a data warehouse management method, which includes the following steps:
S1, write data into the data warehouse by manual input or automatic scanning; the data include, but are not limited to, raster image data, vector data and result document data. As shown in fig. 3, warehousing specifically means performing warehousing and cataloging operations on raster image data (remote sensing satellite imagery, intermediate processing data, general-format data such as GeoTIFF, etc.), vector data, result documents and other data, so that the data can easily be browsed and queried;
data warehousing cataloging means that the user catalogs the data to be warehoused in a chosen scheme according to business needs; cataloging is a tree-building process. The data may be cataloged by year, region, category, etc.
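Because cataloging is a tree-building process, a minimal sketch is a nested dictionary keyed by whichever attributes the user chooses at each level, for example year, then region, then category. The record fields and values below are hypothetical; leaves hold the file names.

```python
def build_catalog(records, levels):
    """Group records into a tree; `levels` (non-empty) names the attribute at each depth."""
    tree = {}
    for rec in records:
        node = tree
        for attr in levels[:-1]:
            node = node.setdefault(rec[attr], {})
        # The last level holds lists of file names.
        node.setdefault(rec[levels[-1]], []).append(rec["name"])
    return tree
```

Reordering `levels` (for instance, region first) rebuilds the same data as a different tree, which matches the idea that users catalog data according to their own business needs.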
The automatic scanning modes are:
(1) Tree-structured scanning: files are scanned and warehoused following the tree structure of the original folders;
(2) Undifferentiated (flat) scanning: the folder structure is not retained and all files are placed in a single list;
(3) Filtered warehousing: filters can be set on which files are warehoused, for example only .docx and .xlsx files, only files whose names contain certain characters, only files modified within a given date range, or only files within a given size range;
(4) Location-specific warehousing: for example, only the data on drive C, or only the data in selected folders, are warehoused;
then, warehousing processing is performed on the warehoused data, and the state of the data storage media is monitored in real time;
the warehousing processing comprises:
(1) Tagging: for example, when the data in a folder are selected for warehousing, they can be given a "performance assessment" tag and a "Wuxi City" tag; tags are used in many places, such as data search, and data can be searched and queried by tag;
(2) Rule set screening: for example, the remote sensing data rule set warehouses only scanned remote sensing data such as HDF, GeoTIFF and shapefile (.shp) files, and GeoTIFF and HDF files are automatically tagged "raster data". Compressed packages with specific names are also warehoused: if a package named "GF1_WFV1_E119.6_N31.3_20150520_L1A0000817197" is found, it is automatically tagged "GF1_WFV1", "original raster data" and so on. Similarly, the Office rule set scans and warehouses only files in formats such as .docx, .xlsx, .doc and .vsdx. Multiple rule sets can be applied together;
(3) Context-aware file association: if a TIFF file of Gaofen-1 (GF-1) satellite data is found, the system automatically checks whether a matching XML metadata file exists; likewise, if a .shp file is found, its associated .dbf, .prj, .sbn, .sbx and .shx files are located automatically. The related files are then bundled into one data package. Each rule set has its own context-aware rules;
(4) File information retrieval: different rule sets retrieve different information from different files. For example, the picture data rule set retrieves image properties such as width, height and resolution; the remote sensing data rule set retrieves latitude and longitude, number of bands, projection, resolution and similar information; and for some text files, the text itself is read and stored in the database to make later queries easier. Retrieving this information during warehousing or in the background speeds up the related queries.
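The rule set screening step (2) above can be sketched as extension whitelists plus tagging rules. The regular expression for GF-1 package names is an assumption inferred from the single example name quoted in the text (satellite, sensor, longitude, latitude, date, then processing level and product ID); the archive extensions, rule-set keys and "first accepting rule set wins" policy are likewise illustrative.

```python
import re

# Inferred from the single example GF1_WFV1_E119.6_N31.3_20150520_L1A0000817197.
GF_PACKAGE = re.compile(
    r"^(?P<sat>GF\d)_(?P<sensor>[A-Z]+\d*)_(?P<lon>[EW][\d.]+)_(?P<lat>[NS][\d.]+)"
    r"_(?P<date>\d{8})_(?P<level>L\d[A-Z]?)(?P<product>\d+)"
)
ARCHIVES = (".tar.gz", ".zip")  # assumed package formats
RASTER = (".hdf", ".tif", ".tiff")

RULE_SETS = {
    # rule set -> extensions it accepts (named archives are handled separately)
    "remote_sensing": (".hdf", ".tif", ".tiff", ".shp"),
    "office": (".docx", ".xlsx", ".doc", ".vsdx"),
}


def classify(name, active=("remote_sensing", "office")):
    """Return (rule_set, tags) for the first active rule set accepting `name`, else None."""
    lower = name.lower()
    for rs in active:
        if rs == "remote_sensing" and lower.endswith(ARCHIVES):
            m = GF_PACKAGE.match(name)
            if m:
                # e.g. tags "GF1_WFV1" and "original raster data"
                return rs, {f"{m['sat']}_{m['sensor']}", "original raster data"}
            continue  # unrecognized archives are not warehoused by this rule set
        if lower.endswith(RULE_SETS[rs]):
            is_raster = rs == "remote_sensing" and lower.endswith(RASTER)
            return rs, {"raster data"} if is_raster else set()
    return None
```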
When a disk is inserted, all of its files are automatically retrieved, the corresponding entries in the database are updated, and the data are set to the online state; when the disk is removed, the data state is automatically changed to offline. A disk can be in one of several states, such as online, scanning/warehousing, querying, offline, unavailable or disabled;
the data can also be written into the data warehouse by manual input;
S2, store the warehoused data; storage modes include hard disk cabinet storage and offline hard disk storage;
S3, browse the data stored in the data warehouse; browsing modes include zoom in, zoom out, full-extent display, full-extent zoom in, full-extent zoom out, panning (roaming), pointer and map refresh;
S4, query the data in the data warehouse and locate where the data are stored; query operations include yard/container query, tag query, in-container query, general file query, offline/online data query and duplicate file query;
1) Yard/container query: queries the data belonging to a given rule set. For example, querying all data in the remote sensing container may return .shp vector files, GF1 satellite data and so on, while querying the Office container may return files in formats such as .docx, .doc, .xlsx and .vsdx;
2) Tag query: queries data carrying given tags. For example, querying the "performance assessment" tag lists all data with that tag, and querying the "GF1" and "raster data" tags together displays the raster data of all GF1 satellites;
3) In-container query: each rule set has its own query modes. For example, "spatial query", "vector file query" and "projection type query" belong to the remote sensing data rule set, so that all data with a UTM (Universal Transverse Mercator) projection, or all data within the extent of Zhejiang Province, can be queried; word queries and the like belong to the Office rule set. The program supports multiple query modes within each rule set;
4) General file query: all files share common attributes such as file name, modification date, extension, creation date and size; the program supports queries on these attributes, for example all files with the .shp extension, or a search by MD5 hash;
5) Offline/online data query: offline and online data are queried in the same way, without distinction;
6) Duplicate file query: the MD5 hash of each file is calculated during warehousing, and files with identical MD5 hashes are treated as having identical content; this function can be used to remove redundant data files.
S5, based on the query results, selectively download items from the query result list as required; download modes include single-file download, multi-file download, data package download, queued multi-file download, delayed download of offline data and resumable download.
Example 4
The invention provides a data warehouse management method, which comprises the following steps:
S0, allocate and manage user permissions;
S1, write data into the data warehouse by manual input or automatic scanning; the data include, but are not limited to, raster image data, vector data and result document data. Specifically, warehousing and cataloging operations are performed on raster image data (remote sensing satellite imagery, intermediate processing data, general-format data such as GeoTIFF, etc.), vector data, result documents and other data, so that the data can easily be browsed and queried;
then, warehousing processing is performed on the warehoused data, and the state of the data storage media is monitored in real time;
data warehousing cataloging means that the user catalogs the data to be warehoused in a chosen scheme according to business needs; cataloging is a tree-building process. The data may be cataloged by year, region, category, etc.
The user interface of the data warehouse provides a file warehousing button. As shown in fig. 4, the user first clicks the button and enters verification information, and the system checks whether the user has warehousing permission; if not, the flow ends. If the user has permission, the user selects one or more files and a destination path, and the files are copied into the warehouse.
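The fig. 4 flow can be sketched as a permission gate followed by a copy. The `has_permission` callback, the `"warehouse"` permission name and the return values are hypothetical; the source only states that unauthorized users end the flow and that authorized users select files and a copy path.

```python
import os
import shutil


def warehouse_files(user, files, dest_dir, has_permission):
    """Copy the selected files into dest_dir if the user may warehouse data."""
    if not has_permission(user, "warehouse"):
        return []  # no permission: the flow ends without touching the disk
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for src in files:
        # copy2 also preserves modification times, which date-based queries may use.
        copied.append(shutil.copy2(src, dest_dir))
    return copied
```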
The automatic scanning modes are:
(1) Tree-structured scanning: files are scanned and warehoused following the tree structure of the original folders;
(2) Undifferentiated (flat) scanning: the folder structure is not retained and all files are placed in a single list;
(3) Filtered warehousing: filters can be set on which files are warehoused, for example only .docx and .xlsx files, only files whose names contain certain characters, only files modified within a given date range, or only files within a given size range;
(4) Location-specific warehousing: for example, only the data on drive C, or only the data in selected folders, are warehoused;
then, warehousing processing is performed on the warehoused data, and the state of the data storage media is monitored in real time;
the warehousing processing comprises:
(1) Tagging: for example, when the data in a folder are selected for warehousing, they can be given a "performance assessment" tag and a "Wuxi City" tag; tags are used in many places, such as data search, and data can be searched and queried by tag;
(2) Rule set screening: for example, the remote sensing data rule set warehouses only scanned remote sensing data such as HDF, GeoTIFF and shapefile (.shp) files, and GeoTIFF and HDF files are automatically tagged "raster data". Compressed packages with specific names are also warehoused: if a package named "GF1_WFV1_E119.6_N31.3_20150520_L1A0000817197" is found, it is automatically tagged "GF1_WFV1", "original raster data" and so on. Similarly, the Office rule set scans and warehouses only files in formats such as .docx, .xlsx, .doc and .vsdx. Multiple rule sets can be applied together;
(3) Context-aware file association: if a TIFF file of Gaofen-1 (GF-1) satellite data is found, the system automatically checks whether a matching XML metadata file exists; likewise, if a .shp file is found, its associated .dbf, .prj, .sbn, .sbx and .shx files are located automatically. The related files are then bundled into one data package. Each rule set has its own context-aware rules;
(4) File information retrieval: different rule sets retrieve different information from different files. For example, the picture data rule set retrieves image properties such as width, height and resolution; the remote sensing data rule set retrieves latitude and longitude, number of bands, projection, resolution and similar information; and for some text files, the text itself is read and stored in the database to make later queries easier. Retrieving this information during warehousing or in the background speeds up the related queries.
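The context-aware association step (3) above can be sketched as grouping companion files by the primary file's base name. This minimal version assumes, as is usual for shapefiles and GF-1 products, that companion files share the primary file's stem; the `SIDECARS` table is illustrative, one such table per rule set, and real rules may need fuzzier matching.

```python
import os

# Companion extensions for each primary extension (assumed rules).
SIDECARS = {
    ".shp": (".dbf", ".prj", ".sbn", ".sbx", ".shx"),
    ".tiff": (".xml",),
    ".tif": (".xml",),
}


def build_package(primary, available):
    """Return the primary file plus any companions found among `available` paths."""
    stem, ext = os.path.splitext(primary)
    present = set(available)
    package = [primary]
    for companion_ext in SIDECARS.get(ext.lower(), ()):
        candidate = stem + companion_ext
        if candidate in present:
            package.append(candidate)
    return package
```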
When a disk is inserted, all of its files are automatically retrieved, the corresponding entries in the database are updated, and the data are set to the online state; when the disk is removed, the data state is automatically changed to offline. A disk can be in one of several states, such as online, scanning/warehousing, querying, offline, unavailable or disabled;
the data can also be written into the data warehouse by manual input;
S2, store the warehoused data; storage modes include hard disk cabinet storage and offline hard disk storage;
S3, browse the data stored in the data warehouse; browsing modes include zoom in, zoom out, full-extent display, full-extent zoom in, full-extent zoom out, panning (roaming), pointer and map refresh;
S4, query the data in the data warehouse and locate where the data are stored; query operations include yard/container query, tag query, in-container query, general file query, offline/online data query and duplicate file query;
1) Yard/container query: queries the data belonging to a given rule set. For example, querying all data in the remote sensing container may return .shp vector files, GF1 satellite data and so on, while querying the Office container may return files in formats such as .docx, .doc, .xlsx and .vsdx;
2) Tag query: queries data carrying given tags. For example, querying the "performance assessment" tag lists all data with that tag, and querying the "GF1" and "raster data" tags together displays the raster data of all GF1 satellites;
3) In-container query: each rule set has its own query modes. For example, "spatial query", "vector file query" and "projection type query" belong to the remote sensing data rule set, so that all data with a UTM (Universal Transverse Mercator) projection, or all data within the extent of Zhejiang Province, can be queried; word queries and the like belong to the Office rule set. The program supports multiple query modes within each rule set;
4) General file query: all files share common attributes such as file name, modification date, extension, creation date and size; the program supports queries on these attributes, for example all files with the .shp extension, or a search by MD5 hash;
5) Offline/online data query: offline and online data are queried in the same way, without distinction;
6) Duplicate file query: the MD5 hash of each file is calculated during warehousing, and files with identical MD5 hashes are treated as having identical content; this function can be used to remove redundant data files.
S5, based on the query results, selectively download items from the query result list as required; download modes include single-file download, multi-file download, data package download, queued multi-file download, delayed download of offline data and resumable download.
As shown in fig. 5, when a user needs to open or download a file, the system first verifies whether the user has permission for the file; if not, the flow ends. If so, the system determines whether the file is online; a file that is not online is browsed and downloaded by loading the disk that holds it.
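The fig. 5 decision sequence can be sketched as a small function. The callbacks `has_permission`, `is_online`, `load_disk` and `fetch` are hypothetical hooks; in the described hardware, loading a disk would presumably correspond to the controller powering on the dock that holds it. The extra check after loading is an assumption added so that a failed load surfaces as an error.

```python
def open_or_download(user, file_id, has_permission, is_online, load_disk, fetch):
    """Follow the fig. 5 flow: permission check, online check, then access."""
    if not has_permission(user, file_id):
        return None  # no permission: the flow ends
    if not is_online(file_id):
        load_disk(file_id)  # bring the offline disk online before access
        if not is_online(file_id):
            raise RuntimeError(f"{file_id} is still offline after loading its disk")
    return fetch(file_id)
```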
It should be noted that the embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is only a preferred embodiment of the present invention. Those skilled in the art may make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A data warehouse management system, characterized in that it comprises a data warehousing module, a data storage module, a data browsing module, a data query and positioning module, and a data download module, wherein:
the data warehousing module is used to write data into the data warehouse;
the data warehousing module comprises an automatic scanning warehousing unit, a data storage medium module monitoring unit and a data processing unit. The automatic scanning warehousing unit is used to write data into the data warehouse by automatic scanning. The automatic scanning warehousing modes include scanning according to the original folder tree, undifferentiated data scanning, file-filtered warehousing, and specific-location data warehousing, wherein:
- scanning according to the original folder tree means that files are scanned into the warehouse following the tree structure of the folders;
- undifferentiated data scanning means that the folder structure is not preserved and all files are placed in a single list and scanned into the warehouse;
- file-filtered warehousing means that only specific files are warehoused;
- specific-location data warehousing means that data located at a specific storage location is warehoused.
The data processing unit is used to perform warehousing processing on the ingested data, the warehousing processing including tagging, rule-set filtering, file context-aware association, and file information retrieval. The data storage medium module monitoring unit is used to monitor the state of the data storage medium in real time;
the data storage module is used to store the ingested data;
the data browsing module is used to browse the data in the data warehouse;
the data query and positioning module is used to query the data in the data warehouse and to locate the storage location of the data; and
the data download module is used to selectively download the queried list data as required, based on the results of the data query and positioning module.
2. The management system according to claim 1, further comprising a data storage medium module and a communication controller module, wherein the communication controller module is used to perform scheduling control on the data storage medium module.
3. The management system according to claim 2, wherein the data storage medium module is a hard disk storage cube formed by stacking several hard disk cabinets, and the communication controller module is used to perform scheduling control on each hard disk cabinet;
the data warehousing module further comprises a manual warehousing unit, which is used to write data into the data warehouse by manual entry.
4. The management system according to claim 3, wherein:
- the data storage modes of the data storage module include hard disk cabinet data storage and offline hard disk storage;
- the data browsing modes include zoom in, zoom out, full-map display, full-map zoom in, full-map zoom out, roaming (panning), pointer, and map refresh;
- the data query operations of the data query and positioning module include yard/container query, label query, in-container query, general file query, offline/online data query, and identical-file query;
- the download modes of the data download module include single-selection file download, multi-selection file download, data package file download, multi-file queued download, delayed download of offline data, and resumable (breakpoint) download.
5. The management system according to claim 4, wherein:
- the yard/container query means querying the data that belongs to a given rule set;
- the label query means querying data that carries a specific label;
- the in-container query means querying with a query mode defined specifically for each rule set;
- the general file query means querying by file attributes;
- the offline/online data query means querying offline and online data without distinction;
- the identical-file query means removing duplicate files by querying the MD5 codes of the files.
6. The management system according to claim 1, further comprising a user rights management module, which is used to allocate and manage user rights.
7. A data warehouse management method, applied to the data warehouse management system according to any one of claims 1-6, characterized by comprising the following steps:
S1. writing data into the data warehouse by automatic scanning. The automatic scanning warehousing modes include scanning according to the original folder tree, undifferentiated data scanning, file-filtered warehousing, and specific-location data warehousing, wherein:
- scanning according to the original folder tree means that files are scanned into the warehouse following the tree structure of the folders;
- undifferentiated data scanning means that the folder structure is not preserved and all files are placed in a single list and scanned into the warehouse;
- file-filtered warehousing means that only specific files are warehoused;
- specific-location data warehousing means that data located at a specific storage location is warehoused.
Step S1 further comprises performing warehousing processing on the ingested data, the warehousing processing including tagging, rule-set filtering, file context-aware association, and file information retrieval, and monitoring the state of the data storage medium in real time;
S2. storing the ingested data;
S3. browsing the data in the data warehouse;
S4. querying the data in the data warehouse and locating the storage location of the data;
S5. selectively downloading the queried list data as required, based on the results of the data query and positioning module.
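The four automatic scanning modes of step S1 differ mainly in two respects: whether the folder structure is kept, and which files are selected. A minimal Python sketch using `os.walk` follows. The function names and the `keep` predicate are assumptions made for illustration. Tagging, rule-set filtering and the other warehousing processing steps are omitted.

```python
import os
from typing import Callable, Dict, List, Optional


def scan_tree(root: str) -> Dict[str, List[str]]:
    """Tree-preserving scan: map each folder (relative to root) to its files."""
    tree: Dict[str, List[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)  # "." for the root folder itself
        tree[rel] = sorted(filenames)
    return tree


def scan_flat(root: str, keep: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Undifferentiated scan: one flat list of paths, folder structure dropped.

    Passing `keep` models file-filtered warehousing; passing a chosen
    sub-folder as `root` models specific-location warehousing.
    """
    paths: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if keep is None or keep(path):
                paths.append(path)
    return sorted(paths)
```

In this sketch, specific-location warehousing is modeled only as scanning a chosen sub-folder, and file-filtered warehousing only as a predicate on the path. A real system might use richer selection rules.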
8. The method according to claim 7, wherein:
- the data storage modes in step S2 include hard disk cabinet data storage and offline hard disk storage;
- the data browsing modes in step S3 include zoom in, zoom out, full-map display, full-map zoom in, full-map zoom out, roaming (panning), pointer, and map refresh;
- the data query operations in step S4 include yard/container query, label query, in-container query, general file query, offline/online data query, and identical-file query;
- the data download modes in step S5 include single-selection file download, multi-selection file download, data package file download, multi-file queued download, delayed download of offline data, and resumable (breakpoint) download.
9. The method according to claim 8, wherein:
- the yard/container query means querying the data that belongs to a given rule set;
- the label query means querying data that carries a specific label;
- the in-container query means querying with a query mode defined specifically for each rule set;
- the general file query means querying by file attributes;
- the offline/online data query means querying offline and online data without distinction;
- the identical-file query means removing duplicate files by querying the MD5 codes of the files.
10. The method according to claim 7 or 9, wherein before step S1 the method further comprises: S0. allocating and managing user rights.
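The query operations defined in claims 5 and 9 can be illustrated with a small in-memory catalog. Each entry records the rule set ("yard/container") it was filtered into, its labels, an online flag and the MD5 digest of its content. All class, method and field names are hypothetical. The identical-file query here only groups candidate duplicates by MD5 and leaves any deletion to the caller. Because MD5 collisions can be constructed deliberately, a byte-for-byte comparison before deleting would be a prudent addition.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entry:
    path: str
    rule_set: str                  # the "yard/container" the file belongs to
    labels: Set[str] = field(default_factory=set)
    online: bool = True            # offline entries remain queryable
    md5: str = ""


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class Catalog:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def add(self, entry: Entry) -> None:
        self.entries.append(entry)

    def by_rule_set(self, rule_set: str) -> List[str]:
        # Yard/container query: everything belonging to one rule set.
        return [e.path for e in self.entries if e.rule_set == rule_set]

    def by_label(self, label: str) -> List[str]:
        # Label query; online and offline entries are searched alike.
        return [e.path for e in self.entries if label in e.labels]

    def duplicates(self) -> Dict[str, List[str]]:
        # Identical-file query: group paths by MD5, keep groups of 2 or more.
        groups: Dict[str, List[str]] = defaultdict(list)
        for e in self.entries:
            if e.md5:  # skip entries whose digest was never computed
                groups[e.md5].append(e.path)
        return {h: paths for h, paths in groups.items() if len(paths) > 1}
```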
CN201810201836.6A 2018-03-12 2018-03-12 A data warehouse management system and management method Expired - Fee Related CN108549659B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810201836.6A CN108549659B (en) 2018-03-12 2018-03-12 A data warehouse management system and management method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810201836.6A CN108549659B (en) 2018-03-12 2018-03-12 A data warehouse management system and management method

Publications (2)

Publication Number Publication Date
CN108549659A (en) 2018-09-18
CN108549659B (en) 2021-08-06

Family

ID=63516102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810201836.6A Expired - Fee Related CN108549659B (en) 2018-03-12 2018-03-12 A data warehouse management system and management method

Country Status (1)

Country Link
CN (1) CN108549659B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647766A (en) * 2019-09-19 2020-01-03 上海易点时空网络有限公司 Method and system for ensuring file downloading safety of data warehouse
CN110941586A (en) * 2019-10-25 2020-03-31 深圳市毕美科技有限公司 Engineering design data management method and system
CN114281856A (en) * 2021-12-27 2022-04-05 中国工商银行股份有限公司 Offline data management method and related device
CN114372104A (en) * 2022-01-10 2022-04-19 苏州久知联信息技术有限公司 Electronic file metadata acquisition tool and method with good compatibility
CN116796772A (en) * 2023-08-25 2023-09-22 北京思谨科技有限公司 Intelligent file cabinet control system based on dynamic RFID

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101873226A (en) * 2010-06-21 2010-10-27 中兴通讯股份有限公司 Data storage method and device for statistical form system
CN102722529A (en) * 2012-05-18 2012-10-10 苏州万图明电子软件有限公司 Business information query system

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN102339323B (en) * 2011-11-11 2015-12-16 江苏鸿信系统集成有限公司 A method for data extraction, scheduling and presentation for a DB2 data warehouse
US20140201192A1 (en) * 2013-01-15 2014-07-17 Syscom Computer Engineering Co. Automatic data index establishment method

Non-Patent Citations (1)

Title
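Distributed real-time data warehouse construction technology for state analysis of large-scale equipment; Liu Yanjun et al.; Computer Integrated Manufacturing Systems (《计算机集成制造系统》); 2017-10-15; pp. 2326-2329 *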

Also Published As

Publication number Publication date
CN108549659A (en) 2018-09-18

Similar Documents

Publication Publication Date Title
CN108549659B (en) A data warehouse management system and management method
US9158532B2 (en) Methods for managing applications using semantic modeling and tagging and devices thereof
US10496627B2 (en) Consistent ring namespaces facilitating data storage and organization in network infrastructures
US10140461B2 (en) Reducing resource consumption associated with storage and operation of containers
US6922761B2 (en) Method and system for migrating data
US7882098B2 (en) Method and system for searching stored data
US7734593B2 (en) Systems and methods for classifying and transferring information in a storage network
US20120159479A1 (en) Providing a persona-based application experience
CN109643302A (en) For the Storage Virtualization of file
US20200050583A1 (en) Storing and retrieving restricted datasets to and from a cloud network with non-restricted datasets
US20080021865A1 (en) Method, system, and computer program product for dynamically determining data placement
US20090254585A1 (en) Method for Associating Administrative Policies with User-Definable Groups of Files
CN104881466A (en) Method and device for processing data fragments and deleting garbage files
US20230315584A1 (en) Backing up data for a namespace assigned to a tenant
CN110019048A (en) Document handling method, device, system and server based on MongoDB
WO2014110940A1 (en) A method, apparatus and system for storing, reading the directory index
US7080102B2 (en) Method and system for migrating data while maintaining hard links
JPH04232563A (en) Document controlling method
CN109756484A (en) Control method, control device, gateway and the medium of gateway based on object storage
US11556398B2 (en) Centralized data management
Rabinovici-Cohen et al. PDS cloud: long term digital preservation in the cloud
CN107408239B (en) Architecture for massive data management in communication applications through multiple mailboxes
US6952699B2 (en) Method and system for migrating data while maintaining access to data with use of the same pathname
CN112262378A (en) Hydration of a hierarchy of dehydrated documents
CN110489060A (en) A kind of mixed file construction method and its system based on FUSE technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210806
