
CN110287152B - Data management method and related device - Google Patents

Info

Publication number
CN110287152B
Authority
CN
China
Prior art keywords
data
type
access
interface
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910570378.8A
Other languages
Chinese (zh)
Other versions
CN110287152A (en
Inventor
张文亮
李昕龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tencent Computer Systems Co Ltd filed Critical Shenzhen Tencent Computer Systems Co Ltd
Priority to CN201910570378.8A priority Critical patent/CN110287152B/en
Publication of CN110287152A publication Critical patent/CN110287152A/en
Application granted granted Critical
Publication of CN110287152B publication Critical patent/CN110287152B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/113 Details of archiving
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/119 Details of migration of file systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647 Migration mechanisms
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G06F3/0664 Virtualisation aspects at device level, e.g. emulation of a storage device or system
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data management method and a related device. Access information of data to a storage device is obtained, where the access information includes the storage device type, the access type, or the data type, and an appropriate interface is selected for the operation according to the access information. This avoids the large latency incurred when different storage devices all use the same interface. The read-write logic for the nonvolatile device is also adjusted to suit the characteristics of that device, which improves the utilization of the nonvolatile device in the database and, in turn, the performance of the database.

Description

Data management method and related device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method for data management and a related device.
Background
Database systems are widely used foundational software. A database system is constrained by the physical properties of its storage devices, and different devices call for different design choices in order to fully exploit their capabilities and achieve optimal performance.
Currently, the most common designs target two-level storage consisting of memory and a block device. The mainstream block devices are the hard disk drive (HDD) and the solid-state disk (SSD); because hard disks are the most widely used block devices, systems of this kind are also called hard disk databases. Existing hard disk database systems combine the two layers of storage media, memory and block device, and use an in-memory cache to make up for the performance gap of the block device. On the other hand, since the read performance of a nonvolatile device is close to that of memory while its write performance is close to that of a high-end SSD, a two-level storage design combining a block device and a nonvolatile device can also be adopted.
However, database access workloads are characterized by high input/output (I/O) volume and high throughput. Under the existing scheme, every read or write passes through the standard Portable Operating System Interface (POSIX) and the file system, which introduces substantial access latency and degrades database performance.
Disclosure of Invention
In view of this, a first aspect of the present application provides a data management method, which can be applied to database design or related application products, and specifically includes: acquiring access information of data to a storage device, wherein the access information comprises a storage device type, an access type, or a data type; determining an access interface according to the access information; and selecting the access interface to operate on the data.
Preferably, in some possible implementations of the present application, the storage device type includes a nonvolatile device or a block device, and the determining an access interface according to the access information includes: if the storage device type is a nonvolatile device, determining that the access interface is a Persistent Memory Development Kit (PMDK) interface; and if the storage device type is a block device, determining that the access interface is a Portable Operating System Interface (POSIX) interface.
Preferably, in some possible implementations of the present application, the access type includes read data or write data, and the selecting the access interface to operate on the data includes: if the access type is read data and the access interface is a PMDK interface, reading the data into a specified memory buffer page; if the access type is write data and the access interface is a PMDK interface, writing the data to a specified virtual memory mapped address.
Preferably, in some possible implementations of the present application, the data type includes a data table file or a temporary file, and the selecting the access interface to operate on the data includes: if the data type is a data table file, the storage device is a nonvolatile device, and the access type is read data, searching a data cache area for a data page, wherein the data page is used for indicating the data; if the data page exists, directly reading the data; otherwise, mapping the data to a specified virtual memory mapped address, wherein the virtual memory mapped address is used for acquiring the data.
Preferably, in some possible implementations of the present application, before the selecting the access interface to operate on the data, the method further includes: acquiring a ratio of read data to write data in the data; and if the ratio is greater than a preset threshold, migrating part of the data to a nonvolatile device for processing.
A second aspect of the present application provides a data management apparatus, comprising: an acquisition unit, configured to acquire access information of data to a storage device, wherein the access information comprises a storage device type, an access type, or a data type; a determining unit, configured to determine an access interface according to the access information; and a selection unit, configured to select the access interface to operate on the data.
Preferably, in some possible implementations of the present application, the storage device type includes a nonvolatile device or a block device, and the determining unit is specifically configured to determine that the access interface is a Persistent Memory Development Kit (PMDK) interface if the storage device type is a nonvolatile device; the determining unit is specifically configured to determine that the access interface is a Portable Operating System Interface (POSIX) interface if the storage device type is a block device.
Preferably, in some possible implementations of the present application, the access type includes read data or write data, and the selection unit is specifically configured to read the data into a specified memory buffer page if the access type is read data and the access interface is a PMDK interface; the selection unit is specifically configured to write the data to a specified virtual memory mapped address if the access type is write data and the access interface is a PMDK interface.
Preferably, in some possible implementations of the present application, the data type includes a data table file or a temporary file, and the selection unit is specifically configured to search a data cache area for a data page if the data type is a data table file, the storage device is a nonvolatile device, and the access type is read data, where the data page is used to indicate the data; the selection unit is specifically configured to directly read the data if the data page exists; otherwise, the selection unit is specifically configured to map the data to a specified virtual memory mapped address, where the virtual memory mapped address is used for acquiring the data.
Preferably, in some possible implementations of the present application, before the selecting the access interface to operate on the data, the determining unit is further configured to obtain a ratio of read data to write data in the data, and, if the ratio is greater than a preset threshold, to migrate part of the data to a nonvolatile device for processing.
A third aspect of the present application provides a computer apparatus comprising: a memory, a processor, and a bus system; the memory is used for storing program code; the processor is configured to perform, according to instructions in the program code, the data management method of the first aspect or of any implementation of the first aspect.
A fourth aspect of the application provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the data management method of the first aspect or of any implementation of the first aspect described above.
From the above technical solutions, the embodiment of the present application has the following advantages:
Access information of the data to the storage device is acquired, where the access information comprises the storage device type, the access type, or the data type, and an appropriate interface is selected for the operation according to the access information. This avoids the large latency incurred when different storage devices all use the same interface. The read-write logic for the nonvolatile device is also adjusted to suit the characteristics of that device, which improves the utilization of the nonvolatile device in the database system and, in turn, the performance of the database system.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description are only embodiments of the present application, and a person skilled in the art can obtain other drawings from the provided drawings without inventive effort.
FIG. 1 is a prior art architecture diagram for data management;
FIG. 2 is a system architecture diagram of data management according to an embodiment of the present application;
FIG. 3 is a flow chart of a method for data management according to an embodiment of the present application;
FIG. 4 is a flow chart of another method for data management according to an embodiment of the present application;
FIG. 5 is a flow chart of another method for data management according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a management device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of another management device according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a data management method and a related device. Access information of data to a storage device is acquired, where the access information comprises the storage device type, the access type, or the data type, and an appropriate interface is selected for the operation according to the access information. This avoids the large latency incurred when different storage devices all use the same interface. The read-write logic for the nonvolatile device is also adjusted to suit the characteristics of that device, which improves the utilization of the nonvolatile device in the database and, in turn, the performance of the database.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "includes" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
It should be appreciated that the data management method provided by the present application may be applied to multi-level storage designs, for example a two-level storage design that combines a block device with a nonvolatile device, where the block device can be an HDD or an SSD and the nonvolatile device may be a DCPMM. Specifically, the method can be applied to databases built on such hardware, for example a database system designed around Structured Query Language (SQL), or a related application product such as a cloud database product; the specific language type depends on the actual scenario.
Database systems are widely used foundational software. A database system is constrained by the physical properties of its storage devices, and different devices call for different design choices in order to fully exploit their capabilities and achieve optimal performance.
Currently, the most common design targets two-level storage of memory and a block device. Mainstream block devices include hard disks and flash disks; because the hard disk is the most widely used block device, systems of this kind are also called hard disk databases. FIG. 1 shows a prior-art architecture for data management, in which the database system combines two levels of storage media, memory and a block device, and uses an in-memory cache to make up for the performance gap of the block device. Taking the MySQL database on the Linux operating system as an example, it accesses data stored on a block device through the standard POSIX file system interface, via a file system such as EXT4 or XFS.
However, database access workloads are characterized by high input/output (I/O) volume and high throughput. Under the existing scheme, every read or write passes through the standard Portable Operating System Interface (POSIX) and the file system, which introduces substantial access latency and degrades database performance. Moreover, the read performance of a nonvolatile device is close to that of memory while its write performance is close to that of a high-end SSD, so the design must target the device specifically in order to exploit its full performance. Hardware performance can be realized, and service requirements met, only by optimizing appropriately for the different types of read and write operations.
In order to solve the above problems, the present application provides a data management method, which is applied to the data management system framework shown in FIG. 2. FIG. 2 is a system architecture diagram of data management provided by an embodiment of the present application, taking the MySQL database on the Linux operating system as an example. Compared with the prior-art data management architecture, the present application uses a nonvolatile device in combination with a block device and designs data processing logic for the nonvolatile device, where the data processing logic at least includes: acquiring access information of data to a storage device, wherein the access information comprises the storage device type, the access type, or the data type; determining an access interface according to the access information; and selecting the access interface to operate on the data.
It can be understood that the method provided by the application can be implemented as program code serving as processing logic within a database system, or as a management device that implements the processing logic in an integrated or external manner. As one implementation, the management device obtains access information of the data to the storage device, where the access information comprises the storage device type, the access type, or the data type, and selects an appropriate interface for the operation according to the access information. This avoids the large latency incurred when different storage devices all use the same interface. The read-write logic for the nonvolatile device is also adjusted to suit the characteristics of that device, which improves the utilization of the nonvolatile device in the database and, in turn, the performance of the database. Because the device on which a file resides is identified automatically and an appropriate read-write mode is adopted automatically, the existing database system does not need to be redesigned or its code rewritten. New devices can thus be introduced into the system at relatively low cost while improving its performance.
With reference to fig. 3, fig. 3 is a flowchart of a method for data management according to an embodiment of the present application, which includes at least the following steps:
301. The management device obtains access information of the data to the storage device.
In this embodiment, the access information may include a storage device type, an access type, or a data type. These items may be acquired simultaneously or one after another; the specific order depends on the actual scenario and is not limited herein.
The storage device types may include nonvolatile devices and block devices; the nonvolatile device may be a DCPMM, and the block device may be an HDD or an SSD. The access types may include read-only, write-only, or read-write. The data type may include a data table file or a temporary file.
Specifically, taking SQL data processing as an example, the specific file name, page number, and access type to be accessed are first deduced from context information such as the SQL statement type, and read-only, write-only, and read-write access types are distinguished. For each file, the type of storage medium on which it resides (disk, SSD, or nonvolatile device) is automatically identified. For each file, its file type, either data table file or temporary file, is also recorded.
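As an illustration only, the access information described above could be represented as a small record, with the access type guessed from the SQL statement type. The following Python sketch is not taken from the application: the class names, the field names, and the keyword-to-access-type mapping are all hypothetical assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceType(Enum):
    BLOCK = "block"              # HDD or SSD
    NONVOLATILE = "nonvolatile"  # e.g. DCPMM


class AccessType(Enum):
    READ_ONLY = "read-only"
    WRITE_ONLY = "write-only"
    READ_WRITE = "read-write"


class DataType(Enum):
    TABLE_FILE = "data table file"
    TEMP_FILE = "temporary file"


@dataclass(frozen=True)
class AccessInfo:
    """Hypothetical record of the access information for one page access."""
    file_name: str
    page_no: int
    device_type: DeviceType
    access_type: AccessType
    data_type: DataType


# Assumed mapping from the leading SQL keyword to an access type.
_STATEMENT_ACCESS = {
    "SELECT": AccessType.READ_ONLY,
    "INSERT": AccessType.WRITE_ONLY,
    "UPDATE": AccessType.READ_WRITE,
    "DELETE": AccessType.READ_WRITE,
}


def infer_access_type(sql: str) -> AccessType:
    """Guess the access type from the first keyword of a SQL statement."""
    parts = sql.strip().split(None, 1)
    keyword = parts[0].upper() if parts else ""
    # Unknown or empty statements fall back to the most general type.
    return _STATEMENT_ACCESS.get(keyword, AccessType.READ_WRITE)
```

A real database would derive the file name and page number from its query plan and catalog; the sketch only shows how the three kinds of access information can travel together.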
It will be appreciated that the data may take the form of characters, bytes, or text, or of files, tables, and the like built on those forms.
Optionally, the access information may be marked in the data itself. For example, a certain field in the data is designated as an identification field, so that the management device only needs to read that field when identifying the access information of the data. It is understood that the mark may be written into a field, attached as additional text information, or carried by another component with an identification function; the specific manner depends on the actual scenario and is not limited herein.
Optionally, the access information may be obtained from a user setting. For example, the user sets the access type of a data table to read-only, and the data is marked accordingly. The access information may also be obtained from historical statistics on similar data. For example, a city information encoding table is identified as mainly read-accessed, and the corresponding operations are performed.
302. The management device determines an access interface according to the access information.
This embodiment is described taking files as an example. For each specific file, the type of the storage device on which it resides is identified. If the storage device is a block device with an ordinary file system, the file is accessed through the standard POSIX file system interface; if the storage device is a nonvolatile device, the file is accessed through the PMDK interface.
In one possible scenario, to account for compatibility between devices, the PMDK interface is used only if the storage device is a nonvolatile device and the file system supports the direct access (DAX) mode used by PMDK; otherwise, the standard POSIX interface is used. Screening by compatibility in this way avoids operational problems caused by a device that cannot be adapted.
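The selection rule in step 302 can be sketched as a single function. This is a minimal illustration under stated assumptions: the function name, the string labels, and the `dax_supported` flag are hypothetical, and how DAX support is actually detected is outside the sketch.

```python
def select_interface(device_type: str, dax_supported: bool = False) -> str:
    """Return the name of the access interface for a file.

    device_type   -- "nonvolatile" or "block" (assumed labels)
    dax_supported -- whether the file system offers direct access (DAX)
    """
    # PMDK is chosen only when the device is nonvolatile AND the file
    # system supports DAX; every other combination falls back to POSIX.
    if device_type == "nonvolatile" and dax_supported:
        return "PMDK"
    return "POSIX"
```

Defaulting `dax_supported` to `False` keeps the fallback conservative: a nonvolatile device on a file system without DAX is still served through POSIX.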
Optionally, because the management device analyzes and records the access pattern of each data file, the access information may further include the ratio of read data to write data in the data. The management device provides a preset threshold; for example, a data file whose proportion of read operations exceeds 70% is migrated to the nonvolatile device. By default, a background thread of the system automatically migrates files with a high read-only proportion from the block device to the nonvolatile device, or files with a high proportion of write operations from the nonvolatile device to the block device, according to parameters such as the configured threshold. The management device also allows the user to execute specific SQL commands to migrate data files.
It will be appreciated that the preset threshold in the above embodiment may be set by a user or generated from historical records; the specific manner depends on the actual scenario and is not limited herein.
It should be noted that the ratio of read data to write data in the above embodiment may also be a ratio of related forms of data that include read data or write data, for example the proportion of read-only data or the proportion of read-write data.
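The threshold-based migration decision can be illustrated as follows. The 70% read threshold comes from the example above; the function name, the device labels, and the use of the same 70% figure for the write direction are assumptions made for this sketch, since the text does not give a write threshold.

```python
from typing import Optional


def migration_target(current_device: str, reads: int, writes: int,
                     read_threshold: float = 0.70,
                     write_threshold: float = 0.70) -> Optional[str]:
    """Decide where a data file should move, or None to leave it in place.

    current_device -- "block" or "nonvolatile" (assumed labels)
    reads, writes  -- operation counts recorded for the file
    """
    total = reads + writes
    if total == 0:
        return None  # no access history yet, so nothing to decide
    read_share = reads / total
    write_share = writes / total
    # Read-heavy files on a block device move to the nonvolatile device.
    if current_device == "block" and read_share > read_threshold:
        return "nonvolatile"
    # Write-heavy files on the nonvolatile device move back (assumed threshold).
    if current_device == "nonvolatile" and write_share > write_threshold:
        return "block"
    return None
```

For instance, a file on a block device with 80 reads and 20 writes would be marked for migration to the nonvolatile device, while one with exactly 70 reads and 30 writes would stay, because the share must exceed the threshold.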
303. The management device selects the access interface to operate on the data.
In this embodiment, operating on the data mainly means writing or reading it, which may be done directly or achieved after further data processing.
In this embodiment, for reading a file, if the access interface is the POSIX interface, APIs such as read and pread are used to read the data into the specified data buffer page. If it is the PMDK interface, an API such as memcpy is used to read the data into the specified data buffer page.
In this embodiment, for writing a file, if the access interface is the POSIX interface, APIs such as write and pwrite are used to write the specified memory buffer page to the file. If it is the PMDK interface, an API such as memcpy is used to write to the specified virtual memory mapped address.
It should be noted that if the data type is a temporary file, the read-write process does not involve caching, and the corresponding interface is selected directly according to the storage device type.
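The difference between the two access paths can be sketched in Python. This is a stand-in, not a PMDK example: the POSIX path uses `os.pread` and `os.pwrite` (Unix only), and the memory-mapped path uses the standard `mmap` module to imitate copying to and from a virtual memory mapped address. On real persistent memory, PMDK's libpmem provides calls such as `pmem_map_file` and `pmem_memcpy_persist` for this purpose. The 4096-byte page size and all function names are assumptions.

```python
import mmap
import os

PAGE_SIZE = 4096  # assumed page size for the sketch


def posix_write_page(fd: int, page_no: int, page: bytes) -> None:
    """Write one buffer page to the file through the POSIX path."""
    assert len(page) == PAGE_SIZE
    os.pwrite(fd, page, page_no * PAGE_SIZE)


def posix_read_page(fd: int, page_no: int) -> bytes:
    """Read one page from the file through the POSIX path."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)


def mapped_write_page(mapping: mmap.mmap, page_no: int, page: bytes) -> None:
    """Copy one buffer page to its mapped address (memcpy-style write)."""
    assert len(page) == PAGE_SIZE
    start = page_no * PAGE_SIZE
    mapping[start:start + PAGE_SIZE] = page


def mapped_read_page(mapping: mmap.mmap, page_no: int) -> bytes:
    """Copy one page out of the mapping (memcpy-style read)."""
    start = page_no * PAGE_SIZE
    return mapping[start:start + PAGE_SIZE]
```

The mapped functions never issue a read or write system call per page, which is the property the PMDK path relies on; an ordinary `mmap` on a block device still goes through the page cache, so the sketch shows the programming model rather than the performance.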
When the database system caches data and logs in memory, it follows the write-ahead logging principle: the log is flushed to the hard disk first, and only then is the modified data page flushed to the hard disk. The specific writing process can therefore comprise the following steps: first, the data is modified in the data buffer, and a log record of the modification is written to the log buffer; when a transaction commit command is received, the log in the log buffer is flushed to the log file; when pages are evicted from the data buffer, the data buffer pages are written sequentially, in batches, to the double-write buffer; finally, the buffered pages are written out to the data file.
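The ordering of these steps can be made concrete with a toy model. In the following sketch every "file" is an in-memory Python container, and the class and method names are hypothetical; it only demonstrates that the log reaches the log file before any modified page reaches the data file, and that evicted pages pass through the double-write buffer first.

```python
class MiniEngine:
    """Toy model of the write path; all storage is simulated in memory."""

    def __init__(self):
        self.data_buffer = {}   # page_no -> contents cached in memory
        self.dirty = set()      # page numbers modified since last write-out
        self.log_buffer = []    # log records not yet flushed
        self.log_file = []      # stands in for the on-disk log file
        self.doublewrite = []   # stands in for the double-write buffer
        self.data_file = {}     # stands in for the on-disk data file

    def modify(self, page_no, contents):
        # Step 1: change the page in the data buffer and log the change.
        self.data_buffer[page_no] = contents
        self.dirty.add(page_no)
        self.log_buffer.append((page_no, contents))

    def flush_log(self):
        self.log_file.extend(self.log_buffer)
        self.log_buffer.clear()

    def commit(self):
        # Step 2: on commit, flush the log buffer to the log file.
        self.flush_log()

    def evict(self, page_nos):
        # Write-ahead rule: the log must be durable before data pages are.
        self.flush_log()
        # Step 3: write dirty pages sequentially, as a batch, to the
        # double-write buffer.
        batch = [(n, self.data_buffer[n])
                 for n in sorted(page_nos) if n in self.dirty]
        self.doublewrite.extend(batch)
        # Step 4: write the same pages out to the data file.
        for n, contents in batch:
            self.data_file[n] = contents
            self.dirty.discard(n)
        for n in page_nos:
            self.data_buffer.pop(n, None)
```

Calling `flush_log` at the start of `evict` is a design choice of the sketch: it guarantees the write-ahead rule even when a page is evicted before its transaction commits.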
Specifically, to read data, the data buffer is first searched for the page to be read. If the page is present, the buffered page is read. Otherwise, a free page is allocated in the buffer and the page is read from the data file into that buffer page.
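A minimal sketch of this lookup-then-load read path is given below. The `BufferPool` class and the `read_page_from_file` callback are hypothetical names, and a dictionary entry stands in for allocating a free buffer page; eviction is omitted.

```python
class BufferPool:
    """Toy data buffer: look a page up first, load it only on a miss."""

    def __init__(self, read_page_from_file):
        # read_page_from_file(file_name, page_no) -> bytes is supplied by
        # the caller; it may use either access interface underneath.
        self._read = read_page_from_file
        self._pages = {}

    def get_page(self, file_name, page_no):
        key = (file_name, page_no)
        if key in self._pages:
            return self._pages[key]        # hit: read the buffered page
        page = self._read(file_name, page_no)
        self._pages[key] = page            # miss: fill a new buffer page
        return page
```

Because the loader is passed in, the same pool can sit in front of a POSIX-backed file or a memory-mapped one without changing the lookup logic.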
According to this embodiment, access information of the data to the storage device is obtained, where the access information comprises the storage device type, the access type, or the data type, and an appropriate interface is selected for the operation according to the access information. This avoids the large latency incurred when different storage devices all use the same interface. The read-write logic for the nonvolatile device is also adjusted to suit the characteristics of that device, which improves the utilization of the nonvolatile device in the database and, in turn, the performance of the database. Because the device on which a file resides is identified automatically and an appropriate read-write mode is adopted automatically, the existing database system does not need to be redesigned or its code rewritten. New devices can thus be introduced into the system at relatively low cost while improving its performance.
In a possible scenario, a newly written buffer page may be recorded, so that when the page needs to be read again, the buffered page can be read directly. This scenario is described below with reference to the accompanying drawings. As shown in FIG. 4, which is a flowchart of another data management method provided by an embodiment of the present application, the embodiment includes at least the following steps:
401. The management means obtains access information of the data to the storage device.
In this embodiment, the access information may include a storage device type, an access type, or a data type, and a specific acquisition order may be simultaneous or sequential, where the specific order depends on an actual scenario and is not limited herein.
The storage device types may include nonvolatile devices and block devices, the nonvolatile devices may be DCPMM, and the block devices may be HDDs and SSDs; the access types may include read-only, write-only, or read-write; the data type may include a data table file or a temporary file.
Specifically, taking SQL data processing as an example, the file name, page number, and access type to be accessed are first deduced from context information such as the SQL statement type, and the access type is classified as read-only, write-only, or read-write. For each file, the type of storage medium on which it resides (disk, SSD, or nonvolatile device) is identified automatically. For each file, its file type (data table file or temporary file) is also recorded.
It will be appreciated that the data may take the form of characters, bytes, or text, or of files, tables, and the like built on these forms.
Optionally, the access information may be marked in the data itself. For example, a certain field in the data is designated as an identification field, and the management device only needs to read that field to identify the access information of the data. It is understood that the mark may be a written field, additional text information, or any other component with an identification function; the specific form is determined by the actual scenario and is not limited herein.
Alternatively, the access information may be obtained from a user setting. For example, the user sets the access type of a data table to read-only, and the data is marked accordingly. The access information may also be obtained from similar data in historical statistics. For example, a city information encoding table is identified as mainly read-accessed, and the corresponding operations are performed.
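The access information described above can be pictured as a small record attached to each file. The following is a minimal sketch only: the class names, the field names, and the mapping from SQL keywords to access types are assumptions made for illustration and are not taken from the application, which does not state how the statement type is mapped.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceType(Enum):
    BLOCK = "block"              # HDD or SSD
    NONVOLATILE = "nonvolatile"  # e.g. DCPMM


class AccessType(Enum):
    READ_ONLY = "read-only"
    WRITE_ONLY = "write-only"
    READ_WRITE = "read-write"


class DataType(Enum):
    TABLE_FILE = "data table file"
    TEMP_FILE = "temporary file"


@dataclass(frozen=True)
class AccessInfo:
    """Hypothetical record of the access information for one file."""
    file_name: str
    device_type: DeviceType
    access_type: AccessType
    data_type: DataType


# Assumed mapping from the leading SQL keyword to an access type.
_STATEMENT_ACCESS = {
    "SELECT": AccessType.READ_ONLY,
    "INSERT": AccessType.WRITE_ONLY,
    "UPDATE": AccessType.READ_WRITE,
    "DELETE": AccessType.READ_WRITE,
}


def infer_access_type(sql: str) -> AccessType:
    """Deduce the access type from the SQL statement type."""
    parts = sql.split()
    keyword = parts[0].upper() if parts else ""
    # Unknown or empty statements fall back to read-write (an assumption).
    return _STATEMENT_ACCESS.get(keyword, AccessType.READ_WRITE)
```

For instance, `infer_access_type("SELECT * FROM city")` yields `AccessType.READ_ONLY`, which would then be stored in an `AccessInfo` together with the device type and data type of the file.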
402. The management device determines an access interface according to the access information.
This embodiment is described using a file as an example. For each specific file, the type of the storage device on which it resides is identified. If the storage device is a block device with an ordinary file system, the file is accessed through the standard POSIX file system interface; if the storage device is a nonvolatile device, the file is accessed through the PMDK interface.
In one possible scenario, to account for adaptation problems between devices, the PMDK interface is used only if the storage device is a nonvolatile device and the file system supports the PMDK DAX direct access mode; otherwise, the standard POSIX interface is still used. Screening by adaptation mode in this way avoids operation problems caused by a device that cannot be adapted.
Optionally, the access information may further include the ratio of read data to write data in the data, because the management device analyzes and records the access pattern of each data file. The management device provides a preset threshold; for example, a data file whose proportion of read operations exceeds 70% is migrated to the nonvolatile device. By default, a background thread of the system automatically migrates files with a high read-only proportion from the block device to the nonvolatile device, or files with a high write proportion from the nonvolatile device to the block device, according to parameters such as the configured threshold. The management device also allows the user to execute specific SQL commands to migrate data files.
It will be appreciated that the preset threshold in the above embodiment may be set by a user or generated from historical records; the specific manner depends on the actual scenario and is not limited herein.
It should be noted that the ratio of read data to write data in the above embodiment may also be a ratio of a related kind of data involving reads or writes, for example the proportion of read-only data or the proportion of read-write data.
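The selection rule in step 402 reduces to a two-input decision. The sketch below assumes that the device type and the file system's DAX support have already been detected and are passed in as booleans; how they are detected is not described here, and the names `Interface` and `choose_interface` are hypothetical.

```python
from enum import Enum


class Interface(Enum):
    POSIX = "POSIX"
    PMDK = "PMDK"


def choose_interface(device_is_nonvolatile: bool, fs_supports_dax: bool) -> Interface:
    """Pick the access interface for a file.

    PMDK is chosen only when the file sits on a nonvolatile device AND the
    file system supports DAX direct access; every other case falls back to
    the standard POSIX interface.
    """
    if device_is_nonvolatile and fs_supports_dax:
        return Interface.PMDK
    return Interface.POSIX
```

The fallback branch covers both block devices and nonvolatile devices whose file system lacks DAX support, which matches the screening described above.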
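The threshold-based migration can be sketched as a per-file decision over recorded read and write counts. This is an illustration under stated assumptions: the 70% read threshold is the example value given above, while the write threshold, the `FileStats` record, and the function name are hypothetical, since the text only says that files with a high write proportion are moved back according to configured parameters.

```python
from dataclasses import dataclass
from typing import Optional

READ_RATIO_THRESHOLD = 0.70   # example value from the text
WRITE_RATIO_THRESHOLD = 0.70  # assumed; the text gives no number for writes


@dataclass
class FileStats:
    """Hypothetical per-file access counters kept by the management device."""
    reads: int = 0
    writes: int = 0
    on_nonvolatile: bool = False


def migration_target(stats: FileStats,
                     read_threshold: float = READ_RATIO_THRESHOLD,
                     write_threshold: float = WRITE_RATIO_THRESHOLD) -> Optional[str]:
    """Return "nonvolatile", "block", or None if the file should stay put."""
    total = stats.reads + stats.writes
    if total == 0:
        return None  # no access history yet, nothing to decide
    read_ratio = stats.reads / total
    write_ratio = stats.writes / total
    if not stats.on_nonvolatile and read_ratio > read_threshold:
        return "nonvolatile"  # read-heavy file on a block device
    if stats.on_nonvolatile and write_ratio > write_threshold:
        return "block"        # write-heavy file on a nonvolatile device
    return None
```

A background thread could call `migration_target` periodically for each data file and move the file when the result is not `None`; a user-issued SQL command would bypass this check.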
403. The management device looks up page information in the data buffer.
In this embodiment, the page information may include a directory or an index. Each time data is written into or read from the data buffer, a corresponding index is generated and a directory entry is created from it; when page information needs to be found, it is looked up in the directory by index.
404. The management device operates on the data according to the page information.
In this embodiment, when the storage device is a block device, if the corresponding page information exists, the buffered page is read, which speeds up the data read. If the corresponding page information does not exist, the corresponding data is cached, an index is generated and stored in the cache directory, and the data is then read into the specified data buffer page through the POSIX interface using APIs such as read and pread.
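The block-device read path can be sketched with Python's `os.pread`, which wraps the POSIX `pread` call (available on Unix-like systems). The page size, the class name, and the use of a plain dictionary as the cache directory are assumptions for illustration; the order of caching and reading is also simplified to read first and then record the page.

```python
import os
import tempfile
from typing import Dict, Tuple

PAGE_SIZE = 4096  # assumed page size; the text does not specify one


class BlockDeviceReader:
    """Reads pages through POSIX pread, keeping a directory of cached pages."""

    def __init__(self, fd: int):
        self._fd = fd
        self._directory: Dict[int, bytes] = {}  # index (page number) -> page
        self.file_reads = 0                     # pread calls actually issued

    def read_page(self, page_no: int) -> bytes:
        cached = self._directory.get(page_no)
        if cached is not None:
            return cached                       # hit: serve the buffered page
        # Miss: read the page from the file at its byte offset.
        data = os.pread(self._fd, PAGE_SIZE, page_no * PAGE_SIZE)
        self.file_reads += 1
        self._directory[page_no] = data         # record the page under its index
        return data


def demo() -> Tuple[bytes, int]:
    """Read page 1 twice from a two-page temporary file."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"A" * PAGE_SIZE + b"B" * PAGE_SIZE)
        f.flush()
        reader = BlockDeviceReader(f.fileno())
        first = reader.read_page(1)
        reader.read_page(1)                     # second read is served from cache
        return first, reader.file_reads
```

In `demo`, the second call to `read_page(1)` does not touch the file, so only one `pread` is issued for the two reads.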
In this embodiment, when the storage device is a nonvolatile device, if the corresponding page information exists, the buffered page is read, which speeds up the data read. If the corresponding page information does not exist, the data can be mapped directly to a virtual memory address through the PMDK interface to obtain it.
Because the read performance of the nonvolatile device is close to that of memory and its write performance is close to that of a high-performance SSD, the data buffer can be bypassed, which saves time. In this respect, the embodiment of the application optimizes the data read process.
It should be noted that operations on a file include read-only, write-only, and read-write processes. The above embodiment shows that the nonvolatile device can save time in the read process, which may be the read step of a read-only process or of a read-write process. Depending on the actual scenario, this can be understood as optimizing the read process itself or any other data processing operation that includes a read.
In the above embodiment, the storage device type, the access type, and the file type may be determined in a certain order or simultaneously. One scenario illustrating the decision logic for the access information is described below. As shown in fig. 5, which is a flowchart of another data management method according to an embodiment of the present application, the embodiment includes at least the following steps:
501. Determine the storage device type.
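The idea of reading a page through a memory mapping rather than through a buffer copy can be illustrated with Python's standard `mmap` module. This is only a stand-in: it maps an ordinary temporary file, not persistent memory, and it does not use PMDK. The page size and function names are assumptions.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size


def read_page_mapped(path: str, page_no: int) -> bytes:
    """Map the file into virtual memory and return one page from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            start = page_no * PAGE_SIZE
            # Slicing the mapping reads straight from the mapped address range.
            return mapped[start:start + PAGE_SIZE]


def demo() -> bytes:
    """Create a two-page temporary file and read its second page via mmap."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"A" * PAGE_SIZE + b"B" * PAGE_SIZE)
        return read_page_mapped(path, 1)
    finally:
        os.remove(path)
```

On real persistent memory with a DAX-capable file system, the mapping would address the device directly, which is why the data buffer lookup can be skipped on that path.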
In this embodiment, the storage device types may include a nonvolatile device, which may be DCPMM, and a block device, which may be an HDD and an SSD.
502. If the storage device type is a nonvolatile device, select the PMDK interface.
503. If the storage device type is a block device, select the POSIX interface.
504. Determine the access type.
In this embodiment, the access type may include read-only, write-only, or read-write.
505. Data is obtained directly through PMDK.
In this embodiment, when the storage device is a nonvolatile device, if the corresponding page information exists, the page is read through the PMDK interface, which speeds up the data read. If the corresponding page information does not exist, the data can be mapped directly to a virtual memory address through the PMDK interface to obtain it.
506. The data buffer is looked up.
In this embodiment, the page information may include a directory or an index. Each time data is written into or read from the data buffer, a corresponding index is generated and a directory entry is created from it; when page information needs to be found, it is looked up in the directory by index.
507. Obtain the data in the manner corresponding to the result of the data buffer lookup.
In this embodiment, when the storage device is a block device, if the corresponding page information exists, the buffered page is read, which speeds up the data read. If the corresponding page information does not exist, the corresponding data is cached, an index is generated and stored in the cache directory, and the data is then read into the specified data buffer page through the POSIX interface using APIs such as read and pread.
According to this embodiment, access information of the data with respect to the storage device is obtained, where the access information includes the storage device type, the access type, or the data type, and an appropriate interface is selected for the operation according to the access information. This avoids the large latency caused by different storage devices sharing the same interface. The read-write logic is adjusted to the characteristics of the nonvolatile device, which improves the utilization of the nonvolatile device in the database and in turn the performance of the database. Because the device on which a file resides is identified automatically and an appropriate read-write mode is adopted automatically, the existing database system does not need to be redesigned or have its code rewritten. New devices can therefore be introduced into the system at relatively low cost while improving system performance.
In order to better implement the above-described aspects of the embodiments of the present application, the following provides related apparatuses for implementing the above-described aspects. Referring to fig. 6, fig. 6 is a schematic structural diagram of a management apparatus according to an embodiment of the present application, and a data processing management apparatus 600 includes:
An obtaining unit 601, configured to obtain access information of data to a storage device, where the access information includes a storage device type, an access type, or a data type;
a determining unit 602, configured to determine an access interface according to the access information;
and the selecting unit 603 is configured to select the access interface to operate on the data.
Optionally, in some possible implementations of the application, the storage device type includes a nonvolatile device or a block device,
The determining unit 602 is specifically configured to determine that the access interface is a persistent memory development kit (PMDK) interface if the storage device type is a nonvolatile device;
The determining unit 602 is specifically configured to determine that the access interface is a Portable Operating System Interface (POSIX) interface if the storage device type is a block device.
Optionally, in some possible implementations of the application, the access type includes read data or write data,
The selecting unit 603 is specifically configured to read the data to a specified memory buffer page if the access type is read data and the access interface is a PMDK interface;
The selecting unit 603 is specifically configured to write the data to the specified virtual memory mapped address if the access type is write data and the access interface is a PMDK interface.
Optionally, in some possible implementations of the application, the data type includes a data table file or a temporary file,
The selecting unit 603 is specifically configured to search a data page in a data cache area if the data type is a data table file, the storage device is a nonvolatile device, and the access type is read data, where the data page is used to indicate the data;
The selecting unit 603 is specifically configured to directly read the data if the data page exists;
The selecting unit 603 is specifically configured to map the data to a specified virtual memory mapped address if the data page does not exist, where the virtual memory mapped address is used to indicate that the data is acquired.
Optionally, in some possible implementations of the present application, before the selecting the access interface to operate on the data,
The determining unit 602 is further configured to obtain a ratio value of read data to write data in the data;
the determining unit 602 is further configured to migrate a portion of the data to a nonvolatile device for processing if the ratio value is greater than a preset threshold.
The obtaining unit 601 obtains access information of the data with respect to the storage device, where the access information includes the storage device type, the access type, or the data type, and the determining unit 602 selects an appropriate interface for the operation according to the access information. This avoids the large latency caused by different storage devices sharing the same interface. The selecting unit 603 adjusts the read-write logic to the characteristics of the nonvolatile device, which improves the utilization of the nonvolatile device in the database and in turn the performance of the database. Because the device on which a file resides is identified automatically and an appropriate read-write mode is adopted automatically, the existing database system does not need to be redesigned or have its code rewritten. New devices can therefore be introduced into the system at relatively low cost while improving system performance.
Referring to fig. 7, fig. 7 is a schematic structural diagram of another data processing management apparatus according to an embodiment of the present application. The management apparatus 700 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 722 (e.g., one or more processors), a memory 732, and one or more storage media 730 (e.g., one or more mass storage devices) storing application programs 742 or data 744. The memory 732 and the storage medium 730 may provide transitory or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the management apparatus. Further, the central processing unit 722 may be configured to communicate with the storage medium 730 and to execute, on the management apparatus 700, the series of instruction operations stored in the storage medium 730.
The management apparatus 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps performed by the management apparatus in the above-described embodiments may be based on the management apparatus structure shown in fig. 7.
Embodiments of the present application also provide a computer readable storage medium having stored therein data management instructions which, when executed on a computer, cause the computer to perform the steps performed by the management apparatus in the method described in the embodiments of fig. 3 to 5.
There is also provided in an embodiment of the application a computer program product comprising data management instructions which, when run on a computer, cause the computer to perform the steps performed by the management apparatus in the method described in the embodiment of figures 3 to 5 above.
The embodiment of the application also provides a data management system, which can comprise the data management device in the embodiment shown in fig. 6 or the data processing management device shown in fig. 7.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a management apparatus, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (6)

1. A method of data management, characterized by being applied to a database system that uses a non-volatile device in combination with a block device; the method comprises the following steps:
acquiring access information of data to a storage device, wherein the access information comprises a storage device type, an access type, or a data type; the storage device type includes a nonvolatile device or a block device; the access information further comprises a ratio of read data to write data in the data;
if the storage device type is a nonvolatile device and the file system supports a persistent memory development kit (PMDK) interface, determining that an access interface is the PMDK interface;
if the storage device type is a block device, determining that the access interface is a Portable Operating System Interface (POSIX) interface;
acquiring a ratio value of read data to write data in the data;
if the ratio value is greater than a preset threshold, migrating part of the data to the nonvolatile device for processing;
if the access type is write data and the access interface is the PMDK interface, writing the data to a specified virtual memory mapped address;
if the access type is write data and the access interface is the POSIX interface, writing the data to a specified memory buffer page;
if the access type is read data and the storage device type is a nonvolatile device, searching for a data page in a data cache area, and if the data page does not exist, mapping the data to a specified virtual memory mapped address through the PMDK interface, wherein the virtual memory mapped address is used to indicate acquisition of the data;
if the access type is read data and the storage device type is a block device, searching for a data page in the data cache area, and if the data page does not exist, caching the data, generating an index, storing the index in a cache directory, and then reading the data to a specified data buffer page through the POSIX interface.
2. The method of claim 1, characterized in that:
if the access type is read data and the storage device type is a nonvolatile device, searching for a data page in the data cache area, and if the data page exists, reading the data page;
if the access type is read data and the storage device type is a block device, searching for a data page in the data cache area, and if the data page exists, reading the data page.
3. An apparatus for data management, characterized by being applied to a database system that uses a nonvolatile device in combination with a block device; the device comprises:
an acquisition unit, configured to acquire access information of data to a storage device, wherein the access information comprises a storage device type, an access type, or a data type; the storage device type includes a nonvolatile device or a block device; the access information further comprises a ratio of read data to write data in the data;
a determining unit, configured to determine that an access interface is a persistent memory development kit (PMDK) interface if the storage device type is a nonvolatile device and the file system supports the PMDK interface;
the determining unit is further configured to determine that the access interface is a Portable Operating System Interface (POSIX) interface if the storage device type is a block device;
the determining unit is further configured to acquire a ratio value of read data to write data in the data, and, if the ratio value is greater than a preset threshold, migrate part of the data to the nonvolatile device for processing;
a selecting unit, configured to write the data to a specified virtual memory mapped address if the access type is write data and the access interface is the PMDK interface;
the selecting unit is further configured to write the data to a specified memory buffer page if the access type is write data and the access interface is the POSIX interface;
the selecting unit is further configured to search for a data page in a data cache area if the access type is read data and the storage device type is a nonvolatile device, and to map the data to a specified virtual memory mapped address through the PMDK interface if the data page does not exist, wherein the virtual memory mapped address is used to indicate acquisition of the data;
the selecting unit is further configured to search for a data page in the data cache area if the access type is read data and the storage device type is a block device, and, if the data page does not exist, to cache the data, generate an index, store the index in a cache directory, and then read the data to a specified data buffer page through the POSIX interface.
4. The apparatus of claim 3, characterized in that:
the selecting unit is further configured to search for a data page in the data cache area if the access type is read data and the storage device type is a nonvolatile device, and to read the data page if the data page exists;
the selecting unit is further configured to search for a data page in the data cache area if the access type is read data and the storage device type is a block device, and to read the data page if the data page exists.
5. A computer device, the computer device comprising a processor and a memory:
The memory is used for storing program codes; the processor is configured to perform the method of data management of any of claims 1 to 2 according to instructions in the program code.
6. A computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of data management of any of claims 1 to 2.
CN201910570378.8A 2019-06-27 2019-06-27 Data management method and related device Active CN110287152B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910570378.8A CN110287152B (en) 2019-06-27 2019-06-27 Data management method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910570378.8A CN110287152B (en) 2019-06-27 2019-06-27 Data management method and related device

Publications (2)

Publication Number Publication Date
CN110287152A CN110287152A (en) 2019-09-27
CN110287152B true CN110287152B (en) 2024-06-25

Family

ID=68019358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910570378.8A Active CN110287152B (en) 2019-06-27 2019-06-27 Data management method and related device

Country Status (1)

Country Link
CN (1) CN110287152B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113495679B (en) * 2020-04-01 2022-10-21 北京大学 Optimization method of big data storage access and processing based on non-volatile storage medium
CN113157425B (en) * 2021-05-20 2024-05-03 深圳马六甲网络科技有限公司 Service access processing method, device, equipment and storage medium
CN113254222B (en) 2021-07-13 2021-09-17 苏州浪潮智能科技有限公司 Task allocation method and system for solid state disk, electronic device and storage medium
CN113821507A (en) * 2021-07-28 2021-12-21 腾讯科技(深圳)有限公司 Data processing method and device
CN114063914B (en) * 2021-11-05 2024-04-09 武汉理工大学 Data management method for DRAM-HBM hybrid memory

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101958838A (en) * 2010-10-14 2011-01-26 联动优势科技有限公司 Data access method and device
US9865323B1 (en) * 2016-12-07 2018-01-09 Toshiba Memory Corporation Memory device including volatile memory, nonvolatile memory and controller

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9959203B2 (en) * 2014-06-23 2018-05-01 Google Llc Managing storage devices
JP6932066B2 (en) * 2017-11-07 2021-09-08 株式会社日立製作所 Delivery management device and delivery management method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Shu Jiwu et al. Research progress on system software for non-volatile main memory. Scientia Sinica Informationis. 2019, vol. 51 (no. 6), pp. 869-899. *

Also Published As

Publication number Publication date
CN110287152A (en) 2019-09-27

Similar Documents

Publication Publication Date Title
CN110287152B (en) Data management method and related device
US11307769B2 (en) Data storage method, apparatus and storage medium
CN108804031B (en) Optimal record lookup
US20200097218A1 (en) File system block-level tiering and co-allocation
US8793427B2 (en) Remote memory for virtual machines
CN108459826B (en) Method and device for processing IO (input/output) request
US9767140B2 (en) Deduplicating storage with enhanced frequent-block detection
US9052826B2 (en) Selecting storage locations for storing data based on storage location attributes and data usage statistics
CN108268219B (en) Method and device for processing IO (input/output) request
US10678654B2 (en) Systems and methods for data backup using data binning and deduplication
CN113176857A (en) Massive small file access optimization method, device, equipment and storage medium
EP3552109A1 (en) Systems and methods for caching data
KR102538126B1 (en) Tail latency aware foreground garbage collection algorithm
US9772790B2 (en) Controller, flash memory apparatus, method for identifying data block stability, and method for storing data in flash memory apparatus
US11221999B2 (en) Database key compression
US10915533B2 (en) Extreme value computation
CN107341267A (en) A kind of distributed file system access method and platform
US9448727B2 (en) File load times with dynamic storage usage
CN114327272B (en) Data processing method, solid state disk controller and solid state disk
US20170315924A1 (en) Dynamically Sizing a Hierarchical Tree Based on Activity
US11144508B2 (en) Region-integrated data deduplication implementing a multi-lifetime duplicate finder
US10585802B1 (en) Method and system for caching directories in a storage system
US20210349918A1 (en) Methods and apparatus to partition a database
US10585592B2 (en) Disk area isolation method and device
Wu et al. A data management method for databases using hybrid storage systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant