Background
A thread, also known as a lightweight process (LWP), is a single sequential flow of control within a process and is the smallest unit of program execution. In an operating system that supports threads, the process is typically the basic unit of resource allocation, while the thread is the basic unit of independent execution and scheduling. Threads may execute concurrently: multiple threads in one process may execute concurrently, and threads in different processes may also execute concurrently. In particular, in a computer system with multiple compute cores, such as a computer system with multiple CPU cores, threads on different cores may also execute concurrently.
Multiple threads often need to access the same data when executing concurrently; from the perspective of the data being accessed, the data is shared among different threads. When multiple threads access shared data, the integrity of that data must be guaranteed. For example, two threads must not modify the shared data at the same time, and one thread must not read data that another thread has only half modified. The classical approach is a lock mechanism: a read lock is held on the data while a thread performs a read operation on it, and a write lock is held on the data while a thread performs a write operation on it. Before a thread reads a piece of data, it acquires a read lock on the data, and after the read operation completes, it releases the read lock. Similarly, before a thread writes a piece of data, it acquires a write lock on the data, and after the write operation completes, it releases the write lock. Typically, a variable read_ref serves as a reference count of reading threads, and a variable writer_id records the ID of the writing thread.
Read operations performed by different threads on the same data may hold read locks simultaneously. For example, thread 1 performs a read operation on a piece of data: before reading, it acquires the read lock, specifically by adding 1 to the value of read_ref (e.g., the data type of read_ref is integer (int) and the initial value is 0), and then reads the data. During this read, thread 2 also performs a read operation on the same data, so it too adds 1 to the value of read_ref and reads the data; the value of read_ref is now 2. After the read operation of thread 1 completes, the value of read_ref is decremented by 1 and the read lock is released; the value of read_ref is now 1. Thereafter, thread 2 finishes its read operation, decrements the value of read_ref by 1, and releases the read lock; the value of read_ref is now 0. Read locks on the same data can thus be acquired repeatedly, which gives read locks a shared character.
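The counting scheme just described can be sketched with C11 atomics. This is a minimal illustration only; the names read_ref, read_lock, and read_unlock follow the text and are not taken from any particular implementation.

```c
#include <stdatomic.h>

/* Shared read lock as a reference count: read_ref counts the
 * threads currently holding the read lock; 0 means no readers. */
atomic_int read_ref = 0;

void read_lock(void)   { atomic_fetch_add(&read_ref, 1); }  /* acquire: read_ref++ */
void read_unlock(void) { atomic_fetch_sub(&read_ref, 1); }  /* release: read_ref-- */
```

Because the increment and decrement are atomic, any number of threads may acquire the read lock concurrently, exactly as in the two-thread walkthrough above.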
For write operations performed by different threads on the same data, only one write lock can be held at a time. For example, thread 1 performs a write operation on a piece of data: before writing, it acquires the write lock, specifically by updating the value of writer_id to the ID of thread 1 (e.g., the data type of writer_id is integer (int) and the initial value is 0; no thread has an ID of 0), and then writes the data. During this write, thread 2 also attempts a write operation on the same data, but since the value of writer_id is not 0 at this time, thread 2 can neither acquire the write lock nor write the data. After the write operation of thread 1 completes, the write lock is released, i.e., the value of writer_id is updated to 0. After its earlier failure to acquire the write lock, thread 2 waits for a period of time, finds that the value of writer_id is now 0, and can acquire the write lock: it updates the value of writer_id to the ID of thread 2 and then writes the data. After the write operation of thread 2 completes, the write lock is released, i.e., the value of writer_id is updated to 0. It can be seen that write locks on the same data cannot be held simultaneously; write locks are therefore mutually exclusive.
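The exclusive write lock above can be sketched as follows; a compare-and-swap makes "check that writer_id is 0 and set it to my ID" a single atomic step, which is what prevents two writers from both succeeding. The function names are illustrative assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Exclusive write lock: writer_id is 0 when free; thread IDs are
 * assumed nonzero, as in the text. */
atomic_int writer_id = 0;

bool try_write_lock(int my_id) {
    int expected = 0;
    /* Succeeds only if no thread currently holds the write lock. */
    return atomic_compare_exchange_strong(&writer_id, &expected, my_id);
}

void write_unlock(void) { atomic_store(&writer_id, 0); }  /* release: writer_id = 0 */
```

A thread whose try_write_lock fails waits (or retries periodically) until the holder releases the lock, as thread 2 does in the walkthrough.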
In addition, the write lock and the read lock are mutually exclusive: at any moment, while a read lock is held on a piece of data, no write lock can be acquired on it, and while a write lock is held, no read lock can be acquired. Thus, before performing a read operation on a piece of data, a thread needs to check whether the value of writer_id for that data is 0. If it is 0, the read operation can proceed; if not, the thread must wait for the value of writer_id to become 0. Similarly, before writing a piece of data, a thread needs to check whether the value of read_ref is 0. If it is 0, the write operation can proceed; if not, the thread must wait for the value of read_ref to become 0. In practice, for the read lock, to guard against a write lock being acquired by another thread between the check of the writer_id value and the corresponding read operation, i.e., to avoid the conflict-detection failure that would result, the thread checks again whether the value of writer_id is 0 after adding 1 to the value of read_ref. Only if it is still 0 does the read operation proceed; otherwise the increment to read_ref is rolled back and the attempt is retried. Similarly, for the write lock, to guard against a read lock being acquired by another thread between the check of the read_ref value and the corresponding write operation, i.e., to avoid the conflict-detection failure that would result, the thread checks again whether the value of read_ref is 0 after updating the value of writer_id to the ID of the writing thread. Only if it is still 0 does the write operation proceed; otherwise the write lock is released and the attempt is retried.
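The double-checked read-lock acquisition described above can be sketched as follows. This is a simplified model under the stated assumptions (writer_id is 0 when no writer is active); try_read_lock is an illustrative name.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_int read_ref  = 0;  /* shared read reference count */
atomic_int writer_id = 0;  /* 0 means no writer holds the write lock */

/* Double-checked read lock: increment read_ref first, then re-check
 * writer_id; if a writer slipped in between the first check and the
 * increment, roll the increment back and report failure. */
bool try_read_lock(void) {
    if (atomic_load(&writer_id) != 0)
        return false;                      /* a writer is active, do not read */
    atomic_fetch_add(&read_ref, 1);        /* optimistic acquire */
    if (atomic_load(&writer_id) != 0) {    /* re-check for a racing writer */
        atomic_fetch_sub(&read_ref, 1);    /* roll back; caller retries later */
        return false;
    }
    return true;                           /* safe to read the data */
}
```

The symmetric write-side check (set writer_id, then verify read_ref is 0) follows the same pattern with the roles reversed.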
A thread's changes to the read_ref value and the writer_id value are atomic operations. Atomic operations are typically indivisible instructions provided by the CPU. While one thread executes an atomic operation, it is neither interrupted by other threads nor switched out in favor of another thread; in other words, once such an atomic operation starts, it runs until the operation ends.
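As a small illustration, C11's `<stdatomic.h>` exposes such indivisible instructions directly: `atomic_fetch_add` performs the read-modify-write as one uninterruptible step and returns the value held before the addition.

```c
#include <stdatomic.h>

atomic_int counter = 0;

/* atomic_fetch_add reads, adds, and writes back as one indivisible
 * operation, and returns the value held before the addition. */
int bump(void) {
    return atomic_fetch_add(&counter, 1);
}
```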
In the process of implementing the present application, the inventor finds that at least the following problems exist in the prior art:
In a multi-core computer system, threads on different cores may perform read and write operations on the same data. In particular, it often happens that a large number of read operations, and no write operation, are performed on the same data over a period of time. Each core typically has a corresponding cache, and each core maintains a read_ref value in its cache. According to the prior-art implementation, the read_ref values in the caches of all cores are kept consistent. Thus, in a multi-core computer system, once the read_ref value changes in the cache of one core, that core communicates with the other cores to notify them of the change, and the other cores update the read_ref values in their caches upon receiving the notification.
Thus, with this prior-art approach, when threads on different cores perform read operations on the same data, the communication between cores takes time, and the atomic operation that changes the read_ref value in each core's cache also takes time, so execution efficiency is low.
Detailed Description
The embodiments of the present application provide a read lock operation method and system, and a write lock operation method and system.
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, shall fall within the scope of the application.
First, an embodiment of the read lock operation method of the present application will be described.
FIG. 1 illustrates a flow chart of one embodiment of a read lock operation method of the present application. As shown in FIG. 1, the method of this embodiment includes:
S100: setting a private reference count corresponding to each core.
Modern CPUs employ a number of techniques to hide the latency introduced by memory accesses. While a memory read or write is in flight, the CPU may execute hundreds or even thousands of instructions. A multi-level static random access memory (SRAM) cache (hereinafter referred to as the cache) is the primary means of reducing the impact of this latency.
For example, in a dual-core computer system, core 1 and core 2 have corresponding caches 1 and 2, respectively. The cache may be the cache of a compute core. For example, a CPU commonly contains a first-level cache, a second-level cache, and even a third-level cache. In a CPU with first- and second-level caches, data that the CPU needs to operate on is read from memory into the second-level cache, then from the second-level cache into the first-level cache, and then from the first-level cache into the CPU for execution. In general, the closer a memory is to the CPU, the faster but more expensive it is; the farther from the CPU, the slower but cheaper. Frequently read and written data is therefore generally kept in memory close to the CPU, which improves the utilization of the more expensive memory.
In this step, the private reference count (private_read_ref) may preferably be placed in the cache. For example, the private reference count may be set in the CPU's first-level cache. Of course, depending on the architecture of the CPU and the capacities of its memories at different levels, the private reference count may be set in the second-level cache, or in another memory whose read speed is on the same order as the speed of the CPU's atomic operations; the embodiment of the present application is not particularly limited in this respect. In fact, caches tend to be transparent to the program, i.e., the program has no way to control whether a variable is placed in the cache or at which level. When a program operates on a variable, the CPU checks whether the variable is in the first-level cache and, if so, reads it directly from there; if not, it checks whether the variable is in the second-level cache and, if so, loads it from the second-level cache into the first-level cache; if it is not in the second-level cache either, it loads the variable from memory into the second-level and first-level caches.
In the prior art, read operations by different threads on the same data involve the same reference count, i.e., they all operate on a single reference count. Following the usual convention in the computer arts, this reference count read_ref is referred to as a global read_ref. Specifically, different threads of the same computing core, whether threads of the same process or of different processes, perform increment (++) or decrement (--) operations on the same global read_ref when reading the same data. If, in a multi-core computer system, only one global reference count is still employed across multiple cores, the problems analyzed in the background art may occur.
In this step, a private reference count is set for each core. For example, for core1, a corresponding private reference count is set, e.g., read_ref_core1, and for core2, a corresponding private reference count is likewise set, e.g., read_ref_core2; if other cores are included, the same applies by analogy.
The private reference count of each core need not be permanently (or fixedly) allocated; it may instead be allocated temporarily. For example, the private reference count may be reclaimed after the threads of a core have finished reading the data. Specifically, an array of private reference counts, [read_ref], may be set up. Before a thread of a core first acquires a read lock on the data, it applies for one entry of the [read_ref] array. The space of the [read_ref] array may be made large enough. Each entry in the array may be of integer type (int), and the initial value of each entry may be initialized to 0. Of course, the entries of the [read_ref] array may also be fixedly allocated to the respective cores for read operations on a given piece of data.
Preferably, in practice, each entry in the [read_ref] array may be allocated its own cache line in the cache. The cache line is the smallest unit at which a multi-core CPU maintains cache consistency, and is also the actual unit of memory exchange. On most platforms a cache line is larger than 8 bytes, and most cache lines are 64 bytes. If the entries of the [read_ref] array are defined as type int, an entry occupies 8 bytes, so one cache line can hold 8 read_ref entries. If more than one read_ref is stored in a cache line, operations on different elements of the array conflict with one another. To avoid such conflicts, each read_ref in the [read_ref] array may be stored in its own cache line. For example, each entry of the [read_ref] array may be declared as a structure whose size is declared to be 64 bytes. In this way, each entry of the [read_ref] array exclusively occupies one cache line, and conflicts during operation are avoided.
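The one-entry-per-cache-line layout can be sketched as follows, padding each counter's structure to 64 bytes so that cores updating different entries never share a line. The 64-byte line size and the struct and constant names are illustrative assumptions.

```c
#include <stdatomic.h>

/* One private reference count per (assumed 64-byte) cache line, so
 * that updates to different cores' counters never contend on a line. */
#define CACHE_LINE_SIZE 64
#define MAX_CORES 8

struct padded_read_ref {
    atomic_int count;                                /* the per-core read_ref */
    char pad[CACHE_LINE_SIZE - sizeof(atomic_int)];  /* fill out the line */
};

struct padded_read_ref read_ref[MAX_CORES];  /* the [read_ref] array */
```

This padding trick is a standard way to avoid false sharing: the structure size equals the cache-line size, so each array entry exclusively occupies one line.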
S110: during read operations on the same data by threads of different cores, read locking and unlocking operations are performed on the private reference count corresponding to each core.
For example, suppose a computer system includes 2 computing cores, core1 and core2, and both core1 and core2 read the same data. According to S100, core1 may apply for a private reference count, labeled read_ref_core1, and similarly core2 may apply for its own, e.g., read_ref_core2.
Thus, while a thread of core1 reads the data, the read lock is first acquired; that is, the private reference count read_ref_core1 of core1 is incremented by 1, changing read_ref_core1 from its initial value of 0 to 1. The thread of core1 then reads the data. After the read operation completes, the read lock is released; that is, read_ref_core1 is decremented by 1, changing it from 1 back to 0.
Similarly, while a thread of core2 reads the data, the read lock is first acquired; that is, the private reference count read_ref_core2 of core2 is incremented by 1, changing read_ref_core2 from its initial value of 0 to 1. The thread of core2 then reads the data. After the read operation completes, the read lock is released; that is, read_ref_core2 is decremented by 1, changing it from 1 back to 0.
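The per-core counting above can be sketched as an array of counters indexed by core; read_ref_core[i] plays the role of read_ref_core1, read_ref_core2, and so on in the text, and the function names are illustrative.

```c
#include <stdatomic.h>

#define MAX_CORES 8

/* One private reference count per core. Threads of core i touch
 * only read_ref_core[i]; no cross-core synchronization is needed. */
atomic_int read_ref_core[MAX_CORES];

void core_read_lock(int core)   { atomic_fetch_add(&read_ref_core[core], 1); }
void core_read_unlock(int core) { atomic_fetch_sub(&read_ref_core[core], 1); }
```

Since each core increments and decrements only its own counter, read locking on one core never forces a cache update on another.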
With the approach of this embodiment of the present application, when threads of different cores read the same data, each operates independently on the private reference count corresponding to its own core. The private reference counts of the different cores need not be synchronized between cores, so execution efficiency is improved. Moreover, the scalability of the read lock is improved: no matter how many cores' threads acquire and release the read lock simultaneously, the time for acquiring and releasing it hardly increases.
In addition, since the private reference counts of the different cores need not be synchronized between cores, the inter-core communication process is eliminated, along with the bandwidth, time, and other costs that such communication would require.
The step S110 may specifically include the following steps:
S111: during a read operation on the data by a thread of the first core, read locking and unlocking operations are performed on the private reference count corresponding to the first core.
S112: during a read operation on the data by a thread of the second core, read locking and unlocking operations are performed on the private reference count corresponding to the second core.
The read locking operation on a private reference count specifically means that a thread of each core increments the private reference count corresponding to that core by 1. The read unlocking operation on a private reference count specifically means that a thread of each core decrements the private reference count corresponding to that core by 1. Between the locking and unlocking operations, the threads of each core may read the data.
It should be noted that when multiple threads of the same core read the same data, the read locking and unlocking operations are performed on the same private reference count. For example, while thread 1 of core1 reads the data, it first acquires the read lock: the private reference count read_ref_core1 of core1 is incremented by 1, changing it from its initial value of 0 to 1, and thread 1 of core1 then reads the data. While thread 1 of core1 is reading the data, thread 2 of core1 also performs a read operation on the same data, so it adds 1 to the value of read_ref_core1 and reads the data; the value of read_ref_core1 is now 2. After the read operation of thread 1 of core1 completes, the value of read_ref_core1 is decremented by 1 and the read lock is released; the value of read_ref_core1 becomes 1. Thereafter, thread 2 of core1 finishes its read operation, decrements the value of read_ref_core1 by 1, and releases the read lock; the value of read_ref_core1 becomes 0. Thus, for the same core, no matter how many threads acquire and release the read lock simultaneously, the time for acquiring and releasing it hardly increases.
It should also be noted that, to avoid data inconsistency, the read lock in the embodiment of the present application remains mutually exclusive with the write lock. For example, in a computer system with multiple cores, a global write lock such as global_writer_id is set. Before a thread performs a write operation on a piece of data, it acquires the write lock on that data. For example, thread 1 of a certain core updates the value of global_writer_id to the ID of thread 1 (e.g., the data type of global_writer_id is integer (int) and the initial value is 0; no thread has an ID of 0) and then writes the data. During the write, a thread of some core (which may be the same core as, or a different core from, the thread that acquired the write lock), referred to here as thread 2, applies for the private reference count corresponding to its core in order to perform a read operation on the same data; the private reference count is initialized to 0, for example. However, since the value of global_writer_id is not 0 at this time, thread 2 can neither acquire the read lock nor read the data. After the write operation of thread 1 completes, the write lock is released, i.e., the value of global_writer_id is updated to 0. After its earlier failure to acquire the read lock, thread 2 waits for a period of time, finds that the value of global_writer_id is now 0, and can acquire the read lock. (Thread 2 may also retry at regular intervals after the failed attempt, succeeding once the value of global_writer_id is 0.) Thread 2 then increments the value of the private reference count it applied for by 1, after which it reads the data; the value of thread 2's private reference count is now 1. After the read operation of thread 2 completes, the read lock is released, i.e., the value of its private reference count is decremented by 1, back to 0.
Based on this, before the threads of the different cores perform the read locking operation on the corresponding private reference count in S110, the method may further include:
S101: the threads of the different cores check whether the data is in the process of a write operation, and if the check result is negative, S110 is triggered.
Whether the data is in the process of a write operation may be determined by checking the state of the global write lock. For example, it may be checked whether the global write lock is 0, and S110 is executed when the check result is 0.
Conversely, if the value of the global write lock is found not to be 0, a write operation on the data is currently in progress. Given the mutual exclusivity of the write lock and the read lock, the read lock cannot be acquired on the data, and the data cannot be read. In this case, it is necessary to wait for the global write lock to become 0 before executing S110.
The step S101 may be performed after step S100 or may be performed before step S100.
It should be noted that, for the read lock, to guard against a write lock being acquired by another thread between the check of the global_writer_id value and the corresponding read operation, i.e., to avoid the conflict-detection failure that would result, the thread checks again whether the value of global_writer_id is 0 after adding 1 to the private reference count corresponding to its core. Only if it is still 0 does the read operation proceed; otherwise the private reference count is decremented again and the attempt is retried.
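Steps S101 and S110 with this double check can be sketched together. This is a simplified model under the stated assumptions (global_writer_id is 0 when no writer is active); the function name is illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CORES 8

atomic_int global_writer_id = 0;      /* shared write lock, 0 = free */
atomic_int read_ref_core[MAX_CORES];  /* per-core private reference counts */

/* Double-checked per-core read lock: increment this core's private
 * count, then re-check the global write lock; roll back on conflict. */
bool try_core_read_lock(int core) {
    if (atomic_load(&global_writer_id) != 0)
        return false;                              /* S101: a write is in progress */
    atomic_fetch_add(&read_ref_core[core], 1);     /* S110: acquire the read lock */
    if (atomic_load(&global_writer_id) != 0) {     /* re-check for a racing writer */
        atomic_fetch_sub(&read_ref_core[core], 1); /* roll back and retry later */
        return false;
    }
    return true;
}
```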
One embodiment of the read lock operating system of the present application is described below. Fig. 2 shows a block diagram of an embodiment of the system.
As shown in fig. 2, the read lock operating system in an embodiment of the present application includes a first computing core 11a, a second computing core 11b, a first cache unit 12a, a second cache unit 12b, and a data unit 13, where each of the computing cores corresponds to a unique cache unit.
Wherein:
A data unit 13 for storing data;
A first cache unit 12a for holding a first private reference count allocated for a first computing core;
a second cache unit 12b for holding a second private reference count allocated for a second computing core;
A first computing core 11a and a second computing core 11b, for reading the same data in said data unit, wherein:
during a read operation on the data by a thread of the first computing core 11a, read locking and unlocking operations are performed on the private reference count corresponding to the first core;
and during a read operation on the data by a thread of the second computing core 11b, read locking and unlocking operations are performed on the private reference count corresponding to the second core.
Wherein:
the first cache unit 12a may be a cache of a first computing core;
The second cache element 12b may be a cache of a second computational core.
As mentioned in the foregoing method embodiment, a private reference count may be set for each core; it may be allocated before a thread of the core first acquires a read lock on the data, or it may be fixedly allocated. For example, an array of private reference counts, [read_ref], may be set up. Before a thread of a core first acquires a read lock on the data, it applies for one entry of the [read_ref] array. The space of the [read_ref] array may be made large enough. Each entry may be of integer type (int) with an initial value of 0. Of course, the entries of the [read_ref] array may also be fixedly allocated to the respective cores for read operations on a given piece of data. Preferably, in practice, each entry in the [read_ref] array may be allocated its own cache line in the cache. The cache line is the smallest unit at which a multi-core CPU maintains cache consistency, and is also the actual unit of memory exchange. On most platforms a cache line is larger than 8 bytes, and most cache lines are 64 bytes. If the entries of the [read_ref] array are defined as type int, an entry occupies 8 bytes, so one cache line can hold 8 read_ref entries. If more than one read_ref is stored in a cache line, operations on different elements of the array conflict with one another. To avoid such conflicts, each read_ref in the [read_ref] array may be stored in its own cache line; for example, each entry may be declared as a structure whose size is declared to be 64 bytes. In this way, each entry exclusively occupies one cache line, and conflicts during operation are avoided.
In combination with the foregoing, in one embodiment of the read lock operating system of the present application, caches of different cores may correspond to different cache lines. For example, the first cache unit corresponds to a first cache line and the second cache unit corresponds to a second cache line.
In this embodiment, the read lock operating system may further include a checking unit 14, configured to check whether the data is in the process of a write operation, and if not, to trigger the read locking and unlocking operations performed by each computing core on its corresponding private reference count.
The above-mentioned read locking operation on a private reference count specifically means that a thread of each core increments the private reference count corresponding to that core by 1. The read unlocking operation on a private reference count specifically means that a thread of each core decrements the private reference count corresponding to that core by 1. Between the locking and unlocking operations, the threads of each core may read the data.
One embodiment of the write lock operation method of the present application is described below. Fig. 3 shows a flow chart of an embodiment of the method. As shown in fig. 3, an embodiment of a write lock operation method of the present application includes:
S300: before performing a write operation on the data, determining whether any computing core is in the process of reading the data.
Determining whether any computing core is in the process of reading the data may be implemented by checking whether the private reference count of each computing core for the data is 0. If all of them are 0, the data is not in the process of being read; if any of them is not 0, the data is being read.
S310: before performing the write operation on the data, determining whether the data is in the process of another write operation.
S310 may specifically be implemented by determining whether the global write lock for the data is 0. If it is 0, the data is not in the process of another write operation; if it is not 0, another write operation is in progress.
S320: if the determination results of S300 and S310 are both negative, performing write locking and unlocking operations with the global write lock during the write operation on the data.
Specifically, the write lock is acquired on the data before the write operation is performed, and released after the write operation.
The global write lock in S320 is, for example, global_writer_id. Acquiring the write lock may consist of updating the value of global_writer_id to the ID of the writing thread, and releasing the write lock may consist of updating the value of global_writer_id to 0.
Similarly, for the write lock, to guard against a read lock being acquired by another thread between the check of each core's private reference count and the corresponding write operation, i.e., to avoid the conflict-detection failure that would result, the thread checks again whether each core's private reference count is 0 after updating the value of global_writer_id to the ID of the writing thread. Only if they are all 0 does the write operation proceed; otherwise the write lock is released and the attempt is retried.
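Steps S300 through S320 with this double check can be sketched together; the write lock is taken first, then every core's private reference count is verified, and the lock is released on conflict. The function names and the MAX_CORES bound are illustrative assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CORES 8

atomic_int global_writer_id = 0;      /* 0 means no writer; thread IDs nonzero */
atomic_int read_ref_core[MAX_CORES];  /* per-core private reference counts */

/* Double-checked write lock: take the global write lock, then verify
 * that no core still holds a read lock; release on conflict. */
bool try_write_lock(int my_id) {
    int expected = 0;
    if (!atomic_compare_exchange_strong(&global_writer_id, &expected, my_id))
        return false;                              /* S310: another writer is active */
    for (int c = 0; c < MAX_CORES; c++) {
        if (atomic_load(&read_ref_core[c]) != 0) { /* S300: a reader is still active */
            atomic_store(&global_writer_id, 0);    /* release; caller retries later */
            return false;
        }
    }
    return true;                                   /* S320: safe to write */
}

void write_unlock(void) { atomic_store(&global_writer_id, 0); }
```

Note that the write path must scan all per-core counters, which is the price paid for making the (far more frequent) read path core-local.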
The write lock operation method may be based on the read lock operation method or the read lock operation system.
One embodiment of a write lock operating system of the present application is described below. Fig. 4 shows a block diagram of an embodiment of the system. As shown in fig. 4, the write lock operating system embodiment of the present application includes:
a data unit 3 for storing data;
a first judging unit 21a, for judging, before a write operation is performed on the data, whether any computing core is in the process of reading the data;
a second judging unit 21b, for judging, before the write operation is performed on the data, whether the data is in the process of another write operation;
and a write locking and unlocking unit 22, configured to perform write locking and unlocking operations with the global write lock during the write operation on the data if the determination results of the first judging unit and the second judging unit are both negative.
Specifically, the write lock is acquired on the data before the write operation is performed, and released after the write operation.
The global write lock is, for example, global_writer_id. Acquiring the write lock may consist of updating the value of global_writer_id to the ID of the writing thread, and releasing the write lock may consist of updating the value of global_writer_id to 0.
The write lock operating system may be based on the read lock operating method or the read lock operating system.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present application.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The application is operational with numerous general purpose or special purpose computer system environments or configurations. Such as a personal computer, a server computer, a hand-held or portable device, a tablet device, a multiprocessor system, a microprocessor-based system, a set top box, a programmable consumer electronics, a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and the like.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Although the present application has been described by way of examples, one of ordinary skill in the art appreciates that there are many variations and modifications that do not depart from the spirit of the application, and it is intended that the appended claims encompass such variations and modifications as fall within the spirit of the application.