
CN113452383A - GPU parallel optimization method for TPC decoding of software radio system - Google Patents

Publication number
CN113452383A
Authority
CN
China
Prior art keywords
data
decoding
thread
column
information bit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010225156.5A
Other languages
Chinese (zh)
Inventor
习勇
陈海赞
黄铁
肖辉明
王欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Leading Wisdom Telecommunication and Technology Co Ltd
Original Assignee
Hunan Leading Wisdom Telecommunication and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Leading Wisdom Telecommunication and Technology Co Ltd
Priority to CN202010225156.5A
Publication of CN113452383A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/29 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes
    • H03M13/2957 Turbo codes and decoding
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00 Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/65 Purpose and implementation aspects
    • H03M13/6561 Parallelized implementations

Landscapes

  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Error Detection And Correction (AREA)

Abstract

The invention belongs to the technical fields of software radio and high-performance computing, and discloses a GPU parallel optimization method for TPC decoding in a software radio system, comprising the following steps: first, initializing, and copying the input information bit block data from CPU memory to the GPU global memory space; second, performing row decoding on all the information bit block data; third, performing column decoding on the TPC data blocks produced by the row decoding of the second step; fourth, repeating the second and third steps K times; and fifth, computing the decision codeword from the TPC data blocks after the K-th decoding iteration and outputting it to the CPU side as the decoding result. By exploiting the computing power of the GPU, the invention can greatly increase the TPC decoding speed in a software radio system and meet the requirement of real-time decoding of multi-channel signals.

Description

GPU parallel optimization method for TPC decoding of software radio system
Technical Field
The invention relates to the technical field of software radio and the field of high-performance computing of computers, in particular to a GPU parallel optimization method for TPC decoding of a software radio system.
Background
TPC decoding is a key step in TDMA satellite signal demodulation, and its decoding performance affects the performance of the entire demodulation system. The Turbo Product Code (TPC) is widely used in software radio communication systems as a high-rate code, but most TPC decoding algorithms suffer from a complex structure, high resource requirements, and large data processing delay. For example, TPC requires iterative row-column decoding of a data block, and both the number of iterations and the efficiency of the row-column decoding affect the performance of the decoder. Current CPU-based TPC decoding executes serially and therefore struggles to decode in real time; TPC decoding is often the most time-consuming part of the whole satellite signal demodulation system, and this inefficient implementation limits its wide application there. Accelerating TPC decoding is therefore important.
Existing work on improving TPC decoding efficiency includes coarse-grained parallel optimization on multi-core CPUs and hardware acceleration on FPGAs. Coarse-grained CPU parallelization yields some performance gain but lacks computing power at higher sampling rates. FPGA-based implementations have a long development cycle and limited flexibility. Compared with a traditional CPU, a GPU provides greater computing power and higher memory access bandwidth. In addition, programming environments such as CUDA (Compute Unified Device Architecture) have greatly improved the programmability of GPUs. GPUs are now widely used in science and engineering with clear benefits, and recent successes in deep learning have made them more popular still. The TPC decoding algorithm is inherently parallel and matches the architectural characteristics of a GPU, so a GPU parallel optimization method that accelerates TPC decoding for software radio systems is both urgently needed and practical.
Disclosure of Invention
To solve the above technical problems, the invention provides a GPU parallel optimization method for TPC decoding of a software radio system, which addresses the large time overhead and insufficient flexibility of traditional TPC decoding implementations. The specific technical scheme is as follows:
A GPU parallel optimization method for TPC decoding of a software radio system comprises the following steps:
(S1) initializing: copying the input information bit block data from the CPU memory to the GPU global memory space, allocating a corresponding output result storage space according to the size of the input information bit block data, and setting the initial value of the output result storage space to 0;
(S2) performing row decoding on all the information bit block data;
(S3) performing column decoding on the row-decoded information bit block data;
(S4) repeating steps (S2) to (S3) K times, where K is the number of iterations and is an integer greater than or equal to 6;
(S5) computing the decision codeword from the TPC data block after the K-th decoding iteration and outputting it to the CPU as the decoding result.
Further, the step (S1) specifically includes the steps of:
(S11) copying the information bit block data to be processed from the CPU memory space to the GPU global memory space using the cudaMemcpy function;
(S12) initializing a GPU global memory space for storing the decoding result using a cudaMemset function, with an initial value of 0.
Further, the step (S2) specifically includes the steps of:
(S21) finding, by a two-level parallel method over thread blocks and threads, the positions of the p data with the smallest absolute value in each row of all the information bit block data, where p is a preset threshold and a positive integer, set to 4 in the embodiment of the invention; in the computation for each row of the information bit block data, $i_0, i_1, \ldots, i_{p-1}$ denote the positions of the p data with the smallest absolute value in that row;
(S22) dividing the threads in the thread block equally into $2^p$ groups, each group of threads computing one row test sequence of the row data, so that the threads within each thread block compute the $2^p$ row test sequences of the row data in parallel; each row test sequence is then combined with the corresponding input row data to obtain its Euclidean distance, yielding the $2^p$ Euclidean distances for the row, which form data set A; the computation of the row test sequences and of their Euclidean distances to the input row data both use existing row decoding techniques.
The row test sequences are computed as follows: symbol decision is performed on the corresponding input row of the information bit block data to obtain a sequence L; the positions $i_0, i_1, \ldots, i_{p-1}$ of L take the values 0 and 1 respectively while the values at all other positions remain unchanged, forming $2^p$ row test sequences. For example, in the first row test sequence position $i_0$ is 0 and the other positions are unchanged; in the second row test sequence position $i_0$ is 1 and the other positions are unchanged; and so on.
The rule for symbol decision is: for the element value at a position, if the value is less than 0 the decision is 0, otherwise the decision is 1;
(S23) the first thread of each thread block finds the minimum of the $2^p$ Euclidean distances (data set A) for the row, and then all threads in the thread block compute in parallel the row update amount of the corresponding row of data according to that minimum; the row update amount is obtained as follows: the row test sequence with the minimum Euclidean distance is taken as the optimal codeword, and all threads simultaneously compute the row extrinsic information of each position in the corresponding row;
(S24) updating in parallel the data of all rows in all information bit blocks by adding the row extrinsic information to the elements at the corresponding positions of the input row data, which completes one row decoding pass.
Further, the step (S3) specifically includes the steps of:
(S31) finding, by a two-level parallel method over thread blocks and threads, the positions of the p data with the smallest absolute value in each column of all the information bit blocks; in the computation for each column, $j_0, j_1, \ldots, j_{p-1}$ denote the positions of the p data with the smallest absolute value in that column;
(S32) dividing the threads in the thread block equally into $2^p$ groups, each group of threads computing one column test sequence of the column data, so that the threads within each thread block compute the $2^p$ column test sequences of the column data in parallel; each column test sequence is then combined with the corresponding input column data to obtain its Euclidean distance, yielding the $2^p$ Euclidean distances for the column, which form data set B;
The column test sequences are computed as follows: symbol decision is performed on the corresponding input column of the information bit block data to obtain a sequence M; the positions $j_0, j_1, \ldots, j_{p-1}$ of M take the values 0 and 1 respectively while the values at all other positions remain unchanged, forming $2^p$ column test sequences; the rule for symbol decision is: for the element value at a position, if the value is less than 0 the decision is 0, otherwise the decision is 1;
(S33) the first thread of each thread block finds the minimum of all the Euclidean distances (data set B), and then all threads in the thread block compute in parallel the column update amount of the corresponding column of data according to that minimum; the column update amount is obtained as follows: the column test sequence with the minimum Euclidean distance is taken as the optimal codeword, and all threads simultaneously compute the column extrinsic information of each position in the corresponding column;
(S34) updating in parallel the data of all columns in all information bit blocks by adding the column extrinsic information to the elements at the corresponding positions of the input column data, which completes one column decoding pass.
The row extrinsic information and the column extrinsic information are computed according to the extrinsic-information calculation rule of the prior-art Chase algorithm.
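Column decoding applies the same procedure as row decoding to data that are not contiguous in memory. The Python sketch below shows one way a column could be read and written back, under the assumption (not stated in the text) that each block is stored as a flat row-major array; `get_column`, `set_column` and `apply_to_columns` are hypothetical helper names, and the `update` callable stands in for the per-column decoding of steps (S31) to (S34).

```python
def get_column(block, n_cols, j):
    # In a row-major block, column j is every n_cols-th element from index j.
    return block[j::n_cols]


def set_column(block, n_cols, j, values):
    # Write an updated column back in place using the same stride.
    block[j::n_cols] = values


def apply_to_columns(block, n_cols, update):
    # Serial stand-in for "one thread block per column".
    for j in range(n_cols):
        set_column(block, n_cols, j, update(get_column(block, n_cols, j)))
    return block


if __name__ == "__main__":
    # A block with m = 2 rows and n = 3 columns.
    block = [1.0, 2.0, 3.0,
             4.0, 5.0, 6.0]
    print(get_column(block, 3, 1))
    print(apply_to_columns(block, 3, lambda col: col[::-1]))
```

The strided access is one reason staging each column in shared memory before processing can pay off on a GPU.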
Further, the step (S5) includes:
each thread reads the data at its corresponding position in the corresponding TPC data block; if the data is less than 0, the decision codeword, and hence the decoding result, is 1; otherwise the decision codeword and the decoding result are 0; each thread then writes the decoding result for its position into the output data storage space.
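The per-thread rule of step (S5) is small enough to model directly. In the Python sketch below, `decision_kernel` plays the role of one GPU thread and the loop in `run_decision` is a serial stand-in for launching all threads at once; both names are hypothetical, and the flat list layout is an assumption.

```python
def decision_kernel(tid, soft, out):
    # One "thread": a negative soft value yields bit 1, otherwise bit 0.
    out[tid] = 1 if soft[tid] < 0 else 0


def run_decision(soft):
    # The output buffer starts at 0, mirroring the zero-initialized
    # output storage space of step (S1).
    out = [0] * len(soft)
    for tid in range(len(soft)):  # serial stand-in for parallel threads
        decision_kernel(tid, soft, out)
    return out


if __name__ == "__main__":
    print(run_decision([-0.5, 0.0, 2.0, -3.0]))
```

Because every position is decided independently, this step needs no synchronization between threads.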
Further, in the step (S21), the number of thread blocks equals the number of information bit blocks to be processed multiplied by the number of rows of each information bit block, and the number of threads in a thread block is the length of one row of data in the information bit block data multiplied by $2^p$.
Further, in the step (S24), the parallelism of the parallel update is the number of information bit blocks multiplied by the size of each data block.
Further, the value of K is an integer greater than or equal to 6.
The GPU kernels executed in step (S2) and step (S3) both use shared memory to store the data of the corresponding row or column and intermediate data such as the Euclidean distances, so as to minimize accesses to GPU global memory and improve program efficiency.
In the step (S22), the Euclidean distances of all test sequences are computed, which avoids having to check on the GPU side whether test sequences are identical, and defers finding the minimum of all the Euclidean distances to step (S23).
Compared with the prior art, the invention has the following advantages and beneficial effects: through GPU parallel optimization of TPC decoding calculation, all data are processed simultaneously by utilizing tens of thousands of threads, and original time-consuming parts are parallelized to achieve the effect of quick decoding; subdividing the row-column decoding, and selecting proper parallelism to carry out fine-grained parallel optimization according to the parallel characteristics and data dependency of different stages; the shared memory is used for caching the data to be processed and the intermediate data of each thread block, so that the aim of reducing the access times of the global memory is fulfilled, and the access cost is reduced.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a schematic structural diagram of the GPU parallel optimization method for TPC decoding of a software radio system according to the present invention.
Detailed Description
In order to better understand the technical solution of the present application, the present application will be described in detail with reference to the drawings and the detailed description in the embodiments of the present application.
FIG. 1 is a schematic general flow chart of a GPU parallel optimization method for TPC decoding of a software radio system according to an embodiment of the present invention. The specific process is as follows:
First, the parameters are initialized and the input information bit block data are copied from CPU memory to the GPU global memory space;
Second, row decoding is performed on all the information bit block data; row decoding is carried out by the row processing GPU kernel function, in which each thread block decodes one row of the information bit block data according to the Chase algorithm, and the number of thread blocks of the row processing GPU kernel function is the number of information bit blocks to be processed multiplied by the number of rows of each information bit block;
Third, column decoding is performed on the TPC data blocks produced by the row decoding of the second step; column decoding is carried out by the column processing GPU kernel function, in which each thread block decodes one column of the row-decoded information bit block data according to the Chase algorithm, and the number of thread blocks of the column processing GPU kernel function is the number of information bit blocks to be processed multiplied by the number of columns of each information bit block;
Fourth, the second and third steps are repeated K times;
Fifth, the decision codeword is computed from the TPC data blocks after the K-th decoding iteration and output to the CPU side as the decoding result.
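The five steps above amount to a short driver loop around two passes. The Python sketch below shows that control flow only: `row_pass` and `col_pass` are hypothetical callables standing in for the row and column GPU kernels, the list copy stands in for the host-to-device transfer, and reading "repeated K times" as K row-column iterations in total is an assumption.

```python
def tpc_decode(blocks, row_pass, col_pass, iterations=6):
    # Step 1: stand-in for copying the input blocks to GPU global memory.
    work = [list(b) for b in blocks]
    # Steps 2 to 4: alternate row and column decoding for K iterations.
    for _ in range(iterations):
        work = [row_pass(b) for b in work]
        work = [col_pass(b) for b in work]
    # Step 5: hard decision, negative soft value -> 1, otherwise 0.
    return [[1 if x < 0 else 0 for x in b] for b in work]


if __name__ == "__main__":
    identity = lambda block: block
    print(tpc_decode([[-1.0, 2.0, 0.5, -0.25]], identity, identity))
```

Passing the two passes in as callables keeps the iteration structure separate from the kernel details, so a CPU reference pass and a GPU-backed pass can be swapped for comparison.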
The following describes the specific technical scheme of the embodiment of the invention in detail as follows:
First, the parameters are initialized and the input information bit block data are copied from CPU memory to the GPU global memory space. The output data storage space out is initialized to 0 according to the number N of input information bit blocks and the size n × m of each information bit block, where n is the number of columns and m is the number of rows.
Second, row decoding is performed on all the information bit block data. The row processing step executes the row decoding GPU kernel, whose parallel structure is shown in FIG. 2 and which exposes three levels of parallelism: first, the different data blocks (N) can be processed in parallel; second, the decoding of all rows (m) within each data block can be processed in parallel; and finally, the computation of the $2^p$ test sequences and Euclidean distances within each row decoding can be performed in parallel. The row processing GPU kernel function is configured as follows: the number of thread blocks of the kernel function is given by a dim3 variable grid1 with value (N × m, 1, 1), indicating that each thread block processes one row of a data block, that there are N data blocks in total, and that each block has m rows. The thread block size of the kernel function is set to n × $2^p$, so that each thread processes one element of the row data. Inside the kernel function, one warp (a thread bundle of 32 threads) first sorts one row of data, together with the position of each datum, in ascending order of absolute value, and the p data with the smallest absolute value (the first p data after sorting) and their positions in the input row data are found from the sorting result. Then the n × $2^p$ threads are divided into $2^p$ groups, each group builds one test sequence, and the Euclidean distance between that test sequence and the input row data is computed.
Whereas the serial CPU program first compares the test sequences and then computes the Euclidean distances only for the distinct ones, the embodiment of the invention trades space for time: it does not compare the test sequences but computes the Euclidean distances of all of them, replacing the comparison operations with summations and improving GPU efficiency. Finally, the test sequence with the minimum Euclidean distance is taken as the optimal codeword, all threads simultaneously compute the extrinsic information of each position in the corresponding row, and the extrinsic information is added to the elements at the corresponding positions of the input row data to obtain the updated row data, which completes one row decoding pass. In FIG. 2, s = $2^p$.
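The launch configuration of the row kernel can be checked with simple arithmetic before any GPU code is written. The Python sketch below computes the grid and block sizes given in the text and maps a (block index, thread index) pair to a data block, row, test-sequence group and element. The index mapping itself is an assumption, since the text specifies only the sizes, and the function names are hypothetical. The limit of 1024 threads per block holds for current CUDA GPUs and constrains n × 2^p.

```python
MAX_THREADS_PER_BLOCK = 1024  # per-block thread limit on current CUDA GPUs


def row_kernel_config(num_blocks, m, n, p):
    # grid1 = (N * m, 1, 1): one thread block per row of every data block.
    grid = (num_blocks * m, 1, 1)
    # n * 2**p threads: one thread per element of each of the 2**p sequences.
    threads = n * 2 ** p
    if threads > MAX_THREADS_PER_BLOCK:
        raise ValueError(f"{threads} threads per block exceeds the limit")
    return grid, threads


def locate(block_idx, thread_idx, m, n):
    # Assumed mapping: consecutive thread blocks walk the rows of one data
    # block; consecutive threads walk the elements of one test sequence.
    data_block, row = divmod(block_idx, m)
    group, element = divmod(thread_idx, n)
    return data_block, row, group, element


if __name__ == "__main__":
    print(row_kernel_config(num_blocks=8, m=32, n=32, p=4))
    print(locate(block_idx=33, thread_idx=70, m=32, n=32))
```

With p = 4 as in the embodiment, the limit implies a row length of at most 64 elements per thread block unless each thread handles more than one element.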
Third, column decoding is performed on the row-decoded information bit block data. The column decoding step executes the column decoding GPU kernel, which is configured according to the number of input data blocks and the size of each data block. The number of thread blocks of the kernel function is given by a dim3 variable grid2 with value (N × n, 1, 1), indicating that each thread block processes one column of a data block, that there are N data blocks in total, and that each block has n columns. The thread block size of the column decoding GPU kernel is set to m × $2^p$, so that each thread processes one element of the column data. Inside the kernel function, one warp (32 threads) first sorts one column of data in ascending order of absolute value, and the p data with the smallest absolute value and their positions in the input column data are found from the sorting result. Then the m × $2^p$ threads generate the $2^p$ test sequences in parallel and combine them with the input column data to obtain the Euclidean distance of each test sequence. Finally, the test sequence with the minimum Euclidean distance is taken as the optimal codeword, all threads simultaneously compute the extrinsic information of each position in the corresponding column, and the extrinsic information is added to the elements at the corresponding positions of the input column data, which completes one column decoding pass. The workflow of each thread follows the literature (David Chase. A Class of Algorithms for Decoding Block Codes with Channel Measurement Information. IEEE Transactions on Information Theory, Vol. IT-18, No. 1, January 1972, pp. 170-179).
And fourthly, repeating the second step to the third step K times, wherein K is the number of times of iterative processing and is set to be not less than 6.
Fifth, the decision codeword is computed from the TPC data blocks after the K-th decoding iteration and output to the CPU side as the decoding result. This step executes the decision codeword computation GPU kernel, whose parallelism is determined by the output data block size k × l. When configuring the kernel function, the number of thread blocks is set to N × k and the size of each thread block can be set to l, where N is the number of processed data blocks, k is the length of each output data block, and l is the height of each output data block. Each thread in the decision codeword computation GPU kernel function makes a decision on the information bit block data after K row-column decoding iterations according to the decision codeword computation logic described in the Chase algorithm; the number of thread blocks of this kernel function is the number of information bit blocks to be decided multiplied by the number of rows of each information bit block after the decision.
Inside the kernel function, each thread reads the data at its corresponding position in the corresponding TPC data block; if the data is less than 0 the decoding result is 1, otherwise the decoding result is 0. Each thread writes the decoding result for its position into the output data storage space out. Finally, the output data are copied to the corresponding storage space on the CPU side with cudaMemcpy.
In summary, the GPU parallel optimization method for TPC decoding of a software radio system according to the embodiment of the present invention has the advantages that: parallel acceleration is carried out on TPC decoding by constructing a multistage parallel GPU kernel function, and high-throughput decoding is realized by effectively utilizing the strong computing power of a GPU; and the access overhead of GPU-side data is effectively reduced by adopting a shared memory, and the time overhead of TPC decoding is further reduced.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (8)

1. A GPU parallel optimization method aiming at TPC decoding of a software radio system is characterized by comprising the following steps:
(S1) initializing, copying the input information bit block data from the CPU memory to the GPU global storage space, setting a corresponding output result storage space according to the size of the input information bit block data, and setting the initial value of the output result storage space to be 0;
(S2) performing row decoding on all the information bit block data;
(S3) performing column decoding on the row-decoded information bit block data;
(S4) repeating the steps (S2) to (S3) K times, K being the number of iterations;
(S5) computing the decision codeword from the TPC data block after the K-th decoding iteration and outputting it to the CPU as the decoding result.
2. The method for GPU parallel optimization for TPC decoding for a software radio system of claim 1, wherein the step (S1) specifically comprises the steps of:
(S11) copying the information bit block data to be processed from the CPU memory space to the GPU global memory space using the cudaMemcpy function;
(S12) initializing a GPU global memory space for storing the decoding result using a cudaMemset function, with an initial value of 0.
3. The method for GPU parallel optimization for TPC decoding for a software radio system of claim 1, wherein the step (S2) specifically comprises the steps of:
(S21) finding, by a two-level parallel method over thread blocks and threads, the positions of the p data with the smallest absolute value in each row of all the information bit block data;
(S22) the threads within each thread block computing the $2^p$ row test sequences in parallel, each row test sequence being combined with the corresponding input row data to obtain its Euclidean distance, yielding $2^p$ Euclidean distances;
(S23) the first thread of each thread block finding the minimum of the $2^p$ Euclidean distances, and then all threads in the thread block computing in parallel the row update amount of the corresponding row of data according to that minimum; the row update amount being obtained as follows: the row test sequence with the minimum Euclidean distance is taken as the optimal codeword, and all threads simultaneously compute the row extrinsic information of each position in the corresponding row;
(S24) updating in parallel the data of all rows in all information bit blocks by adding the row extrinsic information to the elements at the corresponding positions of the input row data, which completes one row decoding pass.
4. The method for GPU parallel optimization for TPC decoding for a software radio system of claim 1, wherein the step (S3) specifically comprises the steps of:
(S31) finding, by a two-level parallel method over thread blocks and threads, the positions of the p data with the smallest absolute value in each column of all the information bit blocks;
(S32) the threads within each thread block computing the $2^p$ column test sequences in parallel, each column test sequence being combined with the corresponding input column data to obtain its Euclidean distance, yielding $2^p$ Euclidean distances;
(S33) the first thread of each thread block finding the minimum of the $2^p$ Euclidean distances, and then all threads in the thread block computing in parallel the column update amount of the corresponding column of data according to that minimum; the column update amount being obtained as follows: the column test sequence with the minimum Euclidean distance is taken as the optimal codeword, and all threads simultaneously compute the column extrinsic information of each position in the corresponding column;
(S34) updating in parallel the data of all columns in all information bit blocks by adding the column extrinsic information to the elements at the corresponding positions of the input column data, which completes one column decoding pass.
5. The method for GPU parallel optimization for TPC decoding for a software radio system as claimed in claim 1, wherein the step (S5) is specifically performed by:
and each thread reads the data of the corresponding position in the corresponding TPC data block, if the data is less than 0, the decoding result is 1, otherwise, the decoding result is 0, and each thread writes the decoding result of the corresponding position into the output data storage space.
6. The method for GPU parallel optimization for TPC decoding for software radio systems of claim 3 wherein in step (S21), the number of thread blocks is equal to the number of information bit blocks that need to be processed multiplied by the number of rows per information bit block data, the number of threads in a thread block being the length of one row of data in the information bit block data.
7. The method of GPU parallel optimization for TPC decoding for a software radio system of claim 3, wherein: in the step (S24), the parallelism of the parallel update is the number of information bit blocks multiplied by the size of each data block.
8. The method for GPU parallel optimization for TPC decoding for a software radio system of claim 1, where K is an integer greater than or equal to 6.
CN202010225156.5A 2020-03-26 2020-03-26 GPU parallel optimization method for TPC decoding of software radio system Pending CN113452383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010225156.5A CN113452383A (en) 2020-03-26 2020-03-26 GPU parallel optimization method for TPC decoding of software radio system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010225156.5A CN113452383A (en) 2020-03-26 2020-03-26 GPU parallel optimization method for TPC decoding of software radio system

Publications (1)

Publication Number Publication Date
CN113452383A true CN113452383A (en) 2021-09-28

Family

ID=77807602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010225156.5A Pending CN113452383A (en) 2020-03-26 2020-03-26 GPU parallel optimization method for TPC decoding of software radio system

Country Status (1)

Country Link
CN (1) CN113452383A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104662515A (en) * 2012-05-24 2015-05-27 罗杰.史密斯 A computer system that can be dynamically constructed
CN107682019A (en) * 2017-09-28 2018-02-09 成都傅立叶电子科技有限公司 A high-speed TPC decoding method
US20180113501A1 (en) * 2016-10-21 2018-04-26 Semiconductor Energy Laboratory Co., Ltd. Display device, electronic device, and operation method thereof
WO2019018026A1 (en) * 2017-07-20 2019-01-24 University Of South Florida Gpu-based data join

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J. Cho and W. Sung: "High-throughput decoding of block turbo codes on graphics processing units", 2017 IEEE International Workshop on Signal Processing Systems (SiPS) *
X. Zhou and R. Li: "A Parallel Turbo Product Codes Decoder Based on Graphics Processing Units", 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) *
金松坡 (Jin Songpo): "Research on GPU-Based TPC Decoding Technology", 中国集成电路 (China Integrated Circuit) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117785129A (en) * 2024-02-23 2024-03-29 蓝象智联(杭州)科技有限公司 Montgomery modular multiplication operation method based on GPU
CN117785129B (en) * 2024-02-23 2024-05-07 蓝象智联(杭州)科技有限公司 Montgomery modular multiplication operation method based on GPU
CN119727742A (en) * 2025-02-26 2025-03-28 北京科技大学 A decoding method and system for Turbo product codes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210928)