CN114203258A

CN114203258A - Single-stranded DNA screening method for regulating gene mRNA expression

Info

Publication number: CN114203258A
Application number: CN202111437868.4A
Authority: CN
Inventors: 孙曙明; 王启航; 廖永康; 刘静
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2021-11-29
Filing date: 2021-11-29
Publication date: 2022-03-18
Anticipated expiration: 2041-11-29
Also published as: CN114203258B

Abstract

The invention discloses a single-stranded DNA screening method for regulating gene mRNA expression. A computer program is used to analyze complex DNA sequence information, which greatly improves efficiency and saves time. Program-based comparison against a database is then used to predict the probability that each sequence binds a transcription factor, and repetitive sequences carrying binding information for transcription factors important to gene regulation are screened out. The predicted single-stranded DNA sequences are synthesized artificially and verified in cell experiments, finally yielding single-stranded DNA that regulates gene mRNA expression. The method is general: it can also be used to screen for and verify single-stranded DNA that regulates the mRNA expression of other genes.

Description

Single-stranded DNA screening method for regulating gene mRNA expression

Technical Field

The invention relates to a single-stranded DNA screening method for regulating gene mRNA expression and belongs to the technical field of genetics.

Background

The eukaryotic genome contains many repetitive sequences, including palindromic repeats, inverted repeats, mirror repeats and complementary repeats, which play a variety of roles in processes such as cellular gene expression.

Upstream of eukaryotic genes lie several cis-acting elements, such as enhancers, silencers and promoters, that regulate expression of the downstream gene. When a eukaryotic cell transcribes a gene, chromatin structure loosens and the DNA double strand separates and binds transcription factors. During this process, the various types of repetitive sequences upstream of the gene can fold the DNA strand into different spatial structures through complementary base pairing, which affects transcription factor binding and thereby regulates gene expression. A nucleic acid aptamer is a nucleic acid sequence that forms a stable spatial structure allowing it to bind a specific protein domain and thereby exert a biological function. By screening specific types of repetitive sequences in the genome, it may therefore be possible to find short nucleic acid sequences that bind transcription factors well, and hence to identify sequences that regulate gene expression.

Studies have shown that single-stranded DNA can affect the expression of certain genes, but the single-stranded DNAs studied were not repetitive sequences. The function of repetitive single-stranded DNA in cells has so far received little study, and few methods combine computational screening written in a programming language with experimental verification. The invention is therefore an original technique.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a single-stranded DNA screening method for regulating gene mRNA expression.

In order to achieve the purpose, the invention adopts the following technical scheme:

A single-stranded DNA screening method for regulating gene mRNA expression comprises the following specific steps:

(1) analyzing the sequence upstream of the gene with a Python program to obtain all complementary repeat pairs and reverse (inverted) complementary repeat pairs, and outputting the length and start position of each sequence, to obtain candidate sequences;

(2) predicting transcription factor binding sites in the candidate sequences with the JASPAR database, retaining the transcription factors with scores greater than 9, and numbering the corresponding sequences to obtain pre-screened sequences;

(3) transfecting cells with the pre-screened sequences, measuring the change in mRNA expression of the corresponding gene against an internal reference gene, and selecting the sequences whose change is statistically significant; these are the single-stranded DNAs that regulate mRNA expression of the gene.
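Step (2) keeps only binding predictions scoring above 9. JASPAR distributes motifs as frequency matrices, and scan scores of this kind are generally sums of per-position log-odds weights. The minimal sketch below illustrates such scoring with a threshold of 9 on the forward strand only. The toy matrix `TOY_PFM`, the pseudocount of 0.8 and the uniform 0.25 background are assumptions for illustration; JASPAR's own tools may convert matrices and report scores differently, and they also scan the reverse strand.

```python
import math

BASES = "ACGT"

# HYPOTHETICAL 6-position count matrix (rows A, C, G, T); not a real JASPAR motif.
TOY_PFM = {
    "A": [0, 20, 0, 0, 20, 0],
    "C": [20, 0, 0, 0, 0, 0],
    "G": [0, 0, 20, 0, 0, 20],
    "T": [0, 0, 0, 20, 0, 0],
}


def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert counts to log2-odds weights (ASSUMED pseudocount and uniform background)."""
    width = len(pfm["A"])
    pwm = {b: [0.0] * width for b in BASES}
    for j in range(width):
        total = sum(pfm[b][j] for b in BASES)
        for b in BASES:
            prob = (pfm[b][j] + pseudocount * background) / (total + pseudocount)
            pwm[b][j] = math.log2(prob / background)
    return pwm


def scan(seq, pwm, threshold=9.0):
    """Score every window on the given strand and keep those scoring above threshold.

    Returns (1-based start, window, score) tuples.
    """
    width = len(pwm["A"])
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[b][j] for j, b in enumerate(window))
        if score > threshold:
            hits.append((start + 1, window, score))
    return hits
```

With this toy matrix a perfect match scores about 11.75, while a single mismatch drops the score to about 5.1, so a cut-off of 9 keeps only exact matches here. With real matrices the effect of the cut-off depends on motif width and information content.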

Preferably, the single-stranded DNA comprises a single-stranded DNA sequence that up-regulates or down-regulates the expression of a gene mRNA.

Preferably, the TRRUST database is further used to analyze the transcription factors corresponding to the obtained single-stranded DNA sequences and to determine their targets and regulatory effects for further study.

Preferably, step (1) is performed as follows: read in the sequence information and the range of repeat lengths, and loop over each length n. For each n, create a set data structure. For each window at positions [i, i + n - 1] (where i is the relative base position and n is the window length in bases), check whether its inverted repeat or complementary repeat sequence is already in the set; if so, record the window together with the matching sequence from the set as a repeat pair. After each check, add the window [i, i + n - 1] to the set and move on to the next window [i + 1, i + n], continuing while i is less than or equal to 5001 - n. Finally, output the sequence, length, position and other information of every repeat pair found as the result of the program.
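The loop described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' program, and it makes two assumptions: a dict mapping each stored window to its 1-based start positions stands in for the set so that positions can be reported, and both relation types (complementary and inverted complementary) are checked for every window. For an input of length L the last window of length n starts at L - n + 1, which gives the bound 5001 - n when L = 5000. Function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement in the same order (A<->T, C<->G)."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Complement read backwards, i.e. the inverted complementary sequence."""
    return complement(seq)[::-1]


def find_repeat_pairs(seq, min_len, max_len):
    """Return (kind, n, earlier_seq, earlier_pos, later_seq, later_pos) tuples.

    For each length n, windows [i, i+n-1] are visited left to right; each window
    is checked against windows already stored and only then stored itself.
    ASSUMPTION: a dict (k-mer -> 1-based start positions) plays the role of the
    set so that the positions of both members of a pair can be reported.
    """
    seq = seq.upper()
    pairs = []
    for n in range(min_len, max_len + 1):
        seen = {}
        for i in range(1, len(seq) - n + 2):  # last start is len(seq) - n + 1
            window = seq[i - 1:i - 1 + n]
            partners = (("complementary", complement(window)),
                        ("inverted complementary", reverse_complement(window)))
            for kind, partner in partners:
                for j in seen.get(partner, []):
                    pairs.append((kind, n, partner, j, window, i))
            seen.setdefault(window, []).append(i)
    return pairs
```

For example, `find_repeat_pairs("ACGTTGCA", 4, 4)` reports ACGT at position 1 and TGCA at position 5 as a complementary pair. Overlapping windows are not excluded, matching the description, and short lengths produce many chance matches in a 5730 bp sequence, so the chosen length range largely controls the size of the output.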

Preferably, in step (3), the transfected cells are K562 cells.

Preferably, in step (3), the criterion for a statistically significant difference is P < 0.05 in a t-test.

The beneficial effects of the invention are as follows:

The invention uses a program to analyze complex DNA sequence information, which greatly improves efficiency and saves time, and uses program-based comparison against database information to predict the probability that each sequence binds a transcription factor, thereby screening out repetitive sequences that carry binding information for transcription factors important to gene expression. The predicted single-stranded DNA sequences are synthesized artificially and verified in cell experiments, finally yielding single-stranded DNA that regulates gene mRNA expression.

The key innovation of the invention is that, for the first time, single-stranded DNA is screened through a standardized workflow of sequence retrieval, program design, transcription factor binding prediction and experimental verification. The screened single-stranded DNA can regulate the mRNA level of the specific screened gene in cells, and a given single-stranded DNA may either up-regulate or down-regulate that gene.

The method is general: it can also be used to screen for and verify single-stranded DNA that regulates the mRNA expression of other genes.

Drawings

FIG. 1 is a diagram of HBB sequence information;

FIG. 2 is a flowchart of the Python algorithm;

FIG. 3 shows the expression of mRNA of HBB gene.

Detailed Description

The invention is further described below with reference to the accompanying drawings and examples, which are provided for illustration only and are not intended to limit the scope of the invention.

1. HBB upstream sequence query

According to the NCBI database, the human HBB gene is located on human chromosome 11 at 5227071-. The 5730 bp upstream sequence was taken as the 5232800-5227071 fragment of human chromosome 11 (FIG. 1).
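The fragment length can be checked directly: 5232800 - 5227071 + 1 = 5730 bases, counting both ends. The minimal sketch below also converts a 1-based position within the extracted fragment into a chromosome coordinate and a distance from the HBB start. It assumes, as the larger upstream coordinates imply, that HBB lies on the minus strand with its 5' end at 5227071, and that the fragment is read 5' to 3' on the gene's strand (from 5232800 down to 5227071). The constant and function names are hypothetical.

```python
# ASSUMPTION: HBB is on the minus strand, so its 5' end is the smaller coordinate
# given in the text and "upstream" means larger chromosome coordinates.
GENE_START = 5227071     # HBB start (fragment boundary nearest the gene)
FRAGMENT_END = 5232800   # far end of the upstream fragment

# Inclusive length of the fragment: both boundary bases are counted.
FRAGMENT_LENGTH = FRAGMENT_END - GENE_START + 1


def chrom_coord(index):
    """Chromosome 11 coordinate of the base at 1-based `index` in the fragment."""
    return FRAGMENT_END - (index - 1)


def distance_from_start(index):
    """Bases between that base and the HBB start (0 for the start base itself)."""
    return chrom_coord(index) - GENE_START
```

A repeat reported at fragment position i therefore lies `distance_from_start(i)` bases upstream of the HBB start under these assumptions.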

2. Programming and analysis results

A Python program was used to analyze the 5730 bp upstream sequence, obtain all complementary repeat pairs and inverted complementary repeat pairs within it, and output their lengths and start positions (counted from the HBB gene start).

The procedure is as follows: read in the sequence information and the range of repeat lengths, and loop over each length n. For each n, create a set data structure. For each window at positions [i, i + n - 1] (where i is the relative base position and n is the window length in bases), check whether its inverted repeat or complementary repeat sequence is already in the set; if so, record the window together with the matching sequence from the set as a repeat pair. After each check, add the window [i, i + n - 1] to the set and move on to the next window [i + 1, i + n], continuing while i is less than or equal to 5001 - n. Finally, output the sequence, length, position and other information of every repeat pair found as the result of the program (FIG. 2).

The results of the program analysis are shown in tables 1 and 2.

TABLE 1 Analysis of complementary repeats in the 5730 bp region upstream of the HBB gene

TABLE 2 Analysis of inverted complementary repeats in the 5730 bp region upstream of the HBB gene

3. Screening of sequences with potential transcription factor binding sites

Transcription factor binding sites in the candidate sequences were predicted with the JASPAR database. Transcription factors with scores greater than 9 were retained and the corresponding sequences were numbered, giving 12 sequences; the results are shown in Table 3.
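The filtering and numbering step can be sketched as follows, assuming the JASPAR predictions have been exported as (sequence, transcription factor, score) records; that record layout and the function name are hypothetical. Sequences are numbered in order of first appearance, and one sequence may keep several transcription factor hits.

```python
def prescreen(hits, min_score=9.0):
    """Keep records scoring above min_score and number each distinct sequence.

    `hits` is an iterable of (sequence, tf_name, score) tuples (HYPOTHETICAL layout).
    Returns (number, sequence, tf_name, score) tuples; numbering starts at 1.
    """
    numbering = {}
    kept = []
    for seq, tf, score in hits:
        if score > min_score:  # "scores of more than 9": strict inequality
            number = numbering.setdefault(seq, len(numbering) + 1)
            kept.append((number, seq, tf, score))
    return kept
```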

TABLE 3 Prediction of transcription factor binding

4. RT-qPCR results

The changes in HBB mRNA expression after transfection of the 12 sequences into K562 cells (ATCC, USA) are plotted in FIG. 3, where panels A to L correspond to sequence Nos. 1 to 12 in order. GAPDH (synthesized in Shanghai) was used as the internal reference, and the criterion for statistical significance was set to P < 0.05 (t-test). For each sequence, the change in HBB expression across replicate experiments was compared with the blank control group (NC) by t-test. Sequence Nos. 1, 2, 3, 4, 5, 6, 11 and 12 showed no statistically significant difference. Sequence Nos. 7, 8, 9 and 10 showed significant differences, with expression lower than in the blank control group; expression fell most markedly after introduction of sequences 7 and 8, indicating that these sequences may have a greater influence on transcription of the HBB gene.
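The text reports HBB mRNA changes normalized to GAPDH but does not state how they were calculated. The 2^-ΔΔCt method is a common choice and is assumed here rather than taken from the source: each treated replicate's ΔCt (HBB Ct minus GAPDH Ct) is compared with the mean ΔCt of the blank control group NC. The function name and input layout are hypothetical.

```python
from statistics import mean


def fold_changes(treated, control):
    """ASSUMED 2^-ddCt method.

    `treated` and `control` are lists of (ct_target, ct_reference) pairs, one per
    replicate (e.g. HBB Ct and GAPDH Ct). Returns one fold change per treated
    replicate, relative to the mean control dCt; values below 1 mean reduced expression.
    """
    control_dct = mean(t - r for t, r in control)
    return [2.0 ** -((t - r) - control_dct) for t, r in treated]
```

The per-replicate values returned here are the kind of input a t-test against the control group would take.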

5. Sequence-corresponding transcription factor and mechanism prediction

The TRRUST database was used to analyze the transcription factors corresponding to sequence Nos. 7, 8, 9 and 10, and their known targets, regulatory effects and literature sources were retrieved for further study (Table 4).
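The TRRUST lookup can be sketched as below. The sketch assumes the downloaded table is tab-separated with four columns per row (transcription factor, target gene, mode of regulation such as Activation or Repression, and semicolon-separated PubMed IDs); that layout and the function names are assumptions to check against the file actually downloaded. It groups the known targets of the transcription factors linked to the hit sequences.

```python
import csv


def load_trrust(lines):
    """Parse TRRUST-style rows (ASSUMED columns: TF, target, mode, PubMed IDs).

    `lines` is any iterable of text lines, e.g. a file opened with newline="".
    """
    records = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        tf, target, mode, pmids = row[:4]
        records.append({"tf": tf, "target": target, "mode": mode,
                        "pmids": [p for p in pmids.split(";") if p]})
    return records


def targets_of(records, tf_names):
    """Group (target, mode, PubMed IDs) by transcription factor for the given TFs."""
    wanted = set(tf_names)
    grouped = {}
    for rec in records:
        if rec["tf"] in wanted:
            grouped.setdefault(rec["tf"], []).append(
                (rec["target"], rec["mode"], rec["pmids"]))
    return grouped
```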

TABLE 4 Prediction analysis of the downstream effects of the sequences

Although embodiments of the invention have been described with reference to the accompanying drawings, the scope of the invention is not limited to them; modifications and variations that those skilled in the art can make without inventive effort fall within the scope of the invention.

Claims

1. A single-stranded DNA screening method for regulating gene mRNA expression is characterized by comprising the following specific steps:

2. The screening method of claim 1, wherein the single-stranded DNA comprises a single-stranded DNA sequence that up-regulates or down-regulates gene mRNA expression.

3. The screening method according to claim 1, wherein the TRRUST database is further used to analyze the transcription factors corresponding to the obtained single-stranded DNA sequences and to determine their targets and regulatory effects for further study.

4. The screening method according to claim 1, wherein step (1) is performed as follows: reading in the sequence information and the range of repeat lengths; looping over each length n and creating a set data structure; for each window at positions [i, i + n - 1], if its inverted repeat or complementary repeat sequence is in the set, storing the window and the matching sequence in the set as a repeat pair; after each check, adding the window [i, i + n - 1] to the set and checking the next window [i + 1, i + n], continuing while i is less than or equal to 5001 - n; and finally outputting the sequence, length, position and other information of all repeat pairs found as the result of the program.

5. The screening method according to claim 1, wherein the transfected cells in step (3) are K562 cells.

6. The screening method according to claim 1, wherein in step (3) the criterion for a statistically significant difference is P < 0.05 in a t-test.