
Drawing a PCA plot from transcriptome data, and questions about biological replicates

It's been a long time since my last post~~~ I've been busy with grant applications. Grants. More grants, and then papers. Papers. Still more papers.

Finally, I have some time to write up what I've been sitting on.

Before getting into how to draw it, we should understand what a PCA plot of transcriptome data is for.

It shows how dispersed the samples are, that is, how much the replicates differ from one another. Replicates of the same condition should cluster together, while samples from different conditions should separate.
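For reference, the PCA behind such a plot can be written as below. This assumes a log-transformed expression matrix X with n samples as rows and p genes as columns. The notation is mine, not from the original post. A PC1 vs PC2 plot is simply the first two columns of T.

```latex
% X: n x p matrix, e.g. log2(FPKM + 1); n samples (rows), p genes (columns)
\begin{aligned}
\tilde{X} &= X - \mathbf{1}_n \bar{x}^{\top}     && \text{centre each gene}\\
\tilde{X} &= U S V^{\top}                          && \text{singular value decomposition}\\
T &= \tilde{X} V = U S                             && \text{row } i \text{ = PC coordinates of sample } i\\
\rho_k &= \frac{s_k^{2}}{\sum_{j} s_j^{2}}         && \text{share of total variance on PC}_k
\end{aligned}
```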

1. Before drawing: the question of sample replicates

Transcriptome sequencing usually uses three replicates. People who have never done sequencing are often puzzled by this. Why does transcriptome sequencing need biological replicates at all? Can I skip them? Why do most people use three? Can I use 4, 5, or 666? And what counts as a replicate: three mice as three replicates, or one mouse sequenced three times? These questions are enough to give anyone a headache~~~

The first question: are biological replicates necessary?

Answer: ideally, yes.

When can you get away without biological replicates?

1) The research budget is too small to sequence replicates. (Frankly, in that case it is hardly worth sequencing. A single sample is a "chicken rib": of little use, but a pity to give up.)

2) Your other experimental evidence is already conclusive, and the sequencing is just window dressing. (But if the experiment is that good, sequence a few more samples, or skip sequencing entirely. Otherwise a paper that could have gone to Nature may end up in PLoS ONE, which is a waste.)

Second question: do we have to use three replicates? Can I use two, or four?

Answer: use at least three replicates.

1) First, why set up replicates at all? They average out within-group variation, make the results more reliable, and let you detect outliers.

1.1) If you give a drug to mice, different mice will inevitably respond differently. Using several animals averages out these individual differences.

1.2) Suppose you give the drug to three mice, and one of them happens to have unusually strong immunity, so the drug has little effect on it. The other two respond similarly. In the later analysis you should drop the strong-immunity mouse, because its data would badly skew the results.

1.3) Now suppose you only have two mice, and one of them has strong immunity. When the sequencing data come back, the two samples differ a lot. Which one do you keep? Some people say they would obviously keep the one with normal immunity. But you only find out which mouse is unusual after sequencing; you can't tell how a mouse responds until you have dosed it. With two samples there is no majority to show which one is the outlier, so two replicates are not enough.

2) In theory, the more replicates the better, but given practical constraints, three replicates is the common choice.

For the detailed reasoning, see: RNA-seq differential expression studies: more sequence or more replication?

3) Individual variation among animals or plants is relatively large, so more replicates can be used, for example 5-10. If money is no object, pick whatever lucky number you like, such as 66, 88, 996 or even 2333. (Just kidding.)
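Points 1.2 and 1.3 can be made concrete with a small Python sketch. The post itself works in R, so this is a stand-in, not the author's method. It scores each replicate by its average Pearson correlation with the other replicates; the lowest score is an outlier candidate. Correlation scoring is just one common way to spot an outlying replicate, and the expression values are invented toy numbers. With only two replicates, both get exactly the same score, which is the dilemma described in 1.3.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_correlation(samples):
    """Average correlation of each replicate with every other replicate."""
    names = list(samples)
    totals = {name: 0.0 for name in names}
    for a, b in combinations(names, 2):
        r = pearson(samples[a], samples[b])
        totals[a] += r
        totals[b] += r
    return {name: totals[name] / (len(names) - 1) for name in names}


# Toy log2 expression values for five genes (invented for illustration).
mouse1 = [2.0, 5.0, 8.0, 3.0, 6.0]
mouse2 = [2.2, 5.1, 7.8, 3.1, 6.2]
mouse3 = [6.0, 2.0, 3.0, 7.0, 1.0]  # the hypothetical "strong immunity" mouse

three = mean_correlation({"mouse1": mouse1, "mouse2": mouse2, "mouse3": mouse3})
print("three replicates:", three)
print("outlier candidate:", min(three, key=three.get))  # -> mouse3

# With only two replicates, both share the single pairwise r: nothing breaks the tie.
two = mean_correlation({"mouse1": mouse1, "mouse3": mouse3})
print("two replicates:", two)
```

With three replicates the majority identifies the odd one out. With two, the scores are identical, so the data alone cannot tell you which mouse to drop.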

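The third question: do three mice count as three replicates, or does one mouse sequenced three times?

Answer: three mice, each sequenced once.

This is the difference between biological replicates and technical replicates (a quick web search will explain it in more detail). Biological replicates are independent samples, such as different mice. Technical replicates are repeated measurements of the same sample, so they capture only measurement noise, not the variation between individuals.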
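A toy simulation shows why technical replicates cannot stand in for biological ones. It assumes each gene's level varies between mice (BIO_SD) plus a small measurement noise (TECH_SD). Both standard deviations are made-up values chosen only to illustrate the effect. Three measurements of one mouse then show far less spread than three different mice, so they understate the real variability.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

N_GENES = 1000
BIO_SD = 1.0    # assumed spread between individual mice (log2 scale)
TECH_SD = 0.1   # assumed measurement noise per sequencing run (log2 scale)


def simulate(n_genes=N_GENES):
    """Mean per-gene variance for biological vs technical replicates."""
    bio_vars, tech_vars = [], []
    for _ in range(n_genes):
        true_level = rng.uniform(2, 10)  # hypothetical gene mean
        # Biological replicates: three different mice, each measured once.
        mice = [true_level + rng.gauss(0, BIO_SD) for _ in range(3)]
        bio = [m + rng.gauss(0, TECH_SD) for m in mice]
        # Technical replicates: the first mouse measured three times.
        tech = [mice[0] + rng.gauss(0, TECH_SD) for _ in range(3)]
        bio_vars.append(statistics.variance(bio))
        tech_vars.append(statistics.variance(tech))
    return statistics.mean(bio_vars), statistics.mean(tech_vars)


bio_var, tech_var = simulate()
print(f"mean variance, biological replicates: {bio_var:.3f}")
print(f"mean variance, technical replicates:  {tech_var:.3f}")
```

Under these assumptions the biological variance comes out around 1 and the technical variance around 0.01. A differential-expression test built on technical replicates alone would therefore be far too optimistic.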

2. Drawing the PCA plot

Load the plotting packages.

Set the working directory and import the previously calculated FPKM data.
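The post's R code for this step is not shown. In R it would typically be setwd() followed by read.csv() or read.table(). As a hedged Python stand-in, the sketch below reads a genes-by-samples FPKM table, drops genes that are zero in every sample, and applies log2(FPKM + 1). The file layout, with the first column as gene ID and a header row of sample names, is an assumption, and the sample names are hypothetical.

```python
import csv
import io
import math


def load_fpkm(handle):
    """Read a gene-by-sample FPKM table and return log2(FPKM + 1) values.

    Assumed (hypothetical) layout: the header row holds sample names after
    a gene-ID column; every other cell is an FPKM value.
    """
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]
    genes, matrix = [], []
    for row in reader:
        values = [float(v) for v in row[1:]]
        if all(v == 0 for v in values):
            continue  # genes never detected carry no information for PCA
        genes.append(row[0])
        matrix.append([math.log2(v + 1) for v in values])
    return samples, genes, matrix


# Small in-memory table standing in for a real file such as "fpkm.csv".
example = io.StringIO(
    "gene_id,ctrl_1,ctrl_2,ctrl_3\n"
    "geneA,3,5,7\n"
    "geneB,0,0,0\n"
    "geneC,15,15,15\n"
)
samples, genes, matrix = load_fpkm(example)
print(samples)  # ['ctrl_1', 'ctrl_2', 'ctrl_3']
print(genes)    # ['geneA', 'geneC']
```

For a real file, use `with open("fpkm.csv", newline="") as fh: samples, genes, matrix = load_fpkm(fh)`. The log transform keeps a handful of very highly expressed genes from dominating the principal components.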

Calculate the principal components and the proportion of variance each one explains.
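The original R code for this step is not included. As a hedged stand-in, this NumPy sketch (not the author's code) runs PCA with samples as observations on a genes-by-samples matrix of log2-transformed values. It centres each gene without scaling, which roughly corresponds to R's `prcomp(t(fpkm))` with its defaults. The toy matrix and group names are invented. Note that the sign of each PC is arbitrary and may differ between implementations.

```python
import numpy as np


def pca_samples(matrix):
    """PCA treating samples as observations.

    `matrix` is genes x samples, so it is transposed first. Each gene is
    mean-centred but not scaled (like prcomp's center = TRUE, scale. = FALSE).
    Returns (scores, ratio): scores[i, k] is sample i on PC k+1, and
    ratio[k] is the fraction of total variance explained by PC k+1.
    """
    x = np.asarray(matrix, dtype=float).T          # samples x genes
    x_centered = x - x.mean(axis=0)                # centre each gene
    u, s, _ = np.linalg.svd(x_centered, full_matrices=False)
    scores = u * s                                 # sample coordinates on each PC
    var = s ** 2 / (x.shape[0] - 1)                # variance along each PC
    ratio = var / var.sum()
    return scores, ratio


# Toy log2 values: 4 genes (rows) x 6 samples (3 ctrl, then 3 drug).
toy = [
    [5.0, 5.2, 4.9, 8.0, 8.1, 7.9],  # geneA: up in drug group
    [3.0, 3.1, 2.9, 6.0, 6.2, 5.8],  # geneB: up in drug group
    [7.0, 6.9, 7.1, 4.0, 4.1, 3.9],  # geneC: down in drug group
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # geneD: unchanged
]
scores, ratio = pca_samples(toy)
print("PC1 scores:", np.round(scores[:, 0], 2))
print("variance explained:", np.round(ratio, 3))
```

Here PC1 carries almost all the variance and puts the three control samples on one side and the three drug samples on the other. That is the picture you hope to see when replicates agree.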

Plot the principal components with ggscatter.
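ggscatter() comes from the R package ggpubr, and the post's plotting code is not shown. The Python sketch below is an analogous matplotlib version, not a translation of the author's code. It plots PC1 against PC2, colours points by group, labels every sample, and puts the variance share in the axis titles. The toy data, sample names, groups and output path are all hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_pc12(scores, ratio, names, groups, path):
    """Scatter PC1 vs PC2, one colour per group, each point labelled."""
    fig, ax = plt.subplots(figsize=(5, 4))
    for group in dict.fromkeys(groups):  # unique groups, first-seen order
        idx = [i for i, g in enumerate(groups) if g == group]
        ax.scatter(scores[idx, 0], scores[idx, 1], label=group)
    for i, name in enumerate(names):
        ax.annotate(name, (scores[i, 0], scores[i, 1]),
                    textcoords="offset points", xytext=(4, 4), fontsize=8)
    ax.set_xlabel(f"PC1 ({ratio[0]:.1%})")
    ax.set_ylabel(f"PC2 ({ratio[1]:.1%})")
    ax.legend(title="group")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)


# Toy log2 values, samples x genes (invented for illustration).
toy = np.array([
    [5.0, 3.0, 7.0, 2.0],  # ctrl_1
    [5.2, 3.1, 6.9, 2.1],  # ctrl_2
    [4.9, 2.9, 7.1, 1.9],  # ctrl_3
    [8.0, 6.0, 4.0, 2.0],  # drug_1
    [8.1, 6.2, 4.1, 2.2],  # drug_2
    [7.9, 5.8, 3.9, 1.8],  # drug_3
])
centered = toy - toy.mean(axis=0)  # centre each gene
u, s, _ = np.linalg.svd(centered, full_matrices=False)
scores, ratio = u * s, s ** 2 / np.sum(s ** 2)

names = ["ctrl_1", "ctrl_2", "ctrl_3", "drug_1", "drug_2", "drug_3"]
groups = ["ctrl"] * 3 + ["drug"] * 3
out_path = os.path.join(tempfile.gettempdir(), "pca_pc12.png")
plot_pc12(scores, ratio, names, groups, out_path)
print("saved", out_path)
```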

Alternatively, you can try a 3D scatter plot.

The downside of the 3D plot is that there is no parameter for displaying each point's name in the 3D scatter plot, which is quite frustrating.

If you want labels anyway, there are workarounds; I found mine through Google, too.
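The post does not say which R package it used for the 3D plot. With the scatterplot3d package, a commonly cited workaround is to pass the coordinates through the xyz.convert() function that the plot object returns, then call text(). In Python's matplotlib the problem does not arise, because a 3D axes has a text(x, y, z, s) method that places a label at data coordinates. The sketch below uses that method; the toy data, names, groups and output path are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older matplotlib)
import numpy as np


def plot_pc123(scores, ratio, names, groups, path):
    """3D scatter of PC1-PC3 with a text label next to every sample."""
    fig = plt.figure(figsize=(6, 5))
    ax = fig.add_subplot(projection="3d")
    for group in dict.fromkeys(groups):  # unique groups, first-seen order
        idx = [i for i, g in enumerate(groups) if g == group]
        ax.scatter(scores[idx, 0], scores[idx, 1], scores[idx, 2], label=group)
    for i, name in enumerate(names):
        # On a 3D axes, text() takes x, y, z data coordinates plus the string.
        ax.text(scores[i, 0], scores[i, 1], scores[i, 2], name, fontsize=8)
    ax.set_xlabel(f"PC1 ({ratio[0]:.1%})")
    ax.set_ylabel(f"PC2 ({ratio[1]:.1%})")
    ax.set_zlabel(f"PC3 ({ratio[2]:.1%})")
    ax.legend()
    fig.savefig(path, dpi=150)
    plt.close(fig)


# Toy log2 values, samples x genes (invented for illustration).
toy = np.array([
    [5.0, 3.0, 7.0, 2.0],  # ctrl_1
    [5.2, 3.1, 6.9, 2.1],  # ctrl_2
    [4.9, 2.9, 7.1, 1.9],  # ctrl_3
    [8.0, 6.0, 4.0, 2.0],  # drug_1
    [8.1, 6.2, 4.1, 2.2],  # drug_2
    [7.9, 5.8, 3.9, 1.8],  # drug_3
])
centered = toy - toy.mean(axis=0)  # centre each gene
u, s, _ = np.linalg.svd(centered, full_matrices=False)
scores, ratio = u * s, s ** 2 / np.sum(s ** 2)

names = ["ctrl_1", "ctrl_2", "ctrl_3", "drug_1", "drug_2", "drug_3"]
groups = ["ctrl"] * 3 + ["drug"] * 3
out_path_3d = os.path.join(tempfile.gettempdir(), "pca_pc123.png")
plot_pc123(scores, ratio, names, groups, out_path_3d)
print("saved", out_path_3d)
```

The labels are drawn in data space, so they can overlap when samples sit close together. With many samples, labelling only the outliers is often more readable.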