What are the commonly used statistical methods in medical research?

The cool autumn breeze brings us the fifth issue of Professor Liu Ling's Statistics Talk. This issue covers how to select a statistical method. Study it carefully, because you may need it soon.

Editor's Note

The basic statistical methods in common use are, generally speaking, the t test, one-way analysis of variance, and the chi-square test. Statistical methods come up whenever we write or read papers (almost every article involves one or more of them), so which method should be used in which situation? That is today's topic.

1. Before analyzing the data, you must first identify the data type (Figure 1), because different data types are described in different ways and call for different statistical methods.

Figure 1 Types of statistical data

For example (Table 1):

Table 1 Health examination records of 735 elderly people over 65 years old in a certain place in 2002

2. Statistical analysis of various types of data (description and statistical inference)

1. Measurement data

Characteristics: the observed values of the observation units differ quantitatively and carry units of measurement.

Description form: The most common form is mean ± standard deviation, written "x̄ ± s" (the usual form in the literature): the arithmetic mean describes the average level and the standard deviation describes the degree of dispersion. For data that depart markedly from a normal distribution (in particular, when the standard deviation is greater than the arithmetic mean), use Md (P25, P75) instead, where Md is the median and P25 and P75 are the quartiles (Table 2). To review testing for normality, see Medical Research Classroom丨Statistics Talk (3): What you should know about normality and homogeneity of variance tests
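The two description forms can be sketched in a few lines of Python. This is only an illustration: the function name describe_measurement and the sample values are made up, and the "standard deviation greater than mean" check is the rough rule of thumb from the text, not a substitute for a formal normality test.

```python
import statistics

def describe_measurement(values):
    """Summarize measurement data in one of the two forms described above."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)
    if sd > mean:
        # Rule of thumb from the text: an SD larger than the mean suggests
        # that mean +/- SD is misleading, so report median and quartiles.
        p25, _, p75 = statistics.quantiles(values, n=4)
        return {
            "form": "Md (P25, P75)",
            "median": statistics.median(values),
            "p25": p25,
            "p75": p75,
        }
    return {"form": "mean +/- SD", "mean": mean, "sd": sd}

# Made-up illustrative values, not data from the article.
symmetric = [4.8, 5.1, 5.0, 4.9, 5.2]
skewed = [1, 2, 2, 3, 50]
print(describe_measurement(symmetric))
print(describe_measurement(skewed))
```

Note that quartile values depend on the interpolation rule; statistical packages may report slightly different P25 and P75 for small samples.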

Table 2 Characteristics and application occasions of commonly used statistical indicators for measurement data

Statistical inference methods: generally divided into two types: single factor and multi-factor.

The key points of single-factor analysis: first, identify the data type (measurement data); second, identify the type of experimental design (completely randomized design? how many groups of samples?); third, check the application conditions of the chosen method; fourth, when normality and homogeneity of variance are satisfied, use the t test (note that there are three forms of t test!) or one-way analysis of variance, and when they are not satisfied, use the rank sum test (Figure 2).
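The fourth point can be written as a small decision function. This is a sketch under stated assumptions: it covers only a completely randomized design with two or more groups, it follows the wording of the text rather than the figure (which is not reproduced here), and the function name choose_test is hypothetical. The test names in parentheses are the standard names for these rank sum tests, not terms taken from the article.

```python
def choose_test(n_groups, normal, equal_variance):
    """Suggest a single-factor method for measurement data.

    Assumes a completely randomized design; `normal` and `equal_variance`
    are the outcomes of the normality and homogeneity-of-variance checks.
    """
    if n_groups < 2:
        raise ValueError("this sketch covers comparisons between 2 or more groups")
    parametric_ok = normal and equal_variance
    if n_groups == 2:
        if parametric_ok:
            return "two-sample t test"
        return "rank sum test (Wilcoxon rank sum)"
    if parametric_ok:
        return "one-way analysis of variance"
    return "rank sum test (Kruskal-Wallis)"

print(choose_test(2, normal=True, equal_variance=True))
print(choose_test(3, normal=True, equal_variance=False))
```

One-sample and paired designs use the other two forms of the t test and are deliberately left out of this sketch.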

Figure 2 The correct choice of statistical methods for measurement data

Two points to remind:

① If the sample data do not follow a normal distribution, only non-parametric tests (rank sum tests) can be used, but their test power is lower than that of parametric tests (t test or analysis of variance). Low test power means that a real difference exists but the test may fail to detect it.

② If the data come from more than two groups of samples, the t test cannot be used for the comparisons (repeated t tests increase the probability of false positive errors); analysis of variance should be used instead. If the analysis of variance gives P < 0.05, further pairwise comparison is required, commonly by the LSD method or the SNK method (note that the t test still cannot be used).
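Why repeated t tests inflate false positives can be shown with a short calculation. The sketch below assumes, as a simplification, that the pairwise comparisons are independent, in which case the chance of at least one false positive across m comparisons at level alpha is 1 - (1 - alpha)^m. The function name familywise_error is made up for this illustration.

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """Return (number of pairwise comparisons, approximate overall false positive rate).

    Assumes independent comparisons, which is only an approximation
    for pairwise tests that share groups.
    """
    n_comparisons = comb(n_groups, 2)
    return n_comparisons, 1 - (1 - alpha) ** n_comparisons

for k in (2, 3, 4, 5):
    m, rate = familywise_error(k)
    print(f"{k} groups: {m} comparisons, overall false positive rate about {rate:.3f}")
```

With three groups there are already three comparisons, and the overall false positive rate is roughly 0.14 rather than the nominal 0.05.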

Earlier lectures covered the t test (Medical Research Classroom丨Statistics Talk (2): Did you do your t-test correctly?) and analysis of variance (Medical Research Classroom丨Statistics Talk (4): The soul of statistical methods - analysis of variance). The rank sum test will be introduced step by step in later issues.

Multi-factor analysis generally uses regression analysis, mainly linear regression analysis. This method will be introduced to you later.

2. Count data

Characteristics: unordered categories. Observation units within the same category do not differ quantitatively, while different categories differ qualitatively, and the categories are mutually exclusive.

Binary classifications are always count data (for example, sex can only be male/female, and whether a certain disease is secondary can only be secondary/not secondary). Multi-category data are count data when the categories differ in nature but not in degree. For example, marital status (single, married, divorced, widowed) has multiple categories with no difference in degree between them, so it is count data. Qualitative urine glucose results, by contrast, are graded from negative (-) through increasingly positive levels; multi-category data with such differences in degree are not count data but hierarchical data.

Description form: The most common form is "number of cases (%)" (the usual form in the literature). The main point is to distinguish the composition ratio (a relative number describing structure) from the rate (a relative number describing intensity) (Table 3). In addition, the denominator (that is, the sample size) should generally not be too small: a relative number based on too small a denominator is unstable and does not reliably reflect the facts.

Table 3 Characteristics and application occasions of commonly used statistical indicators for count data

For example:

1. Among the lung cancer patients in a certain place there are A male cases and B female cases. The sex ratio of local lung cancer patients is A/B, which is a "ratio".

2. In a certain study, 3 pathogenic bacteria were detected, with A, B, and C strains respectively, for a total of A + B + C strains. For the pathogen with A strains, A/(A + B + C) is its composition ratio, that is, the proportion of this pathogen among all detected pathogens.

3. A study treated B patients in total, of whom A were cured. A/B is then a rate (here, the cure rate).
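The three relative numbers in these examples differ only in what goes into the numerator and denominator, which a short sketch makes explicit. All counts below are made up for illustration, and the function names are hypothetical.

```python
def ratio(a, b):
    """Ratio of two counts, e.g. male cases to female cases (A/B)."""
    return a / b

def composition_ratios(counts):
    """Percentage of each component within the whole; the results sum to 100."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

def rate(events, total):
    """Percentage of units in which the event occurred, e.g. cured / treated."""
    return 100 * events / total

# Made-up illustrative counts, not data from the article.
print(ratio(120, 80))                                    # sex ratio, male to female
print(composition_ratios({"A": 50, "B": 30, "C": 20}))   # share of each pathogen
print(rate(45, 60))                                      # cure rate in percent
```

Because composition ratios share one denominator, raising the count of one component necessarily lowers the percentages of the others, which is not true of rates computed on separate denominators.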

Statistical inference methods: generally divided into two types: single factor and multi-factor.

The key points of single-factor analysis: first, identify the data type (count data); second, identify the type of experimental design (completely randomized design? how many groups of samples?); third, check the application conditions of the chosen method; fourth, when comparing multiple sample rates, if the chi-square test gives P < 0.05, further pairwise comparisons are required, with Bonferroni correction to control false positives (Figure 3).
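Pairwise comparison with Bonferroni correction can be sketched as follows. Assumptions: each pair of groups is compared with a Pearson chi-square test on a 2×2 table without continuity correction, the test level is divided by the number of pairs, and the application conditions (such as adequate expected counts) are not checked. The function names and all counts are made up for this illustration.

```python
from itertools import combinations
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, P value). For 1 degree of freedom the upper tail
    of the chi-square distribution equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, erfc(sqrt(chi2 / 2))

def pairwise_bonferroni(groups, alpha=0.05):
    """groups maps a label to (events, non_events); returns (adjusted alpha, results)."""
    pairs = list(combinations(groups, 2))
    adjusted_alpha = alpha / len(pairs)  # Bonferroni: split alpha across all pairs
    results = {}
    for g1, g2 in pairs:
        _, p = chi_square_2x2(*groups[g1], *groups[g2])
        results[(g1, g2)] = (p, p < adjusted_alpha)
    return adjusted_alpha, results

# Made-up illustrative counts: (effective, not effective) in three groups.
groups = {"A": (30, 70), "B": (50, 50), "C": (52, 48)}
adjusted_alpha, results = pairwise_bonferroni(groups)
print(f"adjusted test level: {adjusted_alpha:.4f}")
for pair, (p, significant) in results.items():
    print(pair, f"P = {p:.4f}", "significant" if significant else "not significant")
```

With three groups there are three pairs, so each pairwise P value is compared against 0.05/3 rather than 0.05.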

Figure 3 The correct choice of statistical methods for count data

Two points to remind:

① Composition ratios are calculated on a base of 100%, and the proportions of all components must sum to 100%, so an increase or decrease in the proportion of one component changes the proportions of the others;

② Composition ratio and rate are easily confused in practice. The main difference lies in the denominator, so the denominator must be chosen correctly.

Multi-factor analysis generally uses regression analysis, mainly logistic regression analysis. This method will be introduced to you later.

3. Hierarchical data

Characteristics: multi-category data in which the categories differ in both nature and degree, and the categories are arranged in a definite order (ordered). Such data are hierarchical data.

Description form: The most common form is "number of cases (%)" (the usual form in the literature), much as for count data. The main difference is that the categories must be listed in order (from small to large or from weak to strong).

Statistical inference methods: In single-factor analysis, hierarchical data are analyzed with a non-parametric test (the rank sum test). For two-way ordered R×C data, where both the grouping variable and the outcome variable are ordered, the choice depends on the question: the chi-square test compares composition ratios, the rank sum test compares degrees, and rank correlation assesses whether there is a trend between the two variables. In multi-factor analysis, ordinal logistic regression is used.
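For graded data, many observations share the same grade, so the rank sum test assigns every observation in a grade the average of the ranks that grade occupies. The sketch below computes only the two rank sums from grade-by-group frequency counts. It does not compute the test statistic's P value, and the function name and counts are made up for illustration.

```python
def rank_sums(counts_1, counts_2):
    """Rank sums of two groups from frequency counts per ordered grade.

    counts_k[i] is the number of cases in group k at the i-th grade,
    with grades listed from weakest to strongest.
    """
    ranks_used = 0  # number of ranks already assigned to lower grades
    t1 = t2 = 0.0
    for c1, c2 in zip(counts_1, counts_2):
        tied = c1 + c2
        if tied:
            # Average of ranks ranks_used + 1 ... ranks_used + tied
            mid_rank = ranks_used + (tied + 1) / 2
            t1 += c1 * mid_rank
            t2 += c2 * mid_rank
        ranks_used += tied
    return t1, t2

# Made-up counts for three ordered grades in two groups.
t1, t2 = rank_sums([2, 1, 0], [0, 1, 2])
print(t1, t2)
```

A useful check is that the two rank sums always add up to N(N + 1)/2, where N is the total number of cases.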

Note: Categorical variables (count data and hierarchical data) must be assigned appropriate numeric codes before software analysis. The coding directly affects how the results of the statistical analysis are interpreted.
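One common way to code the two kinds of categorical variable is shown below, as an assumption rather than the article's prescribed scheme: ordered grades get increasing integers, while unordered categories get 0/1 indicator (dummy) variables relative to a chosen reference category. The grade labels and function name are hypothetical; the marital status categories are the ones used as an example earlier.

```python
# Hypothetical ordered grades: the integer codes must follow the order.
ORDINAL_CODES = {"mild": 1, "moderate": 2, "severe": 3}

MARITAL_STATUS = ["single", "married", "divorced", "widowed"]

def dummy_code(value, categories, reference):
    """Indicator variables for an unordered category, omitting the reference.

    Each indicator is later interpreted as "this category versus the
    reference", so the choice of reference changes the interpretation.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {f"is_{c}": int(value == c) for c in categories if c != reference}

print(ORDINAL_CODES["moderate"])
print(dummy_code("married", MARITAL_STATUS, reference="single"))
print(dummy_code("single", MARITAL_STATUS, reference="single"))
```

Coding marital status as 1, 2, 3, 4 instead would impose an order and spacing that the categories do not have, which is why the coding choice matters for interpretation.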

Finally, the following figure is used to summarize the selection of basic statistical methods (Figure 4).

Figure 4 The correct choice of commonly used basic statistical methods

That's it for today. Please review the material. If you have any questions or something is unclear, leave a message below and we will ask Professor Liu Ling to answer them one by one.

Well, let’s look forward to the next issue!

Writer: Liu Ling Editor: Liu Qin

Typesetting: Bi Li Review: Wang Dong

Expert profile

Liu Ling: Associate professor in the Health Statistics Teaching and Research Section of Army Medical University, mainly engaged in health statistics teaching and scientific research. He serves as a member of the 8th Statistical Theory and Methods Professional Committee of the Chinese Society of Health Information and as deputy director of the Chongqing Preventive Medicine Health Statistics Professional Committee, and is an editorial board member and statistical review expert for many journals, including the "Journal of the Third Military Medical University".

Historical recommendations

Medical Research Classroom丨Statistics Talk (4): The soul of statistical methods - analysis of variance

Medical Research Classroom丨Statistics Talk (3): What you should know about normality and homogeneity of variance tests

Medical Research Classroom丨Statistics Talk (2): Did you do the t test correctly?

Medical Research Classroom丨Statistics Talk (1): What is sample size estimation?