Assessing the accuracy and efficiency of Chat GPT-4 Omni (GPT-4o) in biomedical statistics: Comparative study with traditional tools

Anusha S. Meo; Narmeen Shaikh; Sultan A. Meo

doi:10.15537/smj.2024.45.12.20240454

Figure 1
- Large dataset file upload and command entry into Chat GPT Omni. ANOVA: Analysis of variance
Figure 2
- Comparison of large dataset-based graphs generated by Chat GPT-Omni versus Statistical Package for Social Sciences.

Table 1

- Summary of the types of statistical tests used for data analysis.

Descriptive Analysis

Frequency, percentage, mean, median, mode, range, standard deviation, and skewness of the data.

Bivariate analysis

a. T-test or Mann–Whitney U test, depending on the variables in the dataset, between a continuous and a dichotomous categorical variable.
b. ANOVA or Kruskal–Wallis test, depending on the variables in the dataset. In the case of ANOVA, a post hoc (Scheffé/Tukey HSD) test was also done for intergroup comparison of means.
c. Chi-square test was done between two categorical variables.
d. Correlation analysis: Spearman or Pearson tests, depending on the variables in the dataset.
e. Simple Linear Regression was done to find a linear relationship between two continuous variables.

Multivariate analysis

Multiple linear regression if the defined outcome was a continuous variable.

ANOVA: Analysis of variance, HSD: Honestly significant difference
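The bivariate choices in Table 1 can be read as a small decision rule. The sketch below is only an illustration: the function name `choose_bivariate_test` is hypothetical, and the `normal` flag is an assumption, since the table says only that the choice depends on the variables in the dataset and does not state the exact criterion used.

```python
def choose_bivariate_test(kind_a, kind_b, levels=2, normal=True):
    """Name the Table 1 test for a pair of variables.

    kind_a, kind_b: "continuous" or "categorical".
    levels: number of groups in the categorical variable, if one is present.
    normal: assumed flag for whether the continuous data look normally
            distributed (the source does not spell out this criterion).
    """
    kinds = sorted([kind_a, kind_b])
    if kinds == ["categorical", "categorical"]:
        return "Chi-square test"
    if kinds == ["continuous", "continuous"]:
        return "Pearson correlation" if normal else "Spearman correlation"
    if kinds == ["categorical", "continuous"]:
        if levels == 2:
            return "t-test" if normal else "Mann-Whitney U test"
        if levels > 2:
            # ANOVA is followed by a post hoc test (Scheffe or Tukey HSD).
            return "ANOVA with post hoc test" if normal else "Kruskal-Wallis test"
    raise ValueError("unsupported combination of variable kinds or levels")


print(choose_bivariate_test("continuous", "categorical", levels=3, normal=False))
```

Keeping the rule in one function makes it easy to compare a tool's "choice of test" answer against the expected one for each pair of variables.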

Table 2

- The script of the nine questions used to describe various statistical tools.

Q1	Re-code the string data into numeric values by allotting each group a number
Q2	Transform the continuous age variable into a categorical age variable.
Q3a	Find mode, median and range.
Q3b	Calculate mean and standard deviation for continuous variables and frequency and per cent for categorical variables.
Q3c	Calculate the skewness of the data.
Q4a	What is the most accurate statistical test for the variables (continuous variable) and (continuous variable)? Perform it and give its coefficient and p-value.
Q4b	For the above continuous variables, perform simple linear regression analysis if applicable with y as dependent and x as an independent variable.
Q5	What is the most accurate statistical test for the variables (continuous variable) and (dichotomous categorical variable)? Perform it and give its coefficient and p-value.
Q6	What is the most accurate statistical test for the variables (continuous var) & (categorical variable > 2 levels)? Perform it and give its coefficient and p-value. Do a post hoc test for the ANOVA test.
Q7	What is the most accurate statistical test for the variables (categorical variable) & (categorical variable)? Perform it and give its coefficient and p-value.
Q8	Perform multiple linear regression/Logistic regression with (y) as the dependent variable (based on the dataset).
Q9	Make appropriate charts: continuous vs continuous variable plot, chart of normality for a continuous variable and cluster bar chart of frequency for categorical variable
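For readers who want to see what the data preparation and descriptive prompts (Q1 to Q3c) ask for in concrete terms, here is a minimal sketch using only the Python standard library. The toy records, the age cut points, and the helper names `age_band` and `skewness` are all hypothetical and do not come from the study datasets. The skewness shown is the simple moment-based coefficient; statistical packages often report a bias-adjusted variant, so values may differ slightly.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical toy records; none of these values come from the study datasets.
groups = ["control", "treated", "treated", "control", "treated"]
ages = [25, 40, 40, 61, 34]

# Q1: recode string labels into numeric codes (alphabetical order, starting at 1).
codebook = {label: code for code, label in enumerate(sorted(set(groups)), start=1)}
group_codes = [codebook[g] for g in groups]

# Q2: turn the continuous age into a categorical band (cut points are arbitrary).
def age_band(age):
    if age < 30:
        return "<30"
    if age < 60:
        return "30-59"
    return "60+"

age_bands = [age_band(a) for a in ages]

# Q3a and Q3b: descriptive statistics for a continuous variable.
summary = {
    "mode": mode(ages),
    "median": median(ages),
    "range": max(ages) - min(ages),
    "mean": mean(ages),
    "sd": stdev(ages),  # sample standard deviation (n - 1 denominator)
}

# Q3b: frequency and per cent for a categorical variable.
counts = Counter(groups)
percents = {label: 100 * n / len(groups) for label, n in counts.items()}

# Q3c: moment-based skewness, m3 / m2^1.5, without bias adjustment.
def skewness(values):
    m = mean(values)
    m2 = mean((v - m) ** 2 for v in values)
    m3 = mean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

print(group_codes, age_bands, summary, dict(counts), percents, skewness(ages))
```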

Table 3

- Performance of the ChatGPT Omni tool and recorded response time

S #	Question	Small dataset		Medium dataset		Large dataset		Total score
S #	Question	Score	Time (s)	Score	Time (s)	Score	Time (s)	Total score
1	Re-code the string data into numeric values by allotting each group a number (1)	1/1	17.70	0/1	20.33	1/1	29.43	2/3
2	Transform continuous age variable into categorical age variable (1)	1/1	15.38	1/1	23.88	1/1	11.00	3/3
3a	Find mode, median and range (3)	3/3	44.11	3/3	40.61	3/3	37.02	23/24
3b	Calculate the mean and standard deviation for continuous variables (2)	2/2	33.86	2/2	19.81	2/2	22.11
3b	Calculate frequency and per cent for categorical variables (2)	1.5/2	47.35	2/2	30.02	2/2	13.16
3c	Calculate the skewness of the data (1)	0.5/1	27.23	1/1	34.23	1/1	20.03
4a	What statistical test will be performed between (cont. variable) and (cont. variable)?	--	34.96	--	26.34	--	31.53	11/13
	Choice of Test (1)	0/1	34.96	0/1	26.34	1/1
	Calculation of correct test statistic and p-value (2)	2/2	34.53	2/2	34.03	2/2
4b	For the above continuous variables, perform a simple linear regression analysis, with y as the dependent and x as the independent variable (4): R² (variance), p-value of the regression, intercept (constant), slope	N/A*	N/A*	N/A*	N/A*	4/4	143.59
5	What statistical test will be performed between cont. variable and dichotomous cat variable		62.91	--	67.51	--	108.71	8/9
	Choice of Test (1)	1/1		1/1		0/1	108.71
	Calculation of correct test statistic and p-value (2)	2/2		2/2		2/2	117.14
6	What statistical test will be performed between cont. variable & cat > 2 levels variable		103.79		120.24		80.14	11/11
	Choice of Test (1)	1/1		1/1		1/1
	Calculation of correct test statistic and p-value (2)	2/2		2/2		2/2
	Post hoc for ANOVA (2): mean difference, p-value	N/A†	N/A†	N/A†	N/A†	2/2	260.19
7	What statistical test will be performed between cat variable and cat variable?	--	41.50	--	54.98	--	119.92	7/9
	Choice of Test (1)	1/1		1/1		1/1
	Calculation of correct test statistic and p-value (2)	0/2		2/2		2/2
8	Perform multiple linear regression with defined outcome (4): adjusted R² (variance), F statistic, p-value, coefficients	N/A§	N/A§	N/A§	N/A§	2/4	39.18	2/4
9	Make appropriate charts (bar graph) based on the dataset and specific variables (3): cont. versus cont. variable plot, chart of normality for cont. variable, cluster bar chart of frequency for cat variable	3/3	24.51	0/3	65.01	3/3	39.16	6/9
	Total	21/25	487.74	20/25	747.02	32/35	1071.31	Score: 73/85; Time: 2306.7 secs

ANOVA: Analysis of variance, Cont.: continuous, Cat: categorical, s/secs: seconds

* N/A as linear regression could not be performed for the small and medium datasets since the data were not linear and not normally distributed.
† Post hoc analysis was only done for ANOVA and is therefore not applicable where the Kruskal–Wallis test was performed instead.
§ N/A as multiple linear regression could not be performed for the small and medium datasets.
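The overall figures in the Total row of Table 3 can be re-derived from the three per-dataset totals. The short sketch below does only that arithmetic; the dictionary layout and variable names are illustrative. Note that the three times as printed sum to 2306.07 s, which is close to but not identical with the 2306.7 secs shown in the table; the difference may be a typographical or rounding artifact.

```python
# Per-dataset totals copied from the Total row of Table 3:
# (points earned, points available, response time in seconds).
totals = {
    "small": (21, 25, 487.74),
    "medium": (20, 25, 747.02),
    "large": (32, 35, 1071.31),
}

earned = sum(points for points, _, _ in totals.values())
available = sum(maximum for _, maximum, _ in totals.values())
seconds = sum(time for _, _, time in totals.values())
accuracy_pct = 100 * earned / available

print(f"{earned}/{available} = {accuracy_pct:.1f}%")  # 73/85 = 85.9%
print(f"{seconds:.2f} s")                             # 2306.07 s
```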