I recently spoke with my college professor about understanding statistics, and I remarked that I learn statistical concepts best (and with full comprehension) through simulation. He remarked that I might be in the minority of students (p<0.01; see what I did there?). I am not sure if this is true, but if one day I manage to teach an introductory course on statistics, I will for sure try to use simulation and guided discovery.
Most statistical concepts really sink in for me when I simulate the data and then see how the statistical tests (and linear models) recover the simulated relationships. Will Hopkins made a wonderful tutorial on using simulation HERE, which I refer to now and then.
Anyway, I have been going through the great FREE course Statistics One by Prof. Andrew Conway at Coursera, and I am trying to understand the hardest lectures, on Moderation and Mediation. Because the concepts wouldn’t sink in, I decided to play with simulation. No, not stimulation: SIMULATION. (Note to myself: too much reading of Andy Field's Discovering Statistics Using R and buying into his type of humor.)
In the following post I will try to simulate the data for a moderation analysis. This is my playbook using simulation; if anyone finds it useful, great, and if not, just disregard it. To find out more about moderation, please take a look at the Wiki page and the Coursera course mentioned above.
Simulating the data
To simulate the data I decided to use the relationship between squat strength and maximal speed. Let’s suppose that the more one can squat, the higher the speed one can reach. In this case the predictor variable is squat strength and the outcome variable is maximal speed.
Now we introduce the moderator variable: AGE. Let’s assume that the older the athlete, the less strength transfers to speed, and vice versa (i.e. the same increase in strength will yield a smaller increase in speed in older athletes).
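For readers who like to see the idea written down, here is one common way to express moderation as a regression with an interaction term. The symbols are my own notation and only a sketch of the setup described above, not necessarily the exact model used in the course.

```latex
% Assumed notation: speed = outcome, squat = predictor, age = moderator
\text{speed}_i = \beta_0 + \beta_1\,\text{squat}_i + \beta_2\,\text{age}_i
               + \beta_3\,(\text{squat}_i \times \text{age}_i) + \varepsilon_i

% Collecting the squat terms shows that the slope of squat depends on age:
\text{speed}_i = \beta_0 + (\beta_1 + \beta_3\,\text{age}_i)\,\text{squat}_i
               + \beta_2\,\text{age}_i + \varepsilon_i
```

With a negative interaction coefficient (beta_3 < 0), the slope of speed on squat strength gets smaller as age goes up, which is exactly the "less transfer in older athletes" assumption.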
It will be clearer when we simulate the data and graph it.
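Here is a minimal sketch of what such a simulation could look like, written in plain Python with only the standard library. All the numbers (squat mean of 150 kg, ages from 18 to 35, speeds around 9 m/s, the size of the age effect) are made up purely for illustration, and the function names are my own. For simplicity the sketch leaves out any direct effect of age on speed and only lets age change the squat-to-speed slope.

```python
import random


def simulate_athletes(n=1000, seed=42):
    """Return a list of (squat_kg, age_years, speed_ms) tuples.

    All parameter values are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        squat = rng.gauss(150, 25)   # assumed squat strength in kg
        age = rng.uniform(18, 35)    # assumed age range in years
        # Moderation: the transfer (slope of speed on squat) shrinks with age.
        transfer = 0.030 - 0.001 * (age - 18)
        # Squat is centered at 150 kg so that age acts only through the slope.
        speed = 9.0 + transfer * (squat - 150) + rng.gauss(0, 0.1)
        rows.append((squat, age, speed))
    return rows


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rows = simulate_athletes()

# Split at the midpoint of the assumed age range and compare the slopes.
young = [(s, v) for s, a, v in rows if a < 26.5]
old = [(s, v) for s, a, v in rows if a >= 26.5]

slope_young = ols_slope([s for s, _ in young], [v for _, v in young])
slope_old = ols_slope([s for s, _ in old], [v for _, v in old])

print("slope of speed on squat, younger athletes:", round(slope_young, 4))
print("slope of speed on squat, older athletes:  ", round(slope_old, 4))
```

If the moderation was simulated correctly, the slope for the younger group should come out larger than the slope for the older group. Splitting by age is just a quick sanity check; a proper moderation analysis would fit the interaction term directly.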