Professor Bates' Methodology 2 lectures on experiment design and on scale construction.
Experimental design and rationale
- Research design: the purpose of design is to let us understand causes and mechanisms.
- We progress towards the truth by pitting theories against each other, and by generating new theories.
- Unlike descriptive lists or personal beliefs or cultural constructions, scientific theories can be wrong.
- This is because they allow us to derive clear hypotheses that we can test.
- A big part of methodology is logical clarity
- Do the references really support the theory? (rather than some other theory or even topic, or a whole range of competing theories?)
- Does the experiment actually test the theory? Or are the results also compatible with competing theories?
- Can the results be due to an artefact in the design - like experimenter demand effects?
- Is there confounding? If so, the experiment doesn't test the theory.
- Reverse causation? Might the causal arrows be reversed?
- White hat bias? Has the result received little scrutiny because it is viewed favourably?
- Case study: Does this study show what it concludes?
- Causes of non-replication in social science?
- Low power: this will increase the ratio of false positives to true detections.
- Experimenter bias (researchers aware of what result they want).
- Publication bias (failed studies go unpublished).
- P-hacking (violating the assumptions of significance testing by trying different tests without correcting the effective p-value threshold).
- Fraud (making up data or results).
- Case Studies:
- "Don't think and You'll be smarter"?
- Newell, B. R., & Shanks, D. R. (2014). Unconscious influences on decision making: a critical review. Behavioral and Brain Sciences, 37(1), 1-19. doi: 10.1017/S0140525X12003214
- Nieuwenstein, M. R., Wierenga, T., Morey, R. D., Wicherts, J. M., Blom, T. N., Wagenmakers, E.-J., & van Rijn, H. (2015). On making the right choice: A meta-analysis and large-scale replication attempt of the unconscious thought advantage. Judgment and Decision Making, 10(1), 1-17.
- Stereotype threat
- Unconscious control of behavior, e.g. "Priming slows walking"
- Hot baths warm hearts?
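The low-power and p-hacking points in the list of causes of non-replication above can be made concrete with a little arithmetic. The Python sketch below is an illustration only and is not taken from the lectures. The function names, the assumed prior probability that a tested hypothesis is true (0.5), and the assumption that the extra tests tried during p-hacking are independent are all hypothetical choices made for the example.

```python
def false_to_true_ratio(power, alpha=0.05, prior=0.5):
    """Expected false positives per true detection among significant results.

    power: probability of detecting an effect that really exists
    alpha: significance threshold (false-positive rate when there is no effect)
    prior: ASSUMED share of tested hypotheses that are actually true
    """
    false_positives = alpha * (1.0 - prior)  # null is true, test is significant
    true_detections = power * prior          # effect is real, test is significant
    return false_positives / true_detections


def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Chance of at least one significant result when no effect exists.

    ASSUMES the n_tests tests are independent and uncorrected.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Lower power means more false positives per true detection.
for power in (0.8, 0.5, 0.2):
    print(f"power={power:.1f}  false:true ratio={false_to_true_ratio(power):.3f}")

# Trying more tests without correction inflates the effective threshold.
for n_tests in (1, 5, 10):
    print(f"tests={n_tests:2d}  P(any false positive)={chance_of_any_false_positive(n_tests):.3f}")
```

Under these assumed values, dropping power from 0.8 to 0.2 raises the ratio of false positives to true detections from about 0.06 to 0.25, and trying five uncorrected tests raises the chance of at least one false positive from 0.05 to roughly 0.23.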
Lecture 2 from Tim Bates: Questionnaire Design
Techniques have strengths and limitations. We'll put lecture 1 into practice by thinking about questionnaire design.
Questions/revision from students
- What's a good starter paper on enhancing replicability?
- What is a good guide to causality and modelling?
- This is a very readable book on that topic: Shipley, B. (2000). Cause and correlation in biology: A user's guide to path analysis, structural equations, and causal inference. Cambridge, UK; New York, NY, USA: Cambridge University Press.
- What is "superficial validity"?
- That was a descriptive slide heading about validity in terms of "seeming right" or agreeing with some observer. I have now changed it to "surface-level".
- Is content validity to do with the precise content of the questions, whereas criterion validity is sort of making sure you're looking at the relevant outcome?
- Content validity refers to items seeming (to some observer) to have content that matches the construct being measured. Criterion validity refers to predicting an outcome that (some observers agree) measures the construct. So if higher IQ scores predicted better exam grades, that might count as criterion validity. Another researcher might claim that this is a new discovery: that IQ is related to school. In the end both are true, which is why I called these surface-level validity in my lecture.
- Is face validity separate from content and criterion validity, or is it subsumed by them, as is the case with predictive and concurrent validity?
- Lots of these terms are, in my opinion, a bit like the old descriptions of schizophrenia that med schools taught: "hebephrenic", "catatonic". They were descriptive at best, and a substitute for better understanding. That said, face validity requires only that you or some expert says "that looks right to me". Content validity is clearly very closely linked to this idea. Criterion validity is clearly quite different.
- You have a slide title mentioning "external and internal manifestations of bias". What was this relating to?
- I changed this to "How might test-bias be visible in data?", which is hopefully clearer.
- Could you perhaps clarify the difference between culturally biased and culturally loaded?
- In measurement theory, the idea of bias implies a problem in measurement: the measured result doesn't accurately reflect the underlying trait we want to measure. So a culturally biased test is one which doesn't measure the same thing in two cultures. A culturally loaded test is one which measures something that is affected by culture. So a test could be culturally loaded but not biased, or not culturally loaded but biased, or both culturally loaded and biased.
- You give examples of culturally loaded questions (Edinburgh parks and E=mc^2), stating that the Physics question is lower on bias (presumably because E=mc^2 is more universally known, thus it can be assumed that more people across the globe will recognise this equation as opposed to a question detailing knowledge of Edinburgh parks). Thus, is the Edinburgh park question higher on bias?
- That was what I intended. I think I mentioned that Binet in his testing sought to build items which were either not dependent on experience, or were dependent only on experiences which all subjects could be expected to have had. In that way, he sought to minimise bias. If you define your population within which the test is to be used (say, people born and living in Edinburgh), then a measure of park knowledge would not be biased.
- Is bias dependent, then, on the samples of participants you ask?
- Yes, as stated above, test bias only shows up when the groups tested differ in some way the test constructor did not anticipate. For instance, if you discovered that naming Edinburgh parks was a biased item for taxi drivers in Aberdeen, you could modify the test to use "local parks".
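One way to picture how the item bias discussed in the answers above might be visible in data is to compare people from two groups who have the same level of the underlying trait. The Python sketch below is an illustration under stated assumptions, not the method used in the lecture. It assumes a simple logistic (Rasch-style) item model, and the function names, the group labels, and every difficulty value are hypothetical numbers chosen for the example.

```python
import math


def pass_probability(trait, difficulty):
    """Chance of answering an item correctly under an ASSUMED logistic model."""
    return 1.0 / (1.0 + math.exp(-(trait - difficulty)))


def matched_gap(trait, difficulty_group_a, difficulty_group_b):
    """Difference in pass probability between two people with the SAME trait level.

    A gap of zero means the item behaves the same way in both groups (no bias).
    A non-zero gap means group membership changes the result at equal trait levels.
    """
    return (pass_probability(trait, difficulty_group_a)
            - pass_probability(trait, difficulty_group_b))


# Hypothetical item difficulties for (Edinburgh residents, Aberdeen residents).
parks_item = (-1.0, 1.0)   # Edinburgh parks: ASSUMED easier for Edinburgh residents
physics_item = (0.0, 0.0)  # E=mc^2: ASSUMED equally hard in both groups

for trait in (-1.0, 0.0, 1.0):
    print(f"trait={trait:+.1f}  "
          f"parks gap={matched_gap(trait, *parks_item):.3f}  "
          f"physics gap={matched_gap(trait, *physics_item):.3f}")
```

In this sketch the physics item shows no gap at any trait level, while the parks item favours Edinburgh residents at every trait level. Restricting the population to one group, as in the "people born and living in Edinburgh" example, removes the comparison and with it the bias.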