Within psychometrics, there are many sources of guidance on construct validation. Here, we try to follow the recommendations of Flake, J. K., Pek, J., & Hehman, E. (2017), published in Social Psychological and Personality Science. Below, each phase of construct validation is described as in “Construct Validation in Social and Personality Research: Current Practice and Recommendations.”
Item number | Scale | Sub-scale | Index | Item text |
---|---|---|---|---|
1 | TDDS | Moral | M1 | Shoplifting a candy bar from a convenience store |
4 | TDDS | Moral | M2 | Stealing from a neighbor |
7 | TDDS | Moral | M3 | A student cheating to get good grades |
10 | TDDS | Moral | M4 | Deceiving a friend |
13 | TDDS | Moral | M5 | Forging someone’s signature on a legal document |
16 | TDDS | Moral | M6 | Cutting to the front of a line to purchase the last few tickets to a show |
19 | TDDS | Moral | M7 | Intentionally lying during a business transaction |
3 | TDDS | Pathogen | P1 | Stepping on dog poop |
6 | TDDS | Pathogen | P2 | Sitting next to someone who has red sores on their arm |
9 | TDDS | Pathogen | P3 | Shaking hands with a stranger who has sweaty palms |
12 | TDDS | Pathogen | P4 | Seeing some mold on old leftovers in your refrigerator |
15 | TDDS | Pathogen | P5 | Standing close to a person who has body odor |
18 | TDDS | Pathogen | P6 | Seeing a cockroach run across the floor |
21 | TDDS | Pathogen | P7 | Accidentally touching a person’s bloody cut |
2 | TDDS | Sexual | S1 | Hearing two strangers having sex |
5 | TDDS | Sexual | S2 | Performing oral sex |
8 | TDDS | Sexual | S3 | Watching a pornographic video |
11 | TDDS | Sexual | S4 | Finding out that someone you do not like has sexual fantasies about you |
14 | TDDS | Sexual | S5 | Bringing someone you just met back to your room to have sex |
17 | TDDS | Sexual | S6 | A stranger of the opposite sex intentionally rubbing your thigh in an elevator |
20 | TDDS | Sexual | S7 | Having anal sex with someone of the opposite sex |
Item | Number of missing responses |
---|---|
TDDS_1 | 0 |
TDDS_2 | 1 |
TDDS_3 | 0 |
TDDS_4 | 4 |
TDDS_5 | 3 |
TDDS_6 | 3 |
TDDS_7 | 1 |
TDDS_8 | 1 |
TDDS_9 | 2 |
TDDS_10 | 1 |
TDDS_11 | 2 |
TDDS_12 | 4 |
TDDS_13 | 2 |
TDDS_14 | 1 |
TDDS_15 | 4 |
TDDS_16 | 3 |
TDDS_17 | 0 |
TDDS_18 | 1 |
TDDS_19 | 2 |
TDDS_20 | 0 |
TDDS_21 | 1 |
Online data collection is inherently prone to careless responding. This can increase measurement error, attenuate effects, and lead to spurious relations. To avoid these negative consequences, one needs to check for patterned responders, speeders, and one-liners in the data to ensure data quality. We perform these checks now.
Normally, the presence of patterned responders is checked by inspecting the written responses or the .pdf record (when online) of each respondent. As N grows, this task becomes daunting. As a means to mitigate the influence of careless responding, methods that try to identify such patterns automatically have been developed, so that analyses are performed on a greater number of (likely) truthful answers.
The assumption here is that careless participants often respond in a pattern, for example by choosing the middle (or any other) response category from the top to the bottom of a given scale. A common pattern - assuming several items whose response categories go from 1 to 7 - is 1,2,3,4,5,6,7,6,5,4,3,2,1, forming diagonals. Also common are sequences around the scale mid-point, such as 4,5,6,5,4,5,6. Figure X below speaks to the simplest case: the number of participants using only one unique response category (either 1,1,1,1,1,1,1, or 2,2,2,2,2,2, etc.). These are called one-liners, and for this pattern the TDDS data show 16 cases/participants. We also find 16 participants who chose only two unique response categories in their answers. Since the number of observations using one or two unique responses (N = 32, about 3%) does not threaten the validity of the analyses, these cases are kept in the analysis.
Unique Responses | Frequencies |
---|---|
1 | 16 |
2 | 16 |
3 | 54 |
4 | 116 |
5 | 252 |
6 | 355 |
7 | 213 |
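A minimal sketch of how these counts can be obtained, assuming the 21 item responses sit in the first 21 columns of a data frame `dataS` (one row per participant), as in the analysis calls shown later:

```r
# Count the number of unique response categories each participant used
# across the 21 TDDS items (dataS[, 1:21] assumed to hold the raw items).
n.unique <- apply(dataS[, 1:21], 1, function(x) length(unique(na.omit(x))))

table(n.unique)          # frequency table corresponding to the table above
which(n.unique == 1)     # row indices of one-liners
which(n.unique <= 2)     # one-liners plus two-category responders
```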
Another possible (and common) pattern is one forming diagonals around the mid-point. For example, assuming items whose response categories go from 1 to 7, a diagonal response pattern could be 1,2,3,4,5,6,7,6,5,4,3,2,1, or, only around the scale mid-point, 4,5,3,4,5,3. We check this here, and I report those cases/participants so you can check whether they are true-positive careless respondents. Lastly, whenever collecting data in the future, one of the best ways to detect careless responding is to implement data quality checks and time controls.
The first 16 cases are the participants who chose only 1 unique response; the remaining 16 are those who chose only 2.
## [1] "58" "59" "70" "154" "161" "404" "430" "632" "668" "694"
## [11] "714" "723" "738" "748" "1002" "1057" "66" "103" "120" "123"
## [21] "429" "530" "628" "692" "696" "702" "815" "951" "995" "1007"
## [31] "1027" "1047"
The first 13 are the cases in which participants used diagonal patterns with a difference of 1, and the one remaining case used a difference of 2. All cases should be inspected to confirm carelessness.
## [1] "568" "602" "771" "145" "637" "1006" "223" "549" "643" "660"
## [11] "756" "763" "765" "654"
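A rough, hedged sketch of how such diagonal candidates might be flagged in base R (again assuming the items are in `dataS[, 1:21]`); the flagged rows would still need manual inspection:

```r
# Flag rows whose consecutive responses always differ by the same absolute
# step (step 1 or 2 = diagonal-like patterns; step 0 would be a one-liner).
step.sizes <- apply(dataS[, 1:21], 1, function(x) {
  d <- abs(diff(as.numeric(x)))
  d <- d[!is.na(d)]
  if (length(d) > 0 && length(unique(d)) == 1) unique(d) else NA
})

which(step.sizes == 1)   # candidate diagonal patterns with a difference of 1
which(step.sizes == 2)   # candidate diagonal patterns with a difference of 2
```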
There are 36 participants with missing values. For the sake of simplicity, they were deleted. The ensuing dataset has 1022 cases. The labels [M], [S] and [P] were added to the item labels to better identify to which subscale each item belongs; they stand for the Moral, Sexual and Pathogen subscales. To give an overview of the data, statistical summaries for the 21 items are provided, taking responses as numerical.
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[M] TDDS_1 | 1 | 1022 | 3.48 | 1.81 | 3 | 3.39 | 1.48 | 1 | 7 | 6 | 0.28 | -0.90 | 0.06 |
[S] TDDS_2 | 2 | 1022 | 3.80 | 1.90 | 4 | 3.74 | 2.97 | 1 | 7 | 6 | 0.10 | -1.12 | 0.06 |
[P] TDDS_3 | 3 | 1022 | 5.47 | 1.45 | 6 | 5.65 | 1.48 | 1 | 7 | 6 | -0.83 | 0.03 | 0.05 |
[M] TDDS_4 | 4 | 1022 | 5.01 | 1.86 | 6 | 5.24 | 1.48 | 1 | 7 | 6 | -0.82 | -0.39 | 0.06 |
[S] TDDS_5 | 5 | 1022 | 2.47 | 1.82 | 2 | 2.16 | 1.48 | 1 | 7 | 6 | 1.08 | 0.04 | 0.06 |
[P] TDDS_6 | 6 | 1022 | 4.15 | 1.68 | 4 | 4.16 | 1.48 | 1 | 7 | 6 | -0.13 | -0.86 | 0.05 |
[M] TDDS_7 | 7 | 1022 | 4.05 | 1.87 | 4 | 4.06 | 1.48 | 1 | 7 | 6 | -0.13 | -1.04 | 0.06 |
[S] TDDS_8 | 8 | 1022 | 2.75 | 1.97 | 2 | 2.46 | 1.48 | 1 | 7 | 6 | 0.92 | -0.39 | 0.06 |
[P] TDDS_9 | 9 | 1022 | 4.08 | 1.55 | 4 | 4.08 | 1.48 | 1 | 7 | 6 | -0.07 | -0.72 | 0.05 |
[M] TDDS_10 | 10 | 1022 | 4.76 | 1.77 | 5 | 4.92 | 1.48 | 1 | 7 | 6 | -0.61 | -0.53 | 0.06 |
[S] TDDS_11 | 11 | 1022 | 3.88 | 1.94 | 4 | 3.85 | 2.97 | 1 | 7 | 6 | 0.05 | -1.18 | 0.06 |
[P] TDDS_12 | 12 | 1022 | 4.67 | 1.70 | 5 | 4.76 | 1.48 | 1 | 7 | 6 | -0.37 | -0.76 | 0.05 |
[M] TDDS_13 | 13 | 1022 | 4.52 | 1.93 | 5 | 4.64 | 1.48 | 1 | 7 | 6 | -0.43 | -0.92 | 0.06 |
[S] TDDS_14 | 14 | 1022 | 3.45 | 2.07 | 3 | 3.32 | 2.97 | 1 | 7 | 6 | 0.35 | -1.20 | 0.06 |
[P] TDDS_15 | 15 | 1022 | 4.81 | 1.52 | 5 | 4.90 | 1.48 | 1 | 7 | 6 | -0.46 | -0.50 | 0.05 |
[M] TDDS_16 | 16 | 1022 | 4.25 | 1.84 | 4 | 4.31 | 1.48 | 1 | 7 | 6 | -0.26 | -0.93 | 0.06 |
[S] TDDS_17 | 17 | 1022 | 4.62 | 2.10 | 5 | 4.78 | 2.97 | 1 | 7 | 6 | -0.40 | -1.19 | 0.07 |
[P] TDDS_18 | 18 | 1022 | 4.76 | 1.73 | 5 | 4.88 | 1.48 | 1 | 7 | 6 | -0.44 | -0.75 | 0.05 |
[M] TDDS_19 | 19 | 1022 | 4.37 | 1.82 | 5 | 4.47 | 1.48 | 1 | 7 | 6 | -0.38 | -0.83 | 0.06 |
[S] TDDS_20 | 20 | 1022 | 3.55 | 2.21 | 3 | 3.44 | 2.97 | 1 | 7 | 6 | 0.30 | -1.34 | 0.07 |
[P] TDDS_21 | 21 | 1022 | 4.98 | 1.64 | 5 | 5.14 | 1.48 | 1 | 7 | 6 | -0.62 | -0.35 | 0.05 |
Gender* | 22 | 1022 | 1.46 | 0.50 | 1 | 1.45 | 0.00 | 1 | 2 | 1 | 0.16 | -1.97 | 0.02 |
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[M] TDDS_1 | 1 | 1022 | 3.48 | 1.81 | 3 | 3.39 | 1.48 | 1 | 7 | 6 | 0.28 | -0.90 | 0.06 |
[M] TDDS_4 | 2 | 1022 | 5.01 | 1.86 | 6 | 5.24 | 1.48 | 1 | 7 | 6 | -0.82 | -0.39 | 0.06 |
[M] TDDS_7 | 3 | 1022 | 4.05 | 1.87 | 4 | 4.06 | 1.48 | 1 | 7 | 6 | -0.13 | -1.04 | 0.06 |
[M] TDDS_10 | 4 | 1022 | 4.76 | 1.77 | 5 | 4.92 | 1.48 | 1 | 7 | 6 | -0.61 | -0.53 | 0.06 |
[M] TDDS_13 | 5 | 1022 | 4.52 | 1.93 | 5 | 4.64 | 1.48 | 1 | 7 | 6 | -0.43 | -0.92 | 0.06 |
[M] TDDS_16 | 6 | 1022 | 4.25 | 1.84 | 4 | 4.31 | 1.48 | 1 | 7 | 6 | -0.26 | -0.93 | 0.06 |
[M] TDDS_19 | 7 | 1022 | 4.37 | 1.82 | 5 | 4.47 | 1.48 | 1 | 7 | 6 | -0.38 | -0.83 | 0.06 |
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[S] TDDS_2 | 1 | 1022 | 3.80 | 1.90 | 4 | 3.74 | 2.97 | 1 | 7 | 6 | 0.10 | -1.12 | 0.06 |
[S] TDDS_5 | 2 | 1022 | 2.47 | 1.82 | 2 | 2.16 | 1.48 | 1 | 7 | 6 | 1.08 | 0.04 | 0.06 |
[S] TDDS_8 | 3 | 1022 | 2.75 | 1.97 | 2 | 2.46 | 1.48 | 1 | 7 | 6 | 0.92 | -0.39 | 0.06 |
[S] TDDS_11 | 4 | 1022 | 3.88 | 1.94 | 4 | 3.85 | 2.97 | 1 | 7 | 6 | 0.05 | -1.18 | 0.06 |
[S] TDDS_14 | 5 | 1022 | 3.45 | 2.07 | 3 | 3.32 | 2.97 | 1 | 7 | 6 | 0.35 | -1.20 | 0.06 |
[S] TDDS_17 | 6 | 1022 | 4.62 | 2.10 | 5 | 4.78 | 2.97 | 1 | 7 | 6 | -0.40 | -1.19 | 0.07 |
[S] TDDS_20 | 7 | 1022 | 3.55 | 2.21 | 3 | 3.44 | 2.97 | 1 | 7 | 6 | 0.30 | -1.34 | 0.07 |
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[P] TDDS_3 | 1 | 1022 | 5.47 | 1.45 | 6 | 5.65 | 1.48 | 1 | 7 | 6 | -0.83 | 0.03 | 0.05 |
[P] TDDS_6 | 2 | 1022 | 4.15 | 1.68 | 4 | 4.16 | 1.48 | 1 | 7 | 6 | -0.13 | -0.86 | 0.05 |
[P] TDDS_9 | 3 | 1022 | 4.08 | 1.55 | 4 | 4.08 | 1.48 | 1 | 7 | 6 | -0.07 | -0.72 | 0.05 |
[P] TDDS_12 | 4 | 1022 | 4.67 | 1.70 | 5 | 4.76 | 1.48 | 1 | 7 | 6 | -0.37 | -0.76 | 0.05 |
[P] TDDS_15 | 5 | 1022 | 4.81 | 1.52 | 5 | 4.90 | 1.48 | 1 | 7 | 6 | -0.46 | -0.50 | 0.05 |
[P] TDDS_18 | 6 | 1022 | 4.76 | 1.73 | 5 | 4.88 | 1.48 | 1 | 7 | 6 | -0.44 | -0.75 | 0.05 |
[P] TDDS_21 | 7 | 1022 | 4.98 | 1.64 | 5 | 5.14 | 1.48 | 1 | 7 | 6 | -0.62 | -0.35 | 0.05 |
Within the framework of Classical Test Theory (CTT), we see that some items have skewed distributions - e.g., items 3 & 4 (negatively skewed) and 5 & 8 (positively skewed). The former indicates ceiling effects, the latter floor effects. This can also be seen by looking at the median for each item. It has been argued that ceiling effects could be due to acquiescence bias and/or social desirability [@viswanathan2005measurement]. For a graphical representation of the summaries presented above, Figure 1 shows the distribution of response categories for each item. Note that these flagged items are ranked at the bottom and top of Figure 1, again depicting ceiling and floor effects. In Item Response Theory, however, having items with skewed distributions is not only allowed, but may even be recommended depending on the objective of the instrument.
Table 5 contains the same information as the graph - i.e., the proportions for all response categories for each item. Note that some items show very little endorsement of the more extreme categories, e.g., items 3, 9, 15, 18. Under CTT, this might indicate that reducing the number of response categories (from 7 to 5) could be more appropriate for these scales. [PS note: we will see that this reduction of response categories is indeed recommended for all DS instruments]
While skewness is sometimes considered a problem that needs to be addressed in classical approaches, Item Response Theory may welcome it. This is because such items are more likely to discriminate in non-central regions of the latent construct. Scale development without IRT usually produces a set of items whose informative value is concentrated around the average/center of the latent construct.
While the concept of information is discussed in depth below, here is a brief summary. Information is related to the accuracy with which we can estimate ability. It provides an indication of the capacity of a given test to yield information across the latent construct \(\theta\). In other words, it indicates the instrument’s ability to differentiate among respondents. In fact, information is one of the major contributions of item response theory to psychometrics, and it is a de facto extension of the concept of reliability.
1 | 2 | 3 | 4 | 5 | 6 | 7 | |
---|---|---|---|---|---|---|---|
TDDS_1 | 0.18 | 0.16 | 0.18 | 0.18 | 0.14 | 0.08 | 0.07 |
TDDS_2 | 0.14 | 0.16 | 0.15 | 0.17 | 0.16 | 0.12 | 0.10 |
TDDS_3 | 0.01 | 0.03 | 0.07 | 0.12 | 0.21 | 0.25 | 0.31 |
TDDS_4 | 0.08 | 0.05 | 0.07 | 0.12 | 0.17 | 0.26 | 0.25 |
TDDS_5 | 0.46 | 0.17 | 0.10 | 0.11 | 0.06 | 0.05 | 0.05 |
TDDS_6 | 0.07 | 0.13 | 0.16 | 0.19 | 0.23 | 0.14 | 0.09 |
TDDS_7 | 0.13 | 0.10 | 0.14 | 0.18 | 0.18 | 0.15 | 0.11 |
TDDS_8 | 0.39 | 0.19 | 0.12 | 0.10 | 0.06 | 0.05 | 0.09 |
TDDS_9 | 0.05 | 0.14 | 0.18 | 0.21 | 0.25 | 0.12 | 0.06 |
TDDS_10 | 0.08 | 0.05 | 0.11 | 0.15 | 0.21 | 0.23 | 0.17 |
TDDS_11 | 0.15 | 0.14 | 0.15 | 0.15 | 0.15 | 0.14 | 0.12 |
TDDS_12 | 0.04 | 0.09 | 0.12 | 0.16 | 0.23 | 0.18 | 0.17 |
TDDS_13 | 0.11 | 0.08 | 0.08 | 0.16 | 0.20 | 0.17 | 0.19 |
TDDS_14 | 0.25 | 0.17 | 0.13 | 0.13 | 0.11 | 0.10 | 0.12 |
TDDS_15 | 0.02 | 0.07 | 0.12 | 0.17 | 0.25 | 0.24 | 0.13 |
TDDS_16 | 0.11 | 0.10 | 0.12 | 0.19 | 0.21 | 0.16 | 0.12 |
TDDS_17 | 0.12 | 0.09 | 0.10 | 0.13 | 0.12 | 0.16 | 0.28 |
TDDS_18 | 0.04 | 0.09 | 0.11 | 0.15 | 0.24 | 0.18 | 0.20 |
TDDS_19 | 0.10 | 0.09 | 0.10 | 0.18 | 0.23 | 0.17 | 0.13 |
TDDS_20 | 0.28 | 0.13 | 0.12 | 0.12 | 0.10 | 0.08 | 0.17 |
TDDS_21 | 0.04 | 0.06 | 0.08 | 0.17 | 0.24 | 0.21 | 0.21 |
In Table 6, additional statistical summaries are displayed - specifically, those relating to total scores (summing or taking the mean of all responses for every subject). Accordingly, and to provide a complete analysis, we analyse total scores as both means and sums, as each approach has pros and cons.
In Tables 7 and 8, these results are further broken down by gender. The trimmed mean is calculated ignoring the most extreme 10% of observations at each end; it is useful for checking for outliers. The MAD (median absolute deviation) is a robust measure of variability.
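A sketch of how the sum and mean scores behind Tables 6 to 8 might be computed with the psych package; the column positions and the `Gender` column name are assumptions based on the item list and summary tables above:

```r
library(psych)

# Column positions follow the item order TDDS_1 ... TDDS_21 in dataS and the
# item-to-subscale mapping listed at the top of this section (an assumption).
moral    <- dataS[, c(1, 4, 7, 10, 13, 16, 19)]
sexual   <- dataS[, c(2, 5, 8, 11, 14, 17, 20)]
pathogen <- dataS[, c(3, 6, 9, 12, 15, 18, 21)]

scores <- data.frame(
  Pathogen.Sum  = rowSums(pathogen), Pathogen.Mean = rowMeans(pathogen),
  Moral.Sum     = rowSums(moral),    Moral.Mean    = rowMeans(moral),
  Sexual.Sum    = rowSums(sexual),   Sexual.Mean   = rowMeans(sexual)
)

psych::describe(scores)                          # summaries as in Table 6
psych::describeBy(scores, group = dataS$Gender)  # broken down by gender (Tables 7-8)
```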
vars | n | mean | sd | min | max | range | se | |
---|---|---|---|---|---|---|---|---|
Pathogen.Sum | 1 | 1022 | 32.91 | 8.00 | 7 | 49 | 42 | 0.25 |
Pathogen.Mean | 2 | 1022 | 4.70 | 1.14 | 1 | 7 | 6 | 0.04 |
Moral.Sum | 3 | 1022 | 30.44 | 10.86 | 7 | 49 | 42 | 0.34 |
Moral.Mean | 4 | 1022 | 4.35 | 1.55 | 1 | 7 | 6 | 0.05 |
Sexual.Sum | 5 | 1022 | 24.53 | 10.52 | 7 | 49 | 42 | 0.33 |
Sexual.Mean | 6 | 1022 | 3.50 | 1.50 | 1 | 7 | 6 | 0.05 |
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Pathogen.Mean | 1 | 553 | 4.52 | 1.09 | 4.57 | 4.55 | 1.06 | 1 | 7 | 6 | -0.31 | 0.10 | 0.05 |
Moral.Mean | 2 | 553 | 4.20 | 1.50 | 4.43 | 4.29 | 1.27 | 1 | 7 | 6 | -0.49 | -0.29 | 0.06 |
Sexual.Mean | 3 | 553 | 2.90 | 1.30 | 2.71 | 2.80 | 1.27 | 1 | 7 | 6 | 0.66 | -0.02 | 0.06 |
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Pathogen.Mean | 1 | 469 | 4.92 | 1.16 | 5.00 | 4.98 | 1.27 | 1 | 7 | 6 | -0.48 | -0.04 | 0.05 |
Moral.Mean | 2 | 469 | 4.52 | 1.60 | 4.86 | 4.65 | 1.48 | 1 | 7 | 6 | -0.64 | -0.27 | 0.07 |
Sexual.Mean | 3 | 469 | 4.21 | 1.41 | 4.14 | 4.20 | 1.48 | 1 | 7 | 6 | 0.08 | -0.73 | 0.07 |
Below, Figures 4 to 6 display the distribution of response categories per subscale across gender.
It is necessary to examine properties pertaining to the internal characteristics of a scale. Internal consistency is the extent to which items within a scale (or subscale) measure the same construct, as evidenced by how well items/sub-scales vary together or inter-correlate. One of the most important characteristics of a scale relates to its inter- and intra-subscale correlations.
Figure 8 shows TDDS’s inter-subscale correlations. Normally, some correlation - albeit small - is expected between factors.
As for intra-subscale, ideally, the correlation matrix below (Figure 9) should show three distinct squares across the diagonal (in green), which would indicate a high correlation between items belonging to the same subscale.
From the results, we see that the intra-subscale correlations for the Pathogen subscale are comparatively weaker than for the Sexual and Moral subscales (the latter has the highest intra-subscale correlations). From a classical point of view, internal validity concerns may be raised for the Pathogen subscale, as its items should be “more” correlated. However, to my knowledge, there are no established thresholds for what the best intra-subscale correlations should be.
The internal consistency of a scale can also be examined with item-to-scale correlations and inter-correlations of items within a scale @devellis2003scale. If a group of items measures a single latent construct, we would assume that each item alone correlates with the scale overall and that items within a scale are positively correlated. According to @clark1995constructing, the average inter-item correlation should fall somewhere between .15 and .50.
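A quick, hedged sketch of how the average inter-item correlation per subscale can be checked against that .15-.50 band (column positions in `dataS` assumed as above):

```r
# Mean of the off-diagonal correlations among a set of items.
avg.inter.item.r <- function(items) {
  R <- cor(items, use = "pairwise.complete.obs")
  mean(R[lower.tri(R)])
}

avg.inter.item.r(dataS[, c(1, 4, 7, 10, 13, 16, 19)])  # Moral
avg.inter.item.r(dataS[, c(2, 5, 8, 11, 14, 17, 20)])  # Sexual
avg.inter.item.r(dataS[, c(3, 6, 9, 12, 15, 18, 21)])  # Pathogen
```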
@peters2014alpha wrote:
“As Cronbach’s alpha is the most relied upon indicator of a scale’s reliability and internal consistency. Cronbach’s alpha is often viewed as some kind of quality label: high values certify scale quality, low values prompt removal of one or several items. Unfortunately, this approach suffers two fundamental problems. First, Cronbach’s alpha is both unrelated to a scale’s internal consistency and a fatally flawed estimate of its reliability. Second, the approach itself assumes that scale items are repeated measurements, an assumption that is often violated and rarely desirable. The problems with Cronbach’s alpha are easily solved by computing readily available alternatives, such as the ‘greatest lower bound’ (glb; Sijtsma, 2009) and omega (Revelle & Zinbarg, 2009). Sijtsma (2009) argued that the glb is the lowest possible value that a scale’s reliability can have. That means that when the glb is known, the reliability is by definition in the interval [glb, 1]. Revelle and Zinbarg (2009) argue that omega in fact provides a more accurate approximation of a scale’s reliability, and that omega is almost always higher.”
As the intricacies of the concept of reliability unfold below, we show why the glb and omega only seem to be just two more reliability indices, whereas in reality they rest on very different theoretical and psychometric assumptions.
Classical test theory (CTT) assumes that each person has a true score that would be obtained if there were no errors in measurement. A person’s true score is defined as the expected observed score over an infinite number of independent administrations of the test. A person’s latent trait is never observed, but is estimated by an observed score with some measurement error. CTT describes how errors of measurement can influence observed scores under the following assumptions:
1. The observed test score is the sum of the true score and measurement error: \(X = \tau + \varepsilon\)
2. The expected value of the observed score is the true score: \(E[X] = \tau\)
3. The measurement error of a test and the true scores are uncorrelated: \(\rho_{\varepsilon \tau} = 0\)
4. Error scores on two different tests are uncorrelated: \(\rho_{\varepsilon_{1} \varepsilon_{2}} = 0\)
5. The measurement error of a test and the true scores on all other tests are uncorrelated: \(\rho_{\varepsilon_{1} \tau_{2}} = 0\)
If two tests have observed scores X and X’ that satisfy assumptions 1-5, and if, for every population of examinees, \(\tau = \tau'\) and \(\sigma^{2}_{\varepsilon} = \sigma^{2}_{\varepsilon'}\), then the tests are called \(\textbf{parallel tests}\).
If two tests have observed scores \(\mathbf{X_{1}}\) and \(\mathbf{X_{2}}\) that satisfy assumptions \(1-5\), and if, for every population of examinees, \(\tau_{1} = \tau_{2}+c\), where c is a constant, then the tests are called essentially \(\tau\) - equivalent tests.
Reliability: Types
There are several general classes of reliability estimates:
Inter-rater reliability assesses the degree of agreement between two or more raters in their appraisals.
Test-retest reliability assesses the degree to which test scores are consistent from one test administration to the next. Also included is intra-rater reliability, in which measurements are gathered from a single rater who uses the same methods or instruments and the same testing conditions.
Inter-method reliability assesses the degree to which test scores are consistent when there is a variation in the methods or instruments used. This allows inter-rater reliability to be ruled out. When dealing with forms, it may be termed parallel-forms reliability.
Internal consistency reliability, assesses the consistency of results across items within a test.
Reliability vs. Validity
Reliability does not imply validity. That is, a reliable measure that is measuring something consistently is not necessarily measuring what you want to be measuring.
Cronbach’s \(\boldsymbol{\alpha}\)
Internal consistency is an estimate of reliability based on the correlations among the variables (items) that make up the test. Cronbach’s \(\boldsymbol{\alpha}\) (Cronbach 1951) is a coefficient of internal consistency. It is commonly used as an estimate of the reliability of a psychometric test for a sample of examinees, as it provides a convenient index of overall test quality in a single number.
\[ \alpha = \left[\frac{N}{N-1}\right] \cdot \left[\frac{\sigma_{X}^{2} - \sum\limits^{N}_{i=1} \sigma^{2}_{Y_{i}}}{\sigma^{2}_{X}}\right] \]
Thresholds for \(\boldsymbol{\alpha}\):
\(\boldsymbol{\alpha}\) | Valuation | Purpose |
---|---|---|
\(\alpha\) \(\geq\) 0.9 | Excellent | High-Stakes testing |
0.8 \(\leq\) \(\alpha\) < 0.9 | Good | Low-Stakes testing |
0.7 \(\leq\) \(\alpha\) < 0.8 | Poor | |
\(\alpha\) < 0.7 | Unacceptable |
Cronbach’s \(\alpha\) is thought to yield reliability. It is commonly portrayed as an assessment of the internal validity of a test. Cronbach’s \(\alpha\) is also (implicitly) assumed to be an estimate of the average correlation of a set of items pertaining to the same construct. But, in fact, \(\alpha\) yields a lower bound of a type of reliability termed “internal consistency reliability”, which is different from reliability per se (the overall consistency of a measure - e.g., pretest to post-test). Importantly, reliability as per Cronbach’s \(\alpha\) is a property of the scores of a measure, and is thus sample dependent. Cronbach’s \(\alpha\) is only a measure of internal consistency when two stringent criteria are met: (a) unidimensionality and (b) equal factor loadings (or slopes).
Spearman-Brown Prophecy
The Spearman-Brown (Spearman 1910; Brown 1910) formula is used to predict the reliability of a test after changing the test length.
\[ \rho_{XX'} = \frac{N \cdot \rho_{YY'}}{1+(N-1) \cdot \rho_{YY'}} \]
where
\(\qquad N\) is the factor by which the length of the test is changed,
\(\qquad \rho_{XX'}\) is the predicted reliability coefficient, and
\(\qquad \rho_{YY'}\) is reliability of the original test
For this reason, reliance on Cronbach’s alpha alone is unwarranted. For example, calculating Cronbach’s \(\alpha\) for the full 21 items yields \(\alpha\) = 0.907. This is because the number of items influences the \(\alpha\) index: as the number of items increases, so does \(\alpha\), even when internal consistency is weak. Another representation of this formula is \(\alpha_{new} = \frac{k\bar{r}}{1 + (k-1)\bar{r}}\), where \(\bar{r}\) is the average inter-item correlation of standardized scores and k is the number of items.
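A small worked example of this dependence on k, using the full-scale average inter-item correlation of roughly .32 reported in the output further below:

```r
# Alpha as a function of the number of items k and the average
# inter-item correlation r-bar.
alpha.from.k <- function(k, r.bar) (k * r.bar) / (1 + (k - 1) * r.bar)

alpha.from.k(7,  0.32)   # ~0.77 for a 7-item scale
alpha.from.k(21, 0.32)   # ~0.91 for the full 21-item scale
```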
Since the TDDS is composed of 3 (somewhat independent) constructs, we calculate \(\alpha\) separately for each subscale. We also estimate the resulting reliability if an item is dropped, and the mean inter-item correlations within factors.
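The reliability output below was produced with psych::alpha(); a sketch of the calls, with `dfM.`, `dfS.` and `dfP.` assumed to hold the Moral, Sexual and Pathogen items (as the Call: lines indicate) and `dataS` the full item set:

```r
library(psych)

psych::alpha(dfS.[, 1:7])    # Sexual subscale
psych::alpha(dfP.[, 1:7])    # Pathogen subscale
psych::alpha(dfM.[, 1:7])    # Moral subscale
psych::alpha(dataS[, 1:21])  # full 21-item TDDS
```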
##
## Reliability analysis
## Call: psych::alpha(x = dfS.[, 1:7])
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.87 0.87 0.86 0.49 6.8 0.0062 3.5 1.5 0.5
##
## lower alpha upper 95% confidence boundaries
## 0.86 0.87 0.88
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
## [S] TDDS_2 0.85 0.85 0.84 0.49 5.8 0.0072 0.0080
## [S] TDDS_5 0.85 0.86 0.84 0.50 5.9 0.0071 0.0040
## [S] TDDS_8 0.84 0.84 0.82 0.47 5.3 0.0077 0.0038
## [S] TDDS_11 0.85 0.86 0.84 0.50 5.9 0.0071 0.0079
## [S] TDDS_14 0.84 0.85 0.83 0.48 5.5 0.0076 0.0075
## [S] TDDS_17 0.86 0.86 0.85 0.52 6.4 0.0066 0.0043
## [S] TDDS_20 0.85 0.85 0.84 0.49 5.8 0.0072 0.0064
## med.r
## [S] TDDS_2 0.50
## [S] TDDS_5 0.50
## [S] TDDS_8 0.48
## [S] TDDS_11 0.50
## [S] TDDS_14 0.48
## [S] TDDS_17 0.51
## [S] TDDS_20 0.50
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## [S] TDDS_2 1022 0.74 0.75 0.69 0.64 3.8 1.9
## [S] TDDS_5 1022 0.73 0.74 0.69 0.63 2.5 1.8
## [S] TDDS_8 1022 0.81 0.82 0.80 0.73 2.8 2.0
## [S] TDDS_11 1022 0.73 0.74 0.67 0.63 3.9 1.9
## [S] TDDS_14 1022 0.79 0.78 0.74 0.70 3.5 2.1
## [S] TDDS_17 1022 0.69 0.68 0.60 0.56 4.6 2.1
## [S] TDDS_20 1022 0.76 0.75 0.70 0.65 3.6 2.2
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 7 miss
## [S] TDDS_2 0.14 0.16 0.15 0.17 0.16 0.12 0.10 0
## [S] TDDS_5 0.46 0.17 0.10 0.11 0.06 0.05 0.05 0
## [S] TDDS_8 0.39 0.19 0.12 0.10 0.06 0.05 0.09 0
## [S] TDDS_11 0.15 0.14 0.15 0.15 0.15 0.14 0.12 0
## [S] TDDS_14 0.25 0.17 0.13 0.13 0.11 0.10 0.12 0
## [S] TDDS_17 0.12 0.09 0.10 0.13 0.12 0.16 0.28 0
## [S] TDDS_20 0.28 0.13 0.12 0.12 0.10 0.08 0.17 0
##
## Reliability analysis
## Call: psych::alpha(x = dfP.[, 1:7])
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.83 0.84 0.82 0.42 5.1 0.0079 4.7 1.1 0.43
##
## lower alpha upper 95% confidence boundaries
## 0.82 0.83 0.85
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
## [P] TDDS_3 0.82 0.82 0.81 0.44 4.6 0.0087 0.0062
## [P] TDDS_6 0.82 0.82 0.80 0.44 4.7 0.0086 0.0042
## [P] TDDS_9 0.82 0.82 0.80 0.43 4.5 0.0089 0.0061
## [P] TDDS_12 0.81 0.82 0.80 0.42 4.4 0.0090 0.0062
## [P] TDDS_15 0.79 0.79 0.77 0.39 3.9 0.0100 0.0047
## [P] TDDS_18 0.81 0.82 0.80 0.42 4.4 0.0090 0.0062
## [P] TDDS_21 0.81 0.81 0.79 0.41 4.2 0.0095 0.0072
## med.r
## [P] TDDS_3 0.44
## [P] TDDS_6 0.43
## [P] TDDS_9 0.44
## [P] TDDS_12 0.44
## [P] TDDS_15 0.41
## [P] TDDS_18 0.43
## [P] TDDS_21 0.43
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## [P] TDDS_3 1022 0.65 0.67 0.58 0.53 5.5 1.5
## [P] TDDS_6 1022 0.67 0.66 0.58 0.52 4.1 1.7
## [P] TDDS_9 1022 0.69 0.69 0.62 0.56 4.1 1.5
## [P] TDDS_12 1022 0.71 0.70 0.64 0.58 4.7 1.7
## [P] TDDS_15 1022 0.80 0.80 0.77 0.71 4.8 1.5
## [P] TDDS_18 1022 0.71 0.70 0.63 0.58 4.8 1.7
## [P] TDDS_21 1022 0.74 0.74 0.69 0.63 5.0 1.6
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 7 miss
## [P] TDDS_3 0.01 0.03 0.07 0.12 0.21 0.25 0.31 0
## [P] TDDS_6 0.07 0.13 0.16 0.19 0.23 0.14 0.09 0
## [P] TDDS_9 0.05 0.14 0.18 0.21 0.25 0.12 0.06 0
## [P] TDDS_12 0.04 0.09 0.12 0.16 0.23 0.18 0.17 0
## [P] TDDS_15 0.02 0.07 0.12 0.17 0.25 0.24 0.13 0
## [P] TDDS_18 0.04 0.09 0.11 0.15 0.24 0.18 0.20 0
## [P] TDDS_21 0.04 0.06 0.08 0.17 0.24 0.21 0.21 0
##
## Reliability analysis
## Call: psych::alpha(x = dfM.[, 1:7])
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.93 0.93 0.93 0.66 14 0.0033 4.3 1.6 0.66
##
## lower alpha upper 95% confidence boundaries
## 0.92 0.93 0.94
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
## [M] TDDS_1 0.93 0.93 0.92 0.69 14 0.0033 0.0015
## [M] TDDS_4 0.92 0.92 0.91 0.65 11 0.0039 0.0053
## [M] TDDS_7 0.92 0.92 0.91 0.66 11 0.0039 0.0059
## [M] TDDS_10 0.92 0.92 0.91 0.66 11 0.0039 0.0043
## [M] TDDS_13 0.92 0.92 0.91 0.65 11 0.0041 0.0040
## [M] TDDS_16 0.92 0.92 0.91 0.66 12 0.0038 0.0049
## [M] TDDS_19 0.92 0.92 0.91 0.64 11 0.0041 0.0037
## med.r
## [M] TDDS_1 0.70
## [M] TDDS_4 0.66
## [M] TDDS_7 0.66
## [M] TDDS_10 0.66
## [M] TDDS_13 0.65
## [M] TDDS_16 0.69
## [M] TDDS_19 0.65
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## [M] TDDS_1 1022 0.75 0.76 0.69 0.67 3.5 1.8
## [M] TDDS_4 1022 0.85 0.85 0.83 0.80 5.0 1.9
## [M] TDDS_7 1022 0.85 0.85 0.82 0.79 4.0 1.9
## [M] TDDS_10 1022 0.85 0.85 0.82 0.79 4.8 1.8
## [M] TDDS_13 1022 0.88 0.87 0.86 0.82 4.5 1.9
## [M] TDDS_16 1022 0.83 0.83 0.79 0.77 4.2 1.8
## [M] TDDS_19 1022 0.88 0.88 0.86 0.83 4.4 1.8
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 7 miss
## [M] TDDS_1 0.18 0.16 0.18 0.18 0.14 0.08 0.07 0
## [M] TDDS_4 0.08 0.05 0.07 0.12 0.17 0.26 0.25 0
## [M] TDDS_7 0.13 0.10 0.14 0.18 0.18 0.15 0.11 0
## [M] TDDS_10 0.08 0.05 0.11 0.15 0.21 0.23 0.17 0
## [M] TDDS_13 0.11 0.08 0.08 0.16 0.20 0.17 0.19 0
## [M] TDDS_16 0.11 0.10 0.12 0.19 0.21 0.16 0.12 0
## [M] TDDS_19 0.10 0.09 0.10 0.18 0.23 0.17 0.13 0
##
## Reliability analysis
## Call: psych::alpha(x = dataS[, 1:21])
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.91 0.91 0.93 0.32 9.7 0.0043 4.2 1.1 0.27
##
## lower alpha upper 95% confidence boundaries
## 0.9 0.91 0.91
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## [M] TDDS_1 0.9 0.90 0.93 0.31 9.2 0.0045 0.026 0.27
## [S] TDDS_2 0.9 0.90 0.93 0.31 9.2 0.0045 0.028 0.26
## [P] TDDS_3 0.9 0.90 0.93 0.32 9.5 0.0044 0.027 0.27
## [M] TDDS_4 0.9 0.90 0.93 0.31 9.1 0.0045 0.024 0.27
## [S] TDDS_5 0.9 0.91 0.93 0.32 9.5 0.0044 0.025 0.27
## [P] TDDS_6 0.9 0.90 0.93 0.32 9.5 0.0044 0.027 0.27
## [M] TDDS_7 0.9 0.90 0.93 0.31 9.0 0.0046 0.024 0.26
## [S] TDDS_8 0.9 0.90 0.93 0.32 9.3 0.0045 0.026 0.27
## [P] TDDS_9 0.9 0.90 0.93 0.32 9.3 0.0044 0.028 0.26
## [M] TDDS_10 0.9 0.90 0.93 0.31 9.1 0.0046 0.024 0.27
## [S] TDDS_11 0.9 0.90 0.93 0.31 9.2 0.0045 0.028 0.26
## [P] TDDS_12 0.9 0.90 0.93 0.32 9.5 0.0044 0.027 0.27
## [M] TDDS_13 0.9 0.90 0.93 0.31 9.1 0.0045 0.023 0.27
## [S] TDDS_14 0.9 0.90 0.93 0.31 9.2 0.0046 0.027 0.26
## [P] TDDS_15 0.9 0.90 0.93 0.31 9.1 0.0045 0.028 0.26
## [M] TDDS_16 0.9 0.90 0.93 0.31 9.1 0.0046 0.024 0.27
## [S] TDDS_17 0.9 0.90 0.93 0.32 9.2 0.0045 0.028 0.26
## [P] TDDS_18 0.9 0.90 0.93 0.32 9.3 0.0045 0.028 0.26
## [M] TDDS_19 0.9 0.90 0.93 0.31 9.0 0.0046 0.023 0.27
## [S] TDDS_20 0.9 0.90 0.93 0.32 9.4 0.0044 0.027 0.27
## [P] TDDS_21 0.9 0.90 0.93 0.32 9.3 0.0045 0.028 0.27
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## [M] TDDS_1 1022 0.61 0.61 0.59 0.56 3.5 1.8
## [S] TDDS_2 1022 0.63 0.62 0.60 0.57 3.8 1.9
## [P] TDDS_3 1022 0.47 0.50 0.46 0.42 5.5 1.5
## [M] TDDS_4 1022 0.63 0.63 0.63 0.58 5.0 1.9
## [S] TDDS_5 1022 0.50 0.48 0.46 0.43 2.5 1.8
## [P] TDDS_6 1022 0.48 0.50 0.46 0.42 4.1 1.7
## [M] TDDS_7 1022 0.69 0.68 0.68 0.64 4.0 1.9
## [S] TDDS_8 1022 0.59 0.57 0.56 0.53 2.8 2.0
## [P] TDDS_9 1022 0.53 0.55 0.52 0.48 4.1 1.5
## [M] TDDS_10 1022 0.64 0.64 0.63 0.58 4.8 1.8
## [S] TDDS_11 1022 0.63 0.62 0.59 0.57 3.9 1.9
## [P] TDDS_12 1022 0.47 0.49 0.46 0.41 4.7 1.7
## [M] TDDS_13 1022 0.64 0.63 0.63 0.58 4.5 1.9
## [S] TDDS_14 1022 0.64 0.62 0.60 0.58 3.5 2.1
## [P] TDDS_15 1022 0.62 0.64 0.63 0.58 4.8 1.5
## [M] TDDS_16 1022 0.65 0.64 0.63 0.59 4.2 1.8
## [S] TDDS_17 1022 0.62 0.60 0.58 0.56 4.6 2.1
## [P] TDDS_18 1022 0.56 0.58 0.55 0.51 4.8 1.7
## [M] TDDS_19 1022 0.66 0.66 0.66 0.62 4.4 1.8
## [S] TDDS_20 1022 0.56 0.55 0.52 0.49 3.6 2.2
## [P] TDDS_21 1022 0.55 0.57 0.54 0.50 5.0 1.6
##
## Non missing response frequency for each item
## 1 2 3 4 5 6 7 miss
## [M] TDDS_1 0.18 0.16 0.18 0.18 0.14 0.08 0.07 0
## [S] TDDS_2 0.14 0.16 0.15 0.17 0.16 0.12 0.10 0
## [P] TDDS_3 0.01 0.03 0.07 0.12 0.21 0.25 0.31 0
## [M] TDDS_4 0.08 0.05 0.07 0.12 0.17 0.26 0.25 0
## [S] TDDS_5 0.46 0.17 0.10 0.11 0.06 0.05 0.05 0
## [P] TDDS_6 0.07 0.13 0.16 0.19 0.23 0.14 0.09 0
## [M] TDDS_7 0.13 0.10 0.14 0.18 0.18 0.15 0.11 0
## [S] TDDS_8 0.39 0.19 0.12 0.10 0.06 0.05 0.09 0
## [P] TDDS_9 0.05 0.14 0.18 0.21 0.25 0.12 0.06 0
## [M] TDDS_10 0.08 0.05 0.11 0.15 0.21 0.23 0.17 0
## [S] TDDS_11 0.15 0.14 0.15 0.15 0.15 0.14 0.12 0
## [P] TDDS_12 0.04 0.09 0.12 0.16 0.23 0.18 0.17 0
## [M] TDDS_13 0.11 0.08 0.08 0.16 0.20 0.17 0.19 0
## [S] TDDS_14 0.25 0.17 0.13 0.13 0.11 0.10 0.12 0
## [P] TDDS_15 0.02 0.07 0.12 0.17 0.25 0.24 0.13 0
## [M] TDDS_16 0.11 0.10 0.12 0.19 0.21 0.16 0.12 0
## [S] TDDS_17 0.12 0.09 0.10 0.13 0.12 0.16 0.28 0
## [P] TDDS_18 0.04 0.09 0.11 0.15 0.24 0.18 0.20 0
## [M] TDDS_19 0.10 0.09 0.10 0.18 0.23 0.17 0.13 0
## [S] TDDS_20 0.28 0.13 0.12 0.12 0.10 0.08 0.17 0
## [P] TDDS_21 0.04 0.06 0.08 0.17 0.24 0.21 0.21 0
Split-half reliability is an estimation of reliability based on the correlation between two equivalent halves of a test (it is closely related to parallel-forms reliability). The correlation between the split halves is a reasonable measure of the reliability of one half of the test. Of course, this depends on how you divide the observations or items. Here, we split the data into two roughly equal-sized random sets and calculate a ‘simple’ split-half reliability for each TDDS sub-scale.
Pathogen | Sexual | Moral |
---|---|---|
0.835 | 0.87 | 0.931 |
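As an alternative to a single random split, psych::splitHalf() samples (or enumerates) many splits and reports the average, maximum (\(\lambda_4\)) and minimum (\(\beta\)) split-half estimates; a sketch, using the subscale data frames assumed above:

```r
library(psych)

psych::splitHalf(dfP.[, 1:7])   # Pathogen
psych::splitHalf(dfS.[, 1:7])   # Sexual
psych::splitHalf(dfM.[, 1:7])   # Moral
```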
Guttman’s \(\lambda_4\) (Guttman 1945) is the greatest split-half reliability estimate. That is to say, \(\lambda_4\) is a reliability coefficient just like Cronbach’s \(\alpha\), except that it conveys the maximum split-half reliability.
Guttman’s \(\lambda_6\) considers the amount of variance in each item that can be accounted for by the linear regression of all of the other items (the squared multiple correlation, or smc) - or, more precisely, the variance of the errors.
Guttman’s \(\lambda_3\) is the same as Cronbach’s \(\alpha\) (Cronbach 1951) and may be computed by
\[ \lambda_3 = \alpha = \frac{K}{K-1} \cdot \left( 1-\frac{\sum_{i=1}^{K} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right) \]
where:
\(\qquad \sigma^{2}_{X}\) is the variance of the observed total test scores, and
\(\qquad \sigma^{2}_{Y_i}\) is the variance of component i for the current sample of persons.
The greatest lower bound solves the “educational testing problem” (Guttman 1945), namely: what is the reliability of a test? Although there are many estimates of a test’s reliability (Guttman, 1945), most underestimate the true reliability of a test. When the greatest lower bound (glb) is known, the true reliability is by definition in the interval [glb, 1].
Klaas Sijtsma (2008) wrote:
Results
Pathogen | Sexual | Moral | |
---|---|---|---|
Lambda 4 | 0.852 | 0.883 | 0.930 |
Lambda 3 (alpha) | 0.836 | 0.871 | 0.931 |
Lambda 6 (smc) | 0.824 | 0.864 | 0.926 |
glb | 0.865 | 0.901 | 0.950 |
Reliability index | Full TDDS (21 items) |
---|---|
Guttman’s Lambda 3 (Cronbach’s alpha) | 0.908 |
Guttman’s Lambda 6 (smc) | 0.935 |
Greatest lower bound (glb) | 0.958 |
Omega Hierarchical | 0.588 |
Omega Hierarchical asymptotic | 0.627 |
Omega Total | 0.938 |
Maximum split half reliability (Guttman’s Lambda 4) | 0.950 |
Average split half reliability | 0.906 |
Minimum split half reliability (Revelle’s Beta) | 0.703 |
Bartlett’s test of sphericity examines whether there is any indication that the variables may not intercorrelate. It tests the observed correlation matrix against an identity matrix (1s on the principal diagonal and zeros everywhere else). The TDDS correlation matrix is statistically different from an identity matrix.
chisq | p.value | df |
---|---|---|
11172.02 | 0 | 210 |
The Kaiser-Meyer-Olkin (KMO) test is a measure of how suited your data are for factor analysis. The MSA statistic is a measure of the proportion of variance among variables that might be common variance. The TDDS shows high factorability overall, as do the individual items.
Both Bartlett’s and the KMO test are, however, very liberal and easy to pass.
MSA | TDDS_1 | TDDS_2 | TDDS_3 | TDDS_4 | TDDS_5 | TDDS_6 | TDDS_7 |
---|---|---|---|---|---|---|---|
0.92 | 0.94 | 0.94 | 0.91 | 0.93 | 0.88 | 0.89 | 0.94 |
TDDS_8 | TDDS_9 | TDDS_10 | TDDS_11 | TDDS_12 | TDDS_13 |
---|---|---|---|---|---|
0.88 | 0.92 | 0.94 | 0.95 | 0.91 | 0.92 |
TDDS_14 | TDDS_15 | TDDS_16 | TDDS_17 | TDDS_18 | TDDS_19 | TDDS_20 | TDDS_21 |
---|---|---|---|---|---|---|---|
0.92 | 0.91 | 0.94 | 0.92 | 0.92 | 0.92 | 0.93 | 0.93 |
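A sketch of how both factorability checks can be run with the psych package (item data again assumed in `dataS[, 1:21]`):

```r
library(psych)

R <- cor(dataS[, 1:21])

psych::cortest.bartlett(R, n = nrow(dataS))  # Bartlett's test of sphericity
psych::KMO(dataS[, 1:21])                    # overall MSA and per-item MSA
```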
The TDDS does not seem to be multivariate normal. This warrants caution and the use of ML estimation that is robust to departures from multivariate normality.
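One way this can be checked is with Mardia’s test of multivariate skew and kurtosis; a minimal sketch with the psych package (item columns assumed as before):

```r
library(psych)

# Mardia's multivariate skew and kurtosis; significant results suggest
# departure from multivariate normality.
psych::mardia(dataS[, 1:21], plot = FALSE)
```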
Here, three approaches are explored.
Following Sijtsma’s (2008) article on the glb above, Revelle & Zinbarg provided an answer in the same journal, Psychometrika. They argue that while Sijtsma (2008) indeed reviewed lower-bound estimates of reliability and concluded that the glb or “greatest lower bound” (Bentler & Woodward, 1980) is the best reliability estimate - in agreement with Jackson and Agunwamba (1977) and Woodhouse and Jackson (1977) - this conclusion is inappropriate for two reasons:
[1] Contrary to what the name implies, the glb is not the greatest lower bound estimate of reliability, but is somewhat less than another, easily calculated and understood estimate of reliability \(\omega_{total}\) or \(\omega_t\) of McDonald (1999). [2] Rather than just focusing on the greatest lower bounds as estimates of a reliability of a test, we should also be concerned with the percentage of the test that measures one construct. As has been discussed previously (Revelle, 1979; McDonald, 1999; Zinbarg et al., 2005), this may be estimated by finding \(\omega_{h}\), the general factor saturation of the test (McDonald, 1999; Zinbarg et al., 2005), or the worst split half reliability of a test (coefficient beta, or \(\beta\), of Revelle, 1979).
Revelle goes on to say that there are four properties of a test that are important.
The authors suggest that rather than using the amount of explained common variance (ECV) - as suggested by Sijtsma (2008) - a more appropriate measure to consider is an index of how much the test measures one common factor. This builds on previous work (Zinbarg et al., 2005, 2006), which uses higher-order factor analysis with a Schmid-Leiman transformation (Schmid & Leiman, 1957) when performing an EFA, and subsequently estimates the general factor saturation (coefficient \(\omega_{h}\) of McDonald, 1999).
Revelle also argues that one may find useful the hierarchical cluster analysis of items to find the worst split-half reliability (coefficient \(\beta\) of Revelle, 1979).
In order to find \(\omega\), a factor analysis is conducted, followed by an oblique rotation and the extraction of a general factor using the Schmid-Leiman transformation (Schmid & Leiman, 1957). The sum of the uniquenesses is used to find \(\omega_{t}\), and the squared sum of the g loadings to find \(\omega_{h}\).
For tests that are thought to have a higher-order structure - which I contend is the case for the TDDS - measures based only upon the average inter-item correlation, \(\alpha\) or \(\lambda_{6}\), may not be appropriate. Coefficients that reflect a higher-order structure, such as \(\omega_{h}\) and \(\omega_{t}\), are more appropriate.
@rodriguez2016evaluating wrote:
“The differences between coefficients alpha and omega are that: (a) omega always is based on the factor loadings of a specific model, whereas alpha, typically, is computed based on observed variances and covariances and (b) alpha assumes equal loadings (essential tau equivalence), whereas omega is more appropriate when loadings vary (congeneric).”
A hierarchical factor analytic model was fit, which assumes there is a general factor reflecting individual differences in the disgust domain. Another possibility would be to conceive of the TDDS as arising from a correlated simple structure model (or as three unidimensional models). Here we explore this theoretical point of view while expecting the insights provided by the bi-factor and hierarchical models (which are equivalent in many respects) to be worthwhile.
What is a correlated simple structure model? Hally O’Connor Quinn (2014) wrote: “Correlated simple structure models, also called independent clusters, perfect clusters, or correlated traits models, are multidimensional models in which the multiple dimensions (or latent variables) may be correlated and the items are permitted to load onto only one of the multiple dimensions.” [see that work, pp. 7-8]. This is also shown in: Yung, Y. F., Thissen, D., & McLeod, L. D. (1999). On the relationship between the higher-order factor model and the hierarchical factor model. Psychometrika, 64(2), 113-128.
General factor saturation of TDDS: \(\omega_{h} = .58\). This value means that 58% of the variance of unit-weighted total scores can be attributed to the individual differences on the general factor. It is the reliable variance attributed to a single general factor.
Importantly, the square root of \(\omega_{h}\) (= 0.76) is the correlation between the general factor and the observed total scores (test scores). To me, this info is quite relevant.
The ratio between \(\omega_{h}\) and \(\omega_{t}\) also provides relevant information. We see that \(\frac{\omega_{h}}{\omega_{t}}\) is 0.622, meaning that about 62% of the reliable variance in total scores can be attributed to the general factor, assumed to reflect individual differences in the disgust domain. Only 6% (1 - \(\omega_{t}\)) is estimated to be due to random error. We also find that \(\omega_{t} - \omega_{h} =\) 0.35, which means that 35% of the reliable variance in total scores can be attributed to the multidimensionality caused by the group factors.
Also reported in these analyses is the explained common variance (ECV; Bentler, 2009; Reise, Moore, & Haviland, 2010; Sijtsma, 2009; ten Berge & Socan, 2004), which is an indicator of dimensionality. The TDDS’s ECV is 0.38, which indicates it is highly multidimensional (an ECV close to 1 is considered an indication of a “strictly” unidimensional structure, 0.9 < ECV < 1 of an “essentially” unidimensional one, and below that, a multidimensional one). Thus, raw total scores cannot be interpreted as an essentially unidimensional reflection of disgust, given the clear multidimensionality of the data. The consequence is that, if IRT is applied, a multidimensional IRT model would be recommended.
Interestingly, as we will see below (in the IRT-based factor analysis and IRT sections further down), we also find that the worst-performing items are indeed 2 and 11, whose variance is shared by 2 factors.
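The decomposition below is obtained with psych::omega(); a sketch of the call, mirroring the Call: line in the output:

```r
library(psych)

# Hierarchical (Schmid-Leiman based) omega with three group factors;
# sl = FALSE requests the hierarchical rather than the bi-factor diagram.
om <- psych::omega(dataS[, 1:21], nfactors = 3, sl = FALSE,
                   title = "Hierarchical Factor Analytical Model")

print(om)  # prints omega_h, omega total, ECV and the Schmid-Leiman loadings
```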
## Hierarchical Factor Analytical Model
## Call: psych::omega(m = dataS[, 1:21], nfactors = 3, title = "Hierarchical Factor Analytical Model",
## sl = FALSE)
## Alpha: 0.91
## G.6: 0.93
## Omega Hierarchical: 0.58
## Omega H asymptotic: 0.62
## Omega Total 0.94
##
## Schmid Leiman Factor loadings greater than 0.2
## g F1* F2* F3* h2 u2 p2
## [M] TDDS_1 0.40 0.56 0.49 0.51 0.34
## [S] TDDS_2 0.49 0.48 0.50 0.50 0.49
## [P] TDDS_3 0.43 0.42 0.36 0.64 0.51
## [M] TDDS_4 0.43 0.71 0.70 0.30 0.26
## [S] TDDS_5 0.36 0.63 0.53 0.47 0.25
## [P] TDDS_6 0.42 0.37 0.32 0.68 0.56
## [M] TDDS_7 0.46 0.68 0.68 0.32 0.30
## [S] TDDS_8 0.42 0.73 0.72 0.28 0.25
## [P] TDDS_9 0.47 0.39 0.39 0.61 0.58
## [M] TDDS_10 0.43 0.71 0.68 0.32 0.27
## [S] TDDS_11 0.49 0.44 0.47 0.53 0.52
## [P] TDDS_12 0.44 0.48 0.43 0.57 0.45
## [M] TDDS_13 0.42 0.75 0.74 0.26 0.23
## [S] TDDS_14 0.47 0.56 0.55 0.45 0.40
## [P] TDDS_15 0.58 0.53 0.63 0.37 0.55
## [M] TDDS_16 0.43 0.67 0.63 0.37 0.30
## [S] TDDS_17 0.45 0.37 0.37 0.63 0.55
## [P] TDDS_18 0.49 0.43 0.43 0.57 0.57
## [M] TDDS_19 0.44 0.75 0.75 0.25 0.26
## [S] TDDS_20 0.42 0.57 0.51 0.49 0.35
## [P] TDDS_21 0.50 0.46 0.46 0.54 0.54
##
## With eigenvalues of:
## g F1* F2* F3*
## 4.3 3.4 2.2 1.5
##
## general/max 1.27 max/min = 2.33
## mean percent general = 0.41 with sd = 0.13 and cv of 0.32
## Explained Common Variance of the general factor = 0.38
##
## The degrees of freedom are 150 and the fit is 0.73
## The number of observations was 1022 with Chi Square = 739.2 with prob < 1.2e-78
## The root mean square of the residuals is 0.03
## The df corrected root mean square of the residuals is 0.04
## RMSEA index = 0.062 and the 10 % confidence intervals are 0.058 0.067
## BIC = -300.23
##
## Compare this with the adequacy of just a general factor and no group factors
## The degrees of freedom for just the general factor are 189 and the fit is 6.14
## The number of observations was 1022 with Chi Square = 6218.82 with prob < 0
## The root mean square of the residuals is 0.2
## The df corrected root mean square of the residuals is 0.21
##
## RMSEA index = 0.178 and the 10 % confidence intervals are 0.173 0.181
## BIC = 4909.14
##
## Measures of factor score adequacy
## g F1* F2* F3*
## Correlation of scores with factors 0.78 0.89 0.84 0.70
## Multiple R square of scores with factors 0.61 0.79 0.71 0.50
## Minimum correlation of factor score estimates 0.21 0.59 0.42 -0.01
##
## Total, General and Subset omega for each subset
## g F1* F2* F3*
## Omega total for total scores and subscales 0.94 0.93 0.87 0.84
## Omega general for total scores and subscales 0.58 0.26 0.35 0.45
## Omega group for total scores and subscales 0.31 0.67 0.52 0.38
Below, the Bi-factor diagram is plotted. Note that the bi-factor is just a rotation/transformation of the hierarchical model.
Much of the below text is derived from the Personality Project R document overview.
An alternative estimate of factor saturation is the coefficient \(\beta\) by Revelle (1979), which uses hierarchical cluster analysis to find the two most unrelated split halves of the test and then uses the implied inter-group item correlation to estimate the total variance accounted for by a general factor.
An alternative to factor or components analysis is cluster analysis. The goal of cluster analysis is the same as that of factor or components analysis (reduce the complexity of the data and attempt to identify homogeneous sub-groupings). Although mainly used for clustering people or objects (e.g., projectile points for an anthropologist, DNA for a biologist, galaxies for an astronomer), clustering may be used for clustering items or tests as well. Interestingly enough, it has had limited application in psychometrics. This is unfortunate, for as has been pointed out (e.g., Tryon, 1935; Loevinger et al., 1953), the theory of factors, while mathematically compelling, offers little that the geneticist or behaviorist or perhaps even non-specialist finds compelling. Cooksey and Soutar (2006) review why the iclust algorithm is particularly appropriate for scale construction in marketing.
Hierarchical cluster analysis forms clusters that are nested within clusters. The resulting tree diagram shows the nesting structure. Although there are many hierarchical clustering algorithms in R (e.g., agnes, hclust, and iclust), the one most applicable to the problems of scale construction is iclust (Revelle, 1979). The procedure, in brief, is as follows.
iclust forms clusters of items using a hierarchical clustering algorithm until one of two measures of internal consistency fails to increase (Revelle, 1979). The number of clusters may be specified a priori, or found empirically. The resulting statistics include the average split-half reliability, \(\alpha\) (Cronbach, 1951), as well as the worst split-half reliability, \(\beta\) (Revelle, 1979), which is an estimate of the general factor saturation of the resulting scale. Cluster loadings (corresponding to the structure matrix of factor analysis) are reported below.
We performed an Item Cluster Analysis (iclust), and it naturally (empirically) formed the theorized clusters “moral”, “sexual”, and “pathogen”. This is a result in itself: it shows that all items of each subscale are empirically estimated to belong to (i.e., increase the internal consistency of) their respective sub-construct.
Also, this model’s omnibus statistical quantities such as \(\alpha\), \(\omega\), \(\beta\), and ECV are literally the same as in the output above for the hierarchical (and bi-factor) model. The difference is that a clustering algorithm is applied instead, which allows us to visualize the cluster structure of the 21 disgust items.
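The output below comes from psych::iclust(); a sketch of the call on the 21 items, letting the algorithm decide the number of clusters from its \(\alpha\)/\(\beta\) stopping criteria:

```r
library(psych)

# Item Cluster Analysis; accepts a raw data matrix (correlations are computed
# internally) and, by default, draws the cluster diagram.
ic <- psych::iclust(dataS[, 1:21], title = "ICLUST of the TDDS items")
print(ic)
```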
## ICLUST (Item Cluster Analysis)Call: iclust(r.mat = r.mat, nclusters = nclusters, alpha = alpha, beta = beta,
## beta.size = beta.size, alpha.size = alpha.size, correct = correct,
## correct.cluster = correct.cluster, reverse = reverse, beta.min = beta.min,
## output = output, digits = digits, labels = labels, cut = cut,
## n.iterations = n.iterations, title = title, plot = plot,
## weighted = weighted, cor.gen = cor.gen, SMC = SMC, purify = purify,
## diagonal = diagonal)
## Item Cluster Analysis
##
## Purified Alpha:
## C17 C16 C18
## 0.93 0.87 0.84
##
## Guttman Lambda6*
## C17 C16 C18
## 0.93 0.87 0.83
##
## Original Beta:
## C17 C16 C18
## 0.81 0.82 0.76
##
## Cluster size:
## C17 C16 C18
## 7 7 7
##
## Purified scale intercorrelations
## reliabilities on diagonal
## correlations corrected for attenuation above diagonal:
## C17 C16 C18
## C17 0.93 0.38 0.41
## C16 0.34 0.87 0.55
## C18 0.36 0.47 0.84
In this section, we explore the dimensionality of TDDS. The goal here is to investigate and assess the plausibility of its hypothesized structural complexity.
In the literature, there have been many proposals for assessing dimensionality and for estimating a general factor.
While we will go through these and interpret the insights they provide, I contend that dimensionality is best assessed by Exploratory Graph Analysis, which unites machine learning and latent variable modeling.
The two most commonly employed methods are the scree plot and parallel analysis. We first try the simplest technique, Cattell’s scree plot. Since eigenvalues are a measure of the amount of variance accounted for by a factor, they can be useful in determining the number of factors to extract. In other words, one way to determine the number of factors or components in a data matrix or a correlation matrix is to examine the “scree” plot of the successive eigenvalues. In a scree plot, we simply plot the eigenvalues for all of the factors and then look to see where they “drop off sharply” (the elbow), indicating that including another factor does not add much explained variance.
Parallel analysis is an alternative technique that compares the scree of factors of the observed data with that of a random data matrix of the same size as the original (i.e., it simulates random data of the same size as the empirical dataset) and compares the two. As before, the output is similar to Cattell’s scree test. The text below the plot suggests 4 factors, mainly because it uses the sharp angle (rather than also using the eigenvalue < 1 criterion).
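A sketch of the scree-plus-parallel-analysis check with psych::fa.parallel(), here on polychoric correlations given the ordinal response format (an assumption consistent with the analyses reported below):

```r
library(psych)

# Scree of observed vs. simulated/resampled data, for both factors and
# components, based on polychoric correlations of the 21 items.
psych::fa.parallel(dataS[, 1:21], fm = "minres", fa = "both", cor = "poly")
```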
Most users of factor analysis tend to interpret factor output by focusing their attention on the largest loadings for every variable and ignoring the smaller ones. VSS operationalizes this tendency by comparing the original correlation matrix to that reproduced by a simplified version of the original factor matrix. The VSS criterion compares the fit of the simplified model to the original correlations. VSS applies a goodness of fit test to determine the optimal number of factors to extract. It can be thought of as a quasi-confirmatory model, in that it fits the very simple structure of a factor pattern matrix to the original correlation matrix. For items where the model is usually of complexity one, this is equivalent to making all except the largest loading for each item 0. Typically, this is often the solution the user wants to interpret. The analysis also includes the MAP criterion and a \(\chi^2\) estimate.
Both orthogonal and oblique transformations of the solution (rotations) were used, all accounting for polychoric correlations. The results are comparable, so I report only one. Figure 11.A shows the scree plot of principal factors, and Figure 11.B compares the fit as a function of the number of factors in the data. It shows a meaningful increment from 2 to 3 factors, but not from 3 to 4 (comparing the ‘factor model’ to the original correlation matrix). The best way to see this is via the statistical summaries rather than via Figure 11.B. The numerical summaries indicate that moving from 2 to 3 factors increases fit on a variety of criteria: RMSEA, SABIC and SRMR. Thus, VSS estimates the dimensionality of the TDDS to be 3.
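The VSS/MAP summaries below come from psych::vss(); a sketch of a call consistent with the settings described above (oblique rotation, polychoric correlations):

```r
library(psych)

# Very Simple Structure and MAP criteria for 1 to 5 factors.
psych::vss(dataS[, 1:21], n = 5, rotate = "oblimin", fm = "minres", cor = "poly")
```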
##
## Very Simple Structure of Figure 13.B Assessing an appropriate number of factors
## Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm,
## n.obs = n.obs, plot = plot, title = title, use = use, cor = cor)
## VSS complexity 1 achieves a maximimum of 0.77 with 1 factors
## VSS complexity 2 achieves a maximimum of 0.9 with 3 factors
##
## The Velicer MAP achieves a minimum of 0.02 with 3 factors
## BIC achieves a minimum of -240.37 with 5 factors
## Sample Size adjusted BIC achieves a minimum of 124.88 with 5 factors
##
## Statistics by number of factors
## vss1 vss2 map dof chisq prob sqresid fit RMSEA BIC SABIC complex
## 1 0.77 0.00 0.075 189 6882 0.0e+00 19.7 0.77 0.187 5573 6173 1.0
## 2 0.73 0.89 0.033 169 2924 0.0e+00 9.2 0.89 0.127 1753 2290 1.6
## 3 0.70 0.90 0.015 150 994 1.6e-124 4.3 0.95 0.075 -45 431 2.0
## 4 0.70 0.89 0.017 132 693 5.0e-77 3.6 0.96 0.065 -221 198 2.2
## 5 0.70 0.90 0.022 115 557 4.3e-59 3.2 0.96 0.062 -240 125 2.3
## eChisq SRMR eCRMS eBIC
## 1 10970 0.160 0.169 9661
## 2 3339 0.088 0.098 2168
## 3 424 0.031 0.037 -615
## 4 209 0.022 0.028 -706
## 5 150 0.019 0.025 -647
To complement the above analysis, below you find the output of a manual and simultaneous dimensionality analysis which takes into account eigenvalues, parallel analysis, optimal coordinates and the acceleration factor, by applying a spectral decomposition to the polychoric-based correlation matrix of the TDDS.
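One hedged way to reproduce this kind of simultaneous check is the nFactors package (an assumption; the document does not name the package used), combined with a polychoric correlation matrix from psych:

```r
library(psych)
library(nFactors)

# Spectral decomposition of the polychoric correlation matrix.
poly.R <- psych::polychoric(dataS[, 1:21])$rho
ev     <- eigen(poly.R)$values

# Parallel-analysis reference eigenvalues (simulated from normal data,
# so only an approximation to the polychoric case).
ap <- nFactors::parallel(subject = nrow(dataS), var = 21, rep = 100, cent = 0.05)

# Kaiser rule, parallel analysis, optimal coordinates and acceleration factor.
ns <- nFactors::nScree(x = ev, aparallel = ap$eigen$qevpea)
nFactors::plotnScree(ns)
```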
Comparison Data
@ruscio2012determining has proposed a method to determine the optimal factor solution by comparison data (CD). Rather than generating random data, as parallel analysis does, this method uses a simulation technique that reproduces the research data while varying the factor structure; it compares the correlation matrix of your data with those from simulated 1-factor data, 2-factor data, 3-factor data, and so on. As shown in Figure 14, and in line with the other analyses, the results indicate 3 or 7 factors to retain, depending on the desired complexity. This is a good result because it not only confirms the previous analyses, but also suggests that, based on correlations, the 3-dimension solution is indeed the most likely case.
## Number of factors to retain: 7
A novel technique that assesses dimensionality using network modeling with penalized maximum likelihood estimation (i.e., LASSO: the least absolute shrinkage and selection operator; Tibshirani, 1996) and bootstrapping.
The text below is heavily adapted from Hudson Golino’s GitHub page and from his and Sacha Epskamp’s article introducing the EGA technique.
Exploratory Graph Analysis is part of a new area called network psychometrics (see Epskamp, Maris, Waldorp & Borsboom, in press), which focuses on the estimation of undirected network models (i.e., Markov Random Fields; Lauritzen, 1996) for psychological datasets. This area has been applied in different areas of psychology, from psychopathology (e.g., Borsboom et al., 2011; Borsboom & Cramer, 2013; Fried et al., 2015) to developmental psychology (van der Maas, 2006), passing through quality-of-life research (Kossakowski et al., 2015). In network psychometrics, variables are represented as nodes and their relations as edges. When analyzing data from psychological instruments, one may be interested in observing whether many nodes are connected with each other, forming clusters, as it is argued that clusters of nodes may emerge due to underlying latent variables. If a latent variable model is the true underlying causal model, we would expect indicators in a network model to form strongly connected clusters for each latent variable. Network models can be shown to be mathematically equivalent, under certain conditions, to latent variable models in both binary (Epskamp, Maris, Waldorp & Borsboom, in press) and Gaussian datasets (Chandrasekaran, Parrilo & Willsky, 2010).
By defining a cluster as a group of connected nodes regardless of edge weight, Golino and Epskamp (2016) pointed to a fundamental rule of network psychometrics: clusters in a network equal latent variables. This is not only a philosophical interpretation of networks, nor solely an empirical finding (Cramer et al., 2010; Cramer et al., 2012; Costantini et al., 2014; Borsboom et al., 2011; Epskamp et al., 2012; van der Maas et al., 2006), but a mathematical characteristic of networks, as Golino & Epskamp (2016) showed.
Network models can be estimated using penalized maximum likelihood estimation, such as the least absolute shrinkage and selection operator (LASSO; Tibshirani, 1996), one of the most widely used methods for network estimation with psychological data sets (van Borkulo et al., 2014; Kossakowski et al., 2015; Fried et al., 2015). The LASSO technique avoids over-fitting and may shrink many parameters to exactly zero, which indicates conditional independence and facilitates the interpretability of the network structure.
Exploratory Graph Analysis works as follows. First, it estimates the correlation matrix of the observed variables; it then uses the graphical LASSO to obtain the sparse inverse covariance matrix, with the regularization parameter selected via EBIC over 100 different values. In the last step, the walktrap algorithm (Pons & Latapy, 2004) is used to find the number of clusters in the partial correlation matrix. In EGA, the number of clusters identified equals the number of latent factors in a given data set.
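As a minimal sketch of this workflow, the EGAnet package can be used as below; the data frame name `tdds` and the argument and element names (which follow an older EGAnet release) are assumptions and may differ in current versions.

```r
# Exploratory Graph Analysis of the TDDS items: a sketch with EGAnet.
library(EGAnet)

# GLASSO-regularized partial correlation network + walktrap community detection.
ega_fit <- EGA(data = tdds, model = "glasso")
ega_fit$n.dim          # estimated number of dimensions (clusters)
ega_fit$dim.variables  # which item was assigned to which dimension
```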
Estimating the correct number of dimensions in psychological and educational instruments is a long-standing problem in psychometrics (Preacher & MacCallum, 2003; Ruscio & Roche, 2012; Velicer & Jackson, 1990). There are many techniques and methods to accomplish this task, such as Horn's parallel analysis (PA; 1965), the Kaiser-Guttman eigenvalue-greater-than-one rule (Guttman, 1954; Kaiser, 1960), Velicer's minimum average partial procedure (MAP; 1976), the very simple structure criterion (VSS; Revelle & Rocklin, 1979), and the use of the Bayesian information criterion (BIC; Schwarz, 1978) or the extended Bayesian information criterion (EBIC; Chen & Chen, 2008) to find the best fit over a series of structural models with different numbers of factors, among other techniques. There are several studies investigating the performance of parallel analysis (Buja & Eyuboglu, 1992; Crawford et al., 2010; Green, Redell, Thompson, & Levy, 2016; Keith, Caemmerer, & Reynolds, 2016; Timmerman & Lorenzo-Seva, 2011; Velicer, Eaton, & Fava, 2000; Velicer, 1986; Zwick & Velicer, 1986), the minimum average partial procedure (Garrido, Abad, & Ponsoda, 2011; Keith, Caemmerer, & Reynolds, 2016; Velicer, Eaton, & Fava, 2000), BIC (Dziak, Coffman, Lanza, & Li, 2012; Lopes & West, 2004; Preacher, Zhang, Kim, & Mels, 2013; Song & Belin, 2008), and the Kaiser-Guttman eigenvalue rule (Hakstian, Rogers, & Cattell, 1982; Keith, Caemmerer, & Reynolds, 2016; Ruscio & Roche, 2012; Velicer et al., 2000; Zwick & Velicer, 1982, 1986) in estimating the correct number of factors. In sum, PA and MAP work quite well when the correlation between factors is low or moderate, when the sample size is equal to or greater than 500, and when the factor loadings are moderate to high (Crawford et al., 2010; Green et al., 2011; Keith, Caemmerer, & Reynolds, 2016; Zwick & Velicer, 1986). However, they tend to underestimate the number of factors when the correlation between factors is high, when the sample size is small, and when there is a small number of indicators per factor (Crawford et al., 2010; Green et al., 2011; Keith, Caemmerer, & Reynolds, 2016; Ruscio & Roche, 2012). Regarding BIC, Preacher, Zhang, Kim and Mels (2013) showed that it performs well when the sample size is small, but that it tends to overestimate the number of factors in large data sets. Dziak, Coffman, Lanza and Li (2012), on the other hand, showed that BIC increases its accuracy in estimating the number of factors when the sample size is greater than 200 cases. The Kaiser-Guttman rule is the default method for choosing the number of factors in many commercial software packages (Bandalos & Boehm-Kaufman, 2009), but simulation studies show that it overestimates the number of factors, especially with a large number of items and a large sample size (Hakstian, Rogers, & Cattell, 1982; Keith, Caemmerer, & Reynolds, 2016; Ruscio & Roche, 2012; Velicer et al., 2000; Zwick & Velicer, 1982, 1986). Ruscio and Roche (2012) provided startling evidence in this direction: the Kaiser-Guttman rule overestimated the number of factors in 89.87% of their 10,000 simulated data sets, which were generated with varying numbers of factors, sample sizes, numbers of items, numbers of response categories per item, and correlations between factors. In light of the evidence from these simulation studies, some researchers strongly recommend against using this method (Bandalos & Boehm-Kaufman, 2009; Velicer et al., 2000).
These simulation studies highlight a very complicated problem within psychology, since it is very common to find areas in which the correlation between factors is high and the sample size is no larger than 500 cases, especially in the intelligence field (Keith, Caemmerer, & Reynolds, 2016). In such situations, neither PA, MAP, nor comparing models with different numbers of factors via BIC reliably leads to the correct number of dimensions, showing that estimating the number of factors is still a non-trivial task, despite the developments of the past decades. Recently, a new technique termed Exploratory Graph Analysis (EGA) was proposed by Golino and Epskamp (2016) that seems to overcome the limitations of the traditional methods pointed out above.
All the information about EGA can be found at this link.
We see that EGA converged on a three-cluster solution, with all items in their theoretically expected clusters. The network plot also shows visually how the TDDS items are correlated: items that sit closer together and are connected by thicker lines are more strongly related. This could be interesting in the sense that strongly connected items may share a theoretical sub-component of a given sub-scale. For example, would you say that, from a substantive point of view, P_i9 and P_i15 tap into the same idea? How about P_i2 and P_i6? And so on.
We also investigated the stability of EGA's estimation via bootstrap. We estimated the number of dimensions in n = 100 bootstrap samples from the empirical correlation matrix, and we present the results below, along with a graphical representation of the typical network (i.e., the network formed by the median pairwise partial correlations over the n bootstraps) and its dimensionality. A sketch of this bootstrap procedure follows the tables below.
Lastly, we further verified the fit of the structure suggested by EGA using confirmatory factor analysis (estimator: WLSMV); a sketch of this model follows the CFA fit table below.
Bootstrap solution of EGA
N. Bootstraps | Median Dim | SD Dim | SE Dim | CI Dim | Lower | Upper |
---|---|---|---|---|---|---|
100 | 3 | 0 | 0 | 0 | 3 | 3 |
Dimensionality | Frequency |
---|---|
3 | 100 |
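The bootstrap summarized above can be reproduced roughly as follows, again assuming the EGAnet package and the `tdds` data frame; argument and element names may differ across EGAnet versions.

```r
# Bootstrap EGA: re-estimate the number of dimensions in 100 bootstrap samples
# and summarize the median dimensionality and its confidence interval (sketch).
boot_ega <- bootEGA(data = tdds, iter = 100, model = "glasso", type = "resampling")
boot_ega$summary.table  # median number of dimensions, SD, SE, and CI
boot_ega$frequency      # how often each dimensionality was found
```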
CFA of EGA solution
## [1] S_i2 S_i5 S_i8 S_i11 S_i14 S_i17 S_i20
## 21 Levels: M_i1 M_i10 M_i13 M_i16 M_i19 M_i4 M_i7 P_i12 P_i15 ... S_i8
## [1] P_i3 P_i6 P_i9 P_i12 P_i15 P_i18 P_i21
## 21 Levels: M_i1 M_i10 M_i13 M_i16 M_i19 M_i4 M_i7 P_i12 P_i15 ... S_i8
## [1] M_i1 M_i4 M_i7 M_i10 M_i13 M_i16 M_i19
## 21 Levels: M_i1 M_i10 M_i13 M_i16 M_i19 M_i4 M_i7 P_i12 P_i15 ... S_i8
chisq | df | pvalue | cfi | rmsea | gfi | nfi |
---|---|---|---|---|---|---|
573.56 | 186 | 0 | 0.98 | 0.05 | 1 | 0.98 |
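The CFA of the EGA solution referenced above can be specified in lavaan roughly as follows; the item names mirror the labels printed above, and the data frame name `tdds` is an assumption.

```r
# CFA of the three-factor structure suggested by EGA, items treated as ordered
# categorical and estimated with WLSMV (a sketch, not the exact script used).
library(lavaan)

ega_cfa_model <- '
  sexual   =~ S_i2 + S_i5 + S_i8 + S_i11 + S_i14 + S_i17 + S_i20
  pathogen =~ P_i3 + P_i6 + P_i9 + P_i12 + P_i15 + P_i18 + P_i21
  moral    =~ M_i1 + M_i4 + M_i7 + M_i10 + M_i13 + M_i16 + M_i19
'

ordered_items <- c("S_i2", "S_i5", "S_i8", "S_i11", "S_i14", "S_i17", "S_i20",
                   "P_i3", "P_i6", "P_i9", "P_i12", "P_i15", "P_i18", "P_i21",
                   "M_i1", "M_i4", "M_i7", "M_i10", "M_i13", "M_i16", "M_i19")

ega_cfa_fit <- cfa(ega_cfa_model, data = tdds, estimator = "WLSMV",
                   ordered = ordered_items)
fitMeasures(ega_cfa_fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))
```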
The analyses above suggest that there are likely 3 factors underlying the TDDS, and that we should therefore extract three factors in the following analyses. Now we will use regular (exploratory) FA and plot the possible solutions. In a very broad sense, "common factor" analysis, here implemented as principal axis factoring, is used to identify the latent variables underlying a set of variables. In other words, it evaluates a theoretical model for a set of variables, whereas principal components analysis is used for data reduction.
The factor analysis model is written as a series of equations of the form:
\[y_{pi} = \lambda_{pq} f_{qi} + u_{pi}\] where \(y_{pi}\) is individual i’s score on the p-th observed variable (indicator/item), \(f_{qi}\) is individual i’s score on the qth latent common factor, \(u_{pi}\) is individual i’s score on the p-th latent unique factor, and \(\lambda_{pq}\) is the factor loading that indicates the relation between the p-th observed variable and the q-th latent common factor.
Typically, in an EFA there are multiple observed variables and more than one common factor. For instance, in a 6-variable, 3-factor case, the model would be
\[y_{1i} = \lambda_{11} f_{1i} + \lambda_{12} f_{2i} +\lambda_{13} f_{3i} + u_{1i}\] \[y_{2i} = \lambda_{21} f_{1i} + \lambda_{22} f_{2i} +\lambda_{23} f_{3i} + u_{2i}\] \[y_{3i} = \lambda_{31} f_{1i} + \lambda_{32} f_{2i} +\lambda_{33} f_{3i} + u_{3i}\] \[y_{4i} = \lambda_{41} f_{1i} + \lambda_{42} f_{2i} +\lambda_{43} f_{3i}+ u_{4i}\] \[y_{5i} = \lambda_{51} f_{1i} + \lambda_{52} f_{2i} +\lambda_{53} f_{3i} + u_{5i}\] \[y_{6i} = \lambda_{61} f_{1i} + \lambda_{62} f_{2i} +\lambda_{63} f_{3i}+ u_{6i}\]
which is usually represented in a Matrix form:
\[ \boldsymbol{Y_{i}} = \boldsymbol{\Lambda}\boldsymbol{F_{i}} + \boldsymbol{U_{i}} \] where \(\boldsymbol{Y_{i}}\) is a \(p\) x 1 vector of observed variable scores, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{F_{i}}\) is a \(q\) x 1 vector of common factor scores, and \(\boldsymbol{U_{i}}\) is a p x 1 vector of unique factor scores. Extending the model to multiple persons provides the mapping to the observed correlation matrix, \(\boldsymbol{\Sigma} = \boldsymbol{Y}'\boldsymbol{Y}\), and the common factor model becomes
\[ \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \] where \(\boldsymbol{\Sigma}\) is a p x p covariance (or correlation) matrix of the observed variables, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{\Psi}\) is a q x q covariance matrix of the latent factor variables, and \(\boldsymbol{\Theta}\) is a diagonal matrix of unique factor variances.
Below you find "a very good" EFA solution: all items load on their theorized factors, no item loads negatively on any factor, and no indicator has too low a loading on its factor (< 0.3). We use both an oblimin (oblique) rotation and a varimax (orthogonal) rotation; the former allows factors to be correlated, while the latter constrains factors to be uncorrelated. Fit indices are also displayed, and a sketch of how such an EFA can be run follows the fit table below.
RMSEA | TLI | RMS | CFI |
---|---|---|---|
0.08 | 0.88 | 0.16 | 0.95 |
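As referenced above, here is a minimal sketch of how such an EFA can be run with the psych package. Whether this particular step used Pearson or polychoric correlations is an assumption (`cor = "poly"` below), as is the data frame name `tdds`.

```r
# 3-factor EFA of the TDDS with principal axis factoring (sketch).
library(psych)

efa_oblimin <- fa(tdds, nfactors = 3, rotate = "oblimin", fm = "pa", cor = "poly")
efa_varimax <- fa(tdds, nfactors = 3, rotate = "varimax", fm = "pa", cor = "poly")

print(efa_oblimin$loadings, cutoff = 0.3)  # loadings above the 0.3 threshold
efa_varimax$communality                    # sum of squared loadings per item
```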
The factor loading matrix displays the factor loadings for each variable, using 0.3 as the threshold, after rotation has been applied. The solutions using the 0.3 threshold are equal for the orthogonal and oblique rotations, but it is sometimes informative to show the factor loadings of the orthogonal solution, so one can calculate each item's communality (the sum of its squared loadings). Immediately below you find the factor loadings for the oblique solution (which go together with the graph above), and further down those of the orthogonal solution, shown without the threshold.
F1.loadings | F2.loadings | F3.loadings | communality | uniquenesses | complexity | |
---|---|---|---|---|---|---|
[M] TDDS_1 | 0.65 | 0.49 | 0.51 | 1.07 | ||
[S] TDDS_2 | 0.57 | 0.50 | 0.50 | 1.33 | ||
[P] TDDS_3 | 0.6 | 0.36 | 0.64 | 1.01 | ||
[M] TDDS_4 | 0.83 | 0.70 | 0.30 | 1.04 | ||
[S] TDDS_5 | 0.75 | 0.53 | 0.47 | 1.02 | ||
[P] TDDS_6 | 0.54 | 0.32 | 0.68 | 1.07 | ||
[M] TDDS_7 | 0.79 | 0.68 | 0.32 | 1.07 | ||
[S] TDDS_8 | 0.88 | 0.72 | 0.28 | 1.04 | ||
[P] TDDS_9 | 0.57 | 0.39 | 0.61 | 1.10 | ||
[M] TDDS_10 | 0.83 | 0.68 | 0.32 | 1.01 | ||
[S] TDDS_11 | 0.53 | 0.47 | 0.53 | 1.48 | ||
[P] TDDS_12 | 0.69 | 0.43 | 0.57 | 1.04 | ||
[M] TDDS_13 | 0.88 | 0.74 | 0.26 | 1.01 | ||
[S] TDDS_14 | 0.68 | 0.55 | 0.45 | 1.06 | ||
[P] TDDS_15 | 0.77 | 0.63 | 0.37 | 1.01 | ||
[M] TDDS_16 | 0.78 | 0.63 | 0.37 | 1.00 | ||
[S] TDDS_17 | 0.44 | 0.37 | 0.63 | 1.59 | ||
[P] TDDS_18 | 0.62 | 0.43 | 0.57 | 1.03 | ||
[M] TDDS_19 | 0.87 | 0.75 | 0.25 | 1.00 | ||
[S] TDDS_20 | 0.69 | 0.51 | 0.49 | 1.05 | ||
[P] TDDS_21 | 0.67 | 0.46 | 0.54 | 1.01 |
F1.loadings | F2.loadings | F3.loadings | communality | uniquenesses | complexity | |
---|---|---|---|---|---|---|
[M] TDDS_1 | 0.65 | 0.49 | 0.51 | 1.29 | ||
[S] TDDS_2 | 0.61 | 0.33 | 0.50 | 0.50 | 1.63 | |
[P] TDDS_3 | 0.58 | 0.36 | 0.64 | 1.16 | ||
[M] TDDS_4 | 0.81 | 0.70 | 0.30 | 1.12 | ||
[S] TDDS_5 | 0.72 | 0.53 | 0.47 | 1.03 | ||
[P] TDDS_6 | 0.52 | 0.32 | 0.68 | 1.36 | ||
[M] TDDS_7 | 0.78 | 0.68 | 0.32 | 1.23 | ||
[S] TDDS_8 | 0.84 | 0.72 | 0.28 | 1.06 | ||
[P] TDDS_9 | 0.56 | 0.39 | 0.61 | 1.44 | ||
[M] TDDS_10 | 0.81 | 0.68 | 0.32 | 1.10 | ||
[S] TDDS_11 | 0.58 | 0.35 | 0.47 | 0.53 | 1.77 | |
[P] TDDS_12 | 0.65 | 0.43 | 0.57 | 1.07 | ||
[M] TDDS_13 | 0.85 | 0.74 | 0.26 | 1.06 | ||
[S] TDDS_14 | 0.69 | 0.55 | 0.45 | 1.32 | ||
[P] TDDS_15 | 0.75 | 0.63 | 0.37 | 1.24 | ||
[M] TDDS_16 | 0.77 | 0.63 | 0.37 | 1.15 | ||
[S] TDDS_17 | 0.49 | 0.37 | 0.63 | 2.09 | ||
[P] TDDS_18 | 0.61 | 0.43 | 0.57 | 1.34 | ||
[M] TDDS_19 | 0.85 | 0.75 | 0.25 | 1.09 | ||
[S] TDDS_20 | 0.68 | 0.51 | 0.49 | 1.18 | ||
[P] TDDS_21 | 0.65 | 0.46 | 0.54 | 1.23 |
We can use the eigenvalues to calculate the percentage of variance accounted for by each factor. Given that the sum of the eigenvalues always equals the total number of variables in the analysis, the percentage of variance accounted for by a factor is its eigenvalue divided by the total number of variables. The percentages of variance accounted for are 33.27%, 13.17%, and 7.63%, for both the oblimin and the orthogonal solution.
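A sketch of that calculation, reusing the hypothetical `efa_oblimin` object and `tdds` data frame from the EFA sketch above:

```r
# Percentage of variance accounted for by each factor: the factor's eigenvalue
# (sum of squared loadings) divided by the total number of items.
eigenvalues <- colSums(efa_oblimin$loadings^2)
round(100 * eigenvalues / ncol(tdds), 2)
```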
Now we look at the percentage of variance in each item that is explained by the retained factors: the communality, which is simply the sum of the squared factor loadings for that item. The larger the number, the better. Note, however, that the communality is only informative if the item does not load highly on more than one factor. See, for example, items 2 and 11, which have a higher-than-usual loading on more than one factor and, because of this, a seemingly high communality.
We can also do an IRT-based factor analysis. The _irt.fa_ function calculates the polychoric correlation matrix from the original data and then uses it to perform the factor analysis. The IRT analysis here uses only two parameters, difficulty and discrimination. The function reports parameters in normal (as in Gaussian) units, which is to say that to convert them to conventional logistic IRT parameters one needs to multiply by 1.702. In addition, the location parameter is expressed in terms of difficulty (high positive scores imply a lower frequency of endorsement). Here, using IRT factor analysis, or Item Factor Analysis (IFA), we find that each item loads on its hypothesized factor (with the exception of item 11, which loads on both the Sexual and Pathogen sub-scales). We also have our first look at the information yielded by each item.
Importantly, the results presented in this sub-section will naturally differ from those in the dedicated IRT section, because there we will use a more appropriate IRT model, the graded response model. Here, based on the parameters the output yields, we implement a 2-parameter IRT model which accounts only for the difficulty and discrimination of each item. I asked the package author (Revelle) for a definitive answer on the mathematical model used and received the following reply:
Basically, I am just taking advantage of Rod McDonald's observation that a 2PL model based upon tetrachoric/polychoric correlations is directly rescalable as an IRT model. See the nice paper by Kamata and Bauer (SEM 2008) discussing the equivalence. While this has been shown to work for dichotomous items, when I apply the same approach to polytomous items, the results yield, functionally, a GRM. When I find scores using scoreIRT, I use a GRM approach with the loadings and item difficulties. Whether all of this is appropriate or not is an interesting question. My scores match those of MIRT with correlations > .99, so I am reasonably confident that this works. I mainly implement irt.fa as a fast way to give IRT-like parameters, particularly the information curves, that people find useful.
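A sketch of the item factor analysis described above, using psych::irt.fa(); the element names follow the psych documentation, and the data frame name `tdds` is an assumption.

```r
# IRT-style (item) factor analysis: factor the polychoric correlation matrix and
# express the result as discrimination and difficulty parameters (sketch).
irt_fit <- irt.fa(tdds, nfactors = 3, plot = FALSE)
irt_fit$irt$discrimination  # discrimination estimates per item and factor
irt_fit$irt$difficulty      # category location ("difficulty") estimates per item
```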
The first results relate to test information, which indicates the accuracy with which the test measures the latent construct, contingent upon the level of that construct. It is an extension of reliability, which offers only a single numerical summary (often alpha). Instead of estimating one omnibus statistic for the whole scale, IRT provides a function of "reliability" (called information) across the levels of the latent construct.
As far as each factor is concerned, the Pathogen sub-scale seems to carry the least information (ability to accurately predict scores on individual items contingent upon individual trait levels or mean/sum scores). To see this, notice that its y-axis goes from 0 to 6, while the Moral sub-scale's goes up to 12 and the Sexual sub-scale's up to 8. We also see that the test information curves (TICs) peak at different parts of the latent construct. The Moral sub-scale, for example, while very informative overall, is "more" informative (i.e., better at reliably predicting individuals' scores from the latent trait) over the mid-lower half of the trait range. That is to say, it measures those lower in moral disgust more accurately than those highly sensitive to moral disgust; put differently, the Moral sub-scale does a better job of discriminating among those low in the measured latent construct than among those who are high. The same pattern (to a lesser extent) is observed for the Pathogen sub-scale, except that it is much less informative than the Moral sub-scale. Conversely, the Sexual sub-scale measures those who are high in the latent construct more accurately than those who are low.
To clarify the above analysis, we decompose the test information curve (TIC) by plotting the item information curves (IICs), which show how accurately each item measures its intended latent construct. For the Moral sub-scale, we see that all items have an adequate bearing on measuring the construct, but that items 13 and 19 are the best at covering the low end and the high end; perhaps items dedicated to measuring the high end of the scale might add information to this sub-scale. Looking at the Sexual sub-scale, we see that items 8 and 5 are better at measuring the high end of the construct, while no item covers the low end of the scale. According to these results, this sub-scale would benefit from items aimed at measuring and discriminating among those low in sexual disgust. We also see that items 2, 11, and 17 are non-informative, which is to say that the category a person chooses on these items (say, "Extremely Disgusted") is not reflective of that person's total score. These items therefore contribute noise (measurement error) rather than information, and they should be replaced by improved items based on theoretical considerations. Lastly, as for the Pathogen sub-scale, the general level of information that each item achieves should be improved (see the y-axis); that said, items 6 and 9 are the least informative.
Lastly, we look at the item characteristics. That is to say, we decompose each item information curve into its "item characteristic curves" (ICCs), which represent each item's response categories (from "Not Disgusted", "Slightly Disgusted", "Somewhat Disgusted", "Moderately Disgusted", "Disgusted", "Very Disgusted", to "Extremely Disgusted"). In other words, for each item you find 7 curves. Please zoom in on the graphs, which should keep their resolution; in any case, they are also plotted on a larger canvas below.
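The information and characteristic curves discussed here can be reproduced with psych's plot method for irt.fa objects; the `type` values below follow the psych documentation and the `irt_fit` object is the hypothetical one from the sketch above.

```r
# Test information, item information, and item characteristic curves (sketch).
plot(irt_fit, type = "test")  # test information curve per factor
plot(irt_fit, type = "IIC")   # item information curves
plot(irt_fit, type = "ICC")   # item characteristic (category response) curves
```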
For the Moral sub-scale, we see that each response category is most probable over a distinct range of the latent construct, handing over to the next category's curve as the trait increases. The ICCs of items 13 and 19 are representative of this "ideal" scenario, which is why they have the highest information in the test.
Items 2, 6, 11, 14, 17, and 18 show a pattern in which only the extreme categories are informative. That is to say, if you are average-to-high on the measured construct you will probably endorse the "Extremely Disgusted" response category, and if you are low-to-average you are likely to endorse the other extreme, "Not Disgusted". Basically, to the people who took the survey, these items reflected "binary" choices rather than graduated or ordinal ones. [Post-script note: this is confirmed to be the case for all disgust sensitivity instruments, particularly the DS-R and the Germ Aversion sub-scale of the PVD.]
Now, we perform a CFA.
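A sketch of the model specification follows; the short item names (s1...p7) mirror those in the lavaan output further below, and the default ML estimator matches that output. The data frame name `tdds` is an assumption.

```r
# Three-factor CFA of the TDDS, default ML estimation (sketch).
tdds_cfa_model <- '
  sex      =~ s1 + s2 + s3 + s4 + s5 + s6 + s7
  moral    =~ m1 + m2 + m3 + m4 + m5 + m6 + m7
  pathogen =~ p1 + p2 + p3 + p4 + p5 + p6 + p7
'

tdds_cfa_fit <- cfa(tdds_cfa_model, data = tdds)
summary(tdds_cfa_fit, fit.measures = TRUE, standardized = TRUE)
modindices(tdds_cfa_fit, sort. = TRUE)  # modification indices, as tabled below
```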
Below you find a graphical representation of the standardized parameter estimates (i.e., Std.all), which is followed by a summary of its parameters and fit.
sex | moral | pathogen | |
---|---|---|---|
sex | 1.784 | 0.592 | 0.591 |
moral | 0.592 | 1.553 | 0.418 |
pathogen | 0.591 | 0.418 | 0.703 |
It is thought that the widespread use of CFA has to do with its ability to assess construct validity: the extent to which a set of measurement items reflects the theoretical latent construct they were designed to measure. Two of these facets, convergent and discriminant validity, can also be quantified from the CFA output, as sketched after the definitions below.
Convergent validity: indicators of a specific construct should converge, or share a high proportion of variance in common.
Discriminant validity: the extent to which a construct is truly distinct from other constructs.
Face validity: the correspondence between the variables included in a summated scale and its conceptual definition.
Nomological validity: whether the correlations among the constructs in a measurement theory make sense.
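One way (though not the only one) to quantify the convergent and discriminant facets from the CFA above is through composite reliability and the average variance extracted (AVE); here is a minimal sketch with semTools, reusing the hypothetical `tdds_cfa_fit` object from the CFA sketch above.

```r
# Composite reliability (omega) and AVE per factor (sketch). AVE > .50 is commonly
# read as convergent validity; AVE exceeding the squared inter-factor correlations
# as discriminant validity (Fornell & Larcker, 1981).
library(semTools)
reliability(tdds_cfa_fit)             # rows include omega and avevar (AVE)
lavInspect(tdds_cfa_fit, "cor.lv")^2  # squared latent factor correlations
```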
## lavaan (0.5-23.1097) converged normally after 44 iterations
##
## Number of observations 1022
##
## Estimator ML
## Minimum Function Test Statistic 1149.665
## Degrees of freedom 186
## P-value (Chi-square) 0.000
##
## Model test baseline model:
##
## Minimum Function Test Statistic 11718.162
## Degrees of freedom 210
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.916
## Tucker-Lewis Index (TLI) 0.905
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -37866.858
## Loglikelihood unrestricted model (H1) -37292.026
##
## Number of free parameters 45
## Akaike (AIC) 75823.716
## Bayesian (BIC) 76045.545
## Sample-size adjusted Bayesian (BIC) 75902.620
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.071
## 90 Percent Confidence Interval 0.067 0.075
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.058
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## sex =~
## s1 1.000 1.336 0.704
## s2 0.935 0.046 20.192 0.000 1.248 0.687
## s3 1.166 0.051 22.974 0.000 1.557 0.790
## s4 0.988 0.049 19.979 0.000 1.319 0.679
## s5 1.152 0.053 21.776 0.000 1.538 0.745
## s6 0.953 0.053 17.914 0.000 1.273 0.606
## s7 1.171 0.056 20.735 0.000 1.564 0.707
## moral =~
## m1 1.000 1.246 0.690
## m2 1.229 0.050 24.500 0.000 1.531 0.823
## m3 1.223 0.050 24.280 0.000 1.525 0.815
## m4 1.171 0.048 24.536 0.000 1.459 0.825
## m5 1.339 0.052 25.576 0.000 1.669 0.863
## m6 1.177 0.049 23.818 0.000 1.467 0.798
## m7 1.268 0.049 25.784 0.000 1.580 0.871
## pathogen =~
## p1 1.000 0.839 0.578
## p2 1.183 0.080 14.843 0.000 0.992 0.590
## p3 1.179 0.075 15.703 0.000 0.988 0.639
## p4 1.261 0.082 15.443 0.000 1.058 0.624
## p5 1.435 0.080 17.926 0.000 1.204 0.794
## p6 1.323 0.084 15.745 0.000 1.109 0.641
## p7 1.336 0.081 16.449 0.000 1.121 0.685
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## sex ~~
## moral 0.592 0.066 8.957 0.000 0.356 0.356
## pathogen 0.591 0.056 10.634 0.000 0.527 0.527
## moral ~~
## pathogen 0.418 0.046 9.174 0.000 0.400 0.400
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .s1 1.811 0.092 19.640 0.000 1.811 0.504
## .s2 1.742 0.087 19.915 0.000 1.742 0.528
## .s3 1.456 0.083 17.572 0.000 1.456 0.375
## .s4 2.030 0.101 20.026 0.000 2.030 0.538
## .s5 1.898 0.101 18.840 0.000 1.898 0.445
## .s6 2.788 0.134 20.864 0.000 2.788 0.633
## .s7 2.452 0.125 19.600 0.000 2.452 0.501
## .m1 1.710 0.081 21.241 0.000 1.710 0.524
## .m2 1.116 0.057 19.423 0.000 1.116 0.322
## .m3 1.174 0.060 19.610 0.000 1.174 0.336
## .m4 1.002 0.052 19.391 0.000 1.002 0.320
## .m5 0.954 0.052 18.167 0.000 0.954 0.255
## .m6 1.224 0.061 19.951 0.000 1.224 0.363
## .m7 0.795 0.045 17.830 0.000 0.795 0.241
## .p1 1.403 0.068 20.725 0.000 1.403 0.666
## .p2 1.847 0.090 20.606 0.000 1.847 0.652
## .p3 1.416 0.071 20.008 0.000 1.416 0.592
## .p4 1.758 0.087 20.212 0.000 1.758 0.611
## .p5 0.849 0.053 16.152 0.000 0.849 0.369
## .p6 1.759 0.088 19.973 0.000 1.759 0.589
## .p7 1.417 0.074 19.257 0.000 1.417 0.530
## sex 1.784 0.145 12.321 0.000 1.000 1.000
## moral 1.553 0.126 12.328 0.000 1.000 1.000
## pathogen 0.703 0.075 9.432 0.000 1.000 1.000
Fit Measures | Value |
---|---|
npar | 45.000 |
fmin | 0.562 |
chisq | 1149.665 |
df | 186.000 |
pvalue | 0.000 |
baseline.chisq | 11718.162 |
baseline.df | 210.000 |
baseline.pvalue | 0.000 |
cfi | 0.916 |
tli | 0.905 |
nnfi | 0.905 |
rfi | 0.889 |
nfi | 0.902 |
pnfi | 0.799 |
ifi | 0.916 |
rni | 0.916 |
logl | -37866.858 |
unrestricted.logl | -37292.026 |
aic | 75823.716 |
bic | 76045.545 |
ntotal | 1022.000 |
bic2 | 75902.620 |
rmsea | 0.071 |
rmsea.ci.lower | 0.067 |
rmsea.ci.upper | 0.075 |
rmsea.pvalue | 0.000 |
rmr | 0.198 |
rmr_nomean | 0.198 |
srmr | 0.058 |
srmr_bentler | 0.058 |
srmr_bentler_nomean | 0.058 |
srmr_bollen | 0.058 |
srmr_bollen_nomean | 0.058 |
srmr_mplus | 0.058 |
srmr_mplus_nomean | 0.058 |
cn_05 | 195.522 |
cn_01 | 208.825 |
gfi | 0.892 |
agfi | 0.866 |
pgfi | 0.719 |
mfi | 0.624 |
ecvi | 1.213 |
lhs | op | rhs | mi | epc | sepc.lv | sepc.all | sepc.nox | |
---|---|---|---|---|---|---|---|---|
111 | s2 | ~~ | s3 | 122.134 | 0.708 | 0.708 | 0.198 | 0.198 |
79 | pathogen | =~ | s3 | 73.640 | -0.599 | -0.502 | -0.255 | -0.255 |
80 | pathogen | =~ | s4 | 46.015 | 0.518 | 0.435 | 0.224 | 0.224 |
224 | m2 | ~~ | m4 | 43.182 | 0.265 | 0.265 | 0.080 | 0.080 |
68 | moral | =~ | s6 | 41.300 | 0.315 | 0.393 | 0.187 | 0.187 |
194 | s6 | ~~ | p6 | 39.791 | 0.477 | 0.477 | 0.131 | 0.131 |
132 | s3 | ~~ | s6 | 37.724 | -0.475 | -0.475 | -0.115 | -0.115 |
165 | s5 | ~~ | s6 | 37.041 | 0.513 | 0.513 | 0.118 | 0.118 |
296 | p4 | ~~ | p6 | 36.304 | 0.383 | 0.383 | 0.131 | 0.131 |
114 | s2 | ~~ | s6 | 34.146 | -0.455 | -0.455 | -0.119 | -0.119 |
124 | s2 | ~~ | p2 | 33.959 | 0.359 | 0.359 | 0.118 | 0.118 |
257 | m5 | ~~ | m7 | 32.919 | 0.218 | 0.218 | 0.062 | 0.062 |
51 | sex | =~ | m3 | 32.439 | 0.180 | 0.240 | 0.128 | 0.128 |
104 | s1 | ~~ | p1 | 32.363 | 0.313 | 0.313 | 0.114 | 0.114 |
77 | pathogen | =~ | s1 | 30.205 | 0.401 | 0.336 | 0.177 | 0.177 |
290 | p2 | ~~ | p7 | 29.304 | 0.320 | 0.320 | 0.116 | 0.116 |
211 | m1 | ~~ | m3 | 28.786 | 0.268 | 0.268 | 0.079 | 0.079 |
64 | moral | =~ | s2 | 28.739 | -0.213 | -0.266 | -0.146 | -0.146 |
293 | p3 | ~~ | p6 | 25.996 | -0.293 | -0.293 | -0.110 | -0.110 |
149 | s4 | ~~ | s6 | 25.586 | 0.424 | 0.424 | 0.104 | 0.104 |
228 | m2 | ~~ | p1 | 23.559 | 0.211 | 0.211 | 0.078 | 0.078 |
150 | s4 | ~~ | s7 | 23.192 | -0.396 | -0.396 | -0.092 | -0.092 |
49 | sex | =~ | m1 | 21.975 | 0.171 | 0.229 | 0.127 | 0.127 |
78 | pathogen | =~ | s2 | 21.372 | -0.328 | -0.275 | -0.152 | -0.152 |
178 | s5 | ~~ | p5 | 21.302 | 0.228 | 0.228 | 0.073 | 0.073 |
287 | p2 | ~~ | p4 | 19.780 | -0.283 | -0.283 | -0.099 | -0.099 |
128 | s2 | ~~ | p6 | 18.106 | -0.259 | -0.259 | -0.083 | -0.083 |
82 | pathogen | =~ | s6 | 17.529 | 0.366 | 0.307 | 0.146 | 0.146 |
59 | sex | =~ | p4 | 16.634 | -0.179 | -0.239 | -0.141 | -0.141 |
191 | s6 | ~~ | p3 | 16.168 | -0.273 | -0.273 | -0.084 | -0.084 |
282 | p1 | ~~ | p4 | 15.875 | 0.220 | 0.220 | 0.089 | 0.089 |
256 | m5 | ~~ | m6 | 15.620 | -0.167 | -0.167 | -0.047 | -0.047 |
113 | s2 | ~~ | s5 | 14.826 | -0.267 | -0.267 | -0.071 | -0.071 |
115 | s2 | ~~ | s7 | 14.234 | 0.289 | 0.289 | 0.072 | 0.072 |
289 | p2 | ~~ | p6 | 14.076 | -0.241 | -0.241 | -0.083 | -0.083 |
230 | m2 | ~~ | p3 | 13.797 | -0.165 | -0.165 | -0.057 | -0.057 |
210 | m1 | ~~ | m2 | 13.617 | 0.180 | 0.180 | 0.054 | 0.054 |
215 | m1 | ~~ | m7 | 13.559 | -0.160 | -0.160 | -0.049 | -0.049 |
237 | m3 | ~~ | m6 | 13.210 | 0.161 | 0.161 | 0.047 | 0.047 |
286 | p2 | ~~ | p3 | 12.351 | 0.202 | 0.202 | 0.078 | 0.078 |
50 | sex | =~ | m2 | 12.146 | -0.108 | -0.144 | -0.077 | -0.077 |
185 | s6 | ~~ | m4 | 12.033 | 0.202 | 0.202 | 0.054 | 0.054 |
212 | m1 | ~~ | m4 | 12.023 | -0.161 | -0.161 | -0.050 | -0.050 |
53 | sex | =~ | m5 | 11.993 | -0.102 | -0.137 | -0.071 | -0.071 |
240 | m3 | ~~ | p2 | 11.983 | 0.177 | 0.177 | 0.056 | 0.056 |
144 | s3 | ~~ | p4 | 11.614 | -0.199 | -0.199 | -0.060 | -0.060 |
145 | s3 | ~~ | p5 | 11.432 | -0.151 | -0.151 | -0.051 | -0.051 |
69 | moral | =~ | s7 | 11.309 | -0.160 | -0.200 | -0.090 | -0.090 |
227 | m2 | ~~ | m7 | 11.169 | -0.128 | -0.128 | -0.038 | -0.038 |
133 | s3 | ~~ | s7 | 10.678 | 0.252 | 0.252 | 0.058 | 0.058 |
223 | m2 | ~~ | m3 | 10.649 | -0.141 | -0.141 | -0.041 | -0.041 |
97 | s1 | ~~ | m1 | 10.412 | 0.194 | 0.194 | 0.057 | 0.057 |
118 | s2 | ~~ | m3 | 10.227 | 0.161 | 0.161 | 0.047 | 0.047 |
280 | p1 | ~~ | p2 | 10.091 | -0.178 | -0.178 | -0.073 | -0.073 |
93 | s1 | ~~ | s4 | 10.020 | 0.223 | 0.223 | 0.061 | 0.061 |
Here we fit a CFA model to each gender group to establish configural invariance.
While CFA approaches to the TDDS may not be the most adequate (the TDDS uses 7-point ordinal Likert scales, whereas CFA-based techniques expect normal data, even if estimators for categorical data exist), we run measurement invariance analyses as a means to obtain some information for the IRT (DIF and DTF) analyses below.
We evaluate configural invariance by (a) looking at the produced graphs, their between-factor correlations and loadings, and (b) inspecting the fit indices of the models for each gender. A substantial decrease in goodness of fit across gender could indicate non-invariance.
As for (a), it seems that for women there is a weaker association between Moral and Pathogen disgust scores, and a stronger association between Pathogen and Sexual disgust, than for men.
However, on inspecting the fit indices, it is likely that configural invariance can be established, which is to say the same factor structure can be assumed in each group. So we can consider the TDDS to be configurally invariant and move forward with testing metric invariance.
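A sketch of the configural model: the same three-factor specification fitted simultaneously in both gender groups with no cross-group equality constraints, items treated as ordered (matching the DWLS/robust output below). The grouping variable name `gender`, and the assumption that `tdds` contains both the items and that variable, are assumptions.

```r
# Configural invariance: same structure in each group, no equality constraints.
tdds_items <- c(paste0("s", 1:7), paste0("m", 1:7), paste0("p", 1:7))

configural_fit <- cfa(tdds_cfa_model, data = tdds, group = "gender",
                      estimator = "WLSMV", ordered = tdds_items)
fitMeasures(configural_fit, c("cfi.scaled", "tli.scaled", "rmsea.scaled", "srmr"))
```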
## lavaan (0.5-23.1097) converged normally after 63 iterations
##
## Number of observations per group
## male 553
## female 469
##
## Estimator DWLS Robust
## Minimum Function Test Statistic 1621.068 1242.260
## Degrees of freedom 624 624
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.892
## Shift parameter for each group:
## male 208.624
## female 176.934
## for simple second-order correction (Mplus variant)
##
## Chi-square for each group:
##
## male 844.118 654.723
## female 776.950 587.537
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Robust.sem
##
##
## Group 1 [male]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## sex =~
## s1 1.000 0.700 0.700
## s2 1.107 0.054 20.449 0.000 0.774 0.774
## s3 1.230 0.049 24.950 0.000 0.860 0.860
## s4 1.012 0.049 20.494 0.000 0.708 0.708
## s5 1.086 0.049 22.276 0.000 0.760 0.760
## s6 1.023 0.052 19.711 0.000 0.716 0.716
## s7 0.990 0.047 21.225 0.000 0.693 0.693
## moral =~
## m1 1.000 0.729 0.729
## m2 1.122 0.029 38.786 0.000 0.818 0.818
## m3 1.190 0.029 40.495 0.000 0.867 0.867
## m4 1.103 0.033 33.906 0.000 0.804 0.804
## m5 1.172 0.029 39.905 0.000 0.855 0.855
## m6 1.140 0.030 37.863 0.000 0.831 0.831
## m7 1.211 0.032 37.597 0.000 0.883 0.883
## pathogen =~
## p1 1.000 0.565 0.565
## p2 1.041 0.086 12.077 0.000 0.588 0.588
## p3 1.120 0.087 12.812 0.000 0.633 0.633
## p4 1.219 0.084 14.465 0.000 0.688 0.688
## p5 1.384 0.095 14.542 0.000 0.782 0.782
## p6 1.256 0.088 14.276 0.000 0.709 0.709
## p7 1.189 0.089 13.312 0.000 0.671 0.671
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## sex ~~
## moral 0.204 0.021 9.551 0.000 0.399 0.399
## pathogen 0.166 0.020 8.365 0.000 0.420 0.420
## moral ~~
## pathogen 0.207 0.021 9.671 0.000 0.504 0.504
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .s1 0.000 0.000 0.000
## .s2 0.000 0.000 0.000
## .s3 0.000 0.000 0.000
## .s4 0.000 0.000 0.000
## .s5 0.000 0.000 0.000
## .s6 0.000 0.000 0.000
## .s7 0.000 0.000 0.000
## .m1 0.000 0.000 0.000
## .m2 0.000 0.000 0.000
## .m3 0.000 0.000 0.000
## .m4 0.000 0.000 0.000
## .m5 0.000 0.000 0.000
## .m6 0.000 0.000 0.000
## .m7 0.000 0.000 0.000
## .p1 0.000 0.000 0.000
## .p2 0.000 0.000 0.000
## .p3 0.000 0.000 0.000
## .p4 0.000 0.000 0.000
## .p5 0.000 0.000 0.000
## .p6 0.000 0.000 0.000
## .p7 0.000 0.000 0.000
## sex 0.000 0.000 0.000
## moral 0.000 0.000 0.000
## pathogen 0.000 0.000 0.000
##
## Thresholds:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## s1|t1 -0.954 -0.954 -0.954
## s1|t2 -0.292 -0.292 -0.292
## s1|t3 0.111 0.111 0.111
## s1|t4 0.620 0.620 0.620
## s1|t5 1.143 1.143 1.143
## s1|t6 1.733 1.733 1.733
## s2|t1 0.052 0.052 0.052
## s2|t2 0.493 0.493 0.493
## s2|t3 0.859 0.859 0.859
## s2|t4 1.306 1.306 1.306
## s2|t5 1.622 1.622 1.622
## s2|t6 2.020 2.020 2.020
## s3|t1 -0.075 -0.075 -0.075
## s3|t2 0.452 0.452 0.452
## s3|t3 0.833 0.833 0.833
## s3|t4 1.215 1.215 1.215
## s3|t5 1.485 1.485 1.485
## s3|t6 1.753 1.753 1.753
## s4|t1 -0.833 -0.833 -0.833
## s4|t2 -0.268 -0.268 -0.268
## s4|t3 0.180 0.180 0.180
## s4|t4 0.642 0.642 0.642
## s4|t5 1.117 1.117 1.117
## s4|t6 1.733 1.733 1.733
## s5|t1 -0.437 -0.437 -0.437
## s5|t2 0.130 0.130 0.130
## s5|t3 0.524 0.524 0.524
## s5|t4 0.892 0.892 0.892
## s5|t5 1.264 1.264 1.264
## s5|t6 1.674 1.674 1.674
## s6|t1 -0.807 -0.807 -0.807
## s6|t2 -0.369 -0.369 -0.369
## s6|t3 0.011 0.011 0.011
## s6|t4 0.452 0.452 0.452
## s6|t5 0.846 0.846 0.846
## s6|t6 1.254 1.254 1.254
## s7|t1 -0.364 -0.364 -0.364
## s7|t2 -0.007 -0.007 -0.007
## s7|t3 0.273 0.273 0.273
## s7|t4 0.693 0.693 0.693
## s7|t5 1.109 1.109 1.109
## s7|t6 1.499 1.499 1.499
## m1|t1 -0.912 -0.912 -0.912
## m1|t2 -0.335 -0.335 -0.335
## m1|t3 0.130 0.130 0.130
## m1|t4 0.545 0.545 0.545
## m1|t5 1.084 1.084 1.084
## m1|t6 1.605 1.605 1.605
## m2|t1 -1.384 -1.384 -1.384
## m2|t2 -1.084 -1.084 -1.084
## m2|t3 -0.814 -0.814 -0.814
## m2|t4 -0.393 -0.393 -0.393
## m2|t5 0.093 0.093 0.093
## m2|t6 0.899 0.899 0.899
## m3|t1 -1.068 -1.068 -1.068
## m3|t2 -0.615 -0.615 -0.615
## m3|t3 -0.198 -0.198 -0.198
## m3|t4 0.301 0.301 0.301
## m3|t5 0.839 0.839 0.839
## m3|t6 1.472 1.472 1.472
## m4|t1 -1.446 -1.446 -1.446
## m4|t2 -1.117 -1.117 -1.117
## m4|t3 -0.625 -0.625 -0.625
## m4|t4 -0.185 -0.185 -0.185
## m4|t5 0.354 0.354 0.354
## m4|t6 1.117 1.117 1.117
## m5|t1 -1.160 -1.160 -1.160
## m5|t2 -0.801 -0.801 -0.801
## m5|t3 -0.566 -0.566 -0.566
## m5|t4 -0.079 -0.079 -0.079
## m5|t5 0.519 0.519 0.519
## m5|t6 1.028 1.028 1.028
## m6|t1 -1.215 -1.215 -1.215
## m6|t2 -0.820 -0.820 -0.820
## m6|t3 -0.427 -0.427 -0.427
## m6|t4 0.093 0.093 0.093
## m6|t5 0.699 0.699 0.699
## m6|t6 1.316 1.316 1.316
## m7|t1 -1.254 -1.254 -1.254
## m7|t2 -0.820 -0.820 -0.820
## m7|t3 -0.473 -0.473 -0.473
## m7|t4 -0.016 -0.016 -0.016
## m7|t5 0.665 0.665 0.665
## m7|t6 1.338 1.338 1.338
## p1|t1 -2.296 -2.296 -2.296
## p1|t2 -1.656 -1.656 -1.656
## p1|t3 -1.125 -1.125 -1.125
## p1|t4 -0.620 -0.620 -0.620
## p1|t5 -0.025 -0.025 -0.025
## p1|t6 0.710 0.710 0.710
## p2|t1 -1.472 -1.472 -1.472
## p2|t2 -0.865 -0.865 -0.865
## p2|t3 -0.320 -0.320 -0.320
## p2|t4 0.171 0.171 0.171
## p2|t5 0.905 0.905 0.905
## p2|t6 1.528 1.528 1.528
## p3|t1 -1.753 -1.753 -1.753
## p3|t2 -0.846 -0.846 -0.846
## p3|t3 -0.311 -0.311 -0.311
## p3|t4 0.282 0.282 0.282
## p3|t5 1.044 1.044 1.044
## p3|t6 1.797 1.797 1.797
## p4|t1 -1.820 -1.820 -1.820
## p4|t2 -1.134 -1.134 -1.134
## p4|t3 -0.598 -0.598 -0.598
## p4|t4 -0.148 -0.148 -0.148
## p4|t5 0.498 0.498 0.498
## p4|t6 1.100 1.100 1.100
## p5|t1 -2.056 -2.056 -2.056
## p5|t2 -1.264 -1.264 -1.264
## p5|t3 -0.693 -0.693 -0.693
## p5|t4 -0.185 -0.185 -0.185
## p5|t5 0.508 0.508 0.508
## p5|t6 1.361 1.361 1.361
## p6|t1 -1.656 -1.656 -1.656
## p6|t2 -0.919 -0.919 -0.919
## p6|t3 -0.508 -0.508 -0.508
## p6|t4 -0.029 -0.029 -0.029
## p6|t5 0.631 0.631 0.631
## p6|t6 1.274 1.274 1.274
## p7|t1 -1.775 -1.775 -1.775
## p7|t2 -1.264 -1.264 -1.264
## p7|t3 -0.865 -0.865 -0.865
## p7|t4 -0.354 -0.354 -0.354
## p7|t5 0.335 0.335 0.335
## p7|t6 0.983 0.983 0.983
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .s1 0.511 0.511 0.511
## .s2 0.401 0.401 0.401
## .s3 0.260 0.260 0.260
## .s4 0.498 0.498 0.498
## .s5 0.423 0.423 0.423
## .s6 0.488 0.488 0.488
## .s7 0.520 0.520 0.520
## .m1 0.468 0.468 0.468
## .m2 0.330 0.330 0.330
## .m3 0.248 0.248 0.248
## .m4 0.353 0.353 0.353
## .m5 0.269 0.269 0.269
## .m6 0.309 0.309 0.309
## .m7 0.221 0.221 0.221
## .p1 0.681 0.681 0.681
## .p2 0.655 0.655 0.655
## .p3 0.600 0.600 0.600
## .p4 0.527 0.527 0.527
## .p5 0.389 0.389 0.389
## .p6 0.497 0.497 0.497
## .p7 0.549 0.549 0.549
## sex 0.489 0.038 12.992 0.000 1.000 1.000
## moral 0.532 0.029 18.214 0.000 1.000 1.000
## pathogen 0.319 0.041 7.812 0.000 1.000 1.000
##
## Scales y*:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## s1 1.000 1.000 1.000
## s2 1.000 1.000 1.000
## s3 1.000 1.000 1.000
## s4 1.000 1.000 1.000
## s5 1.000 1.000 1.000
## s6 1.000 1.000 1.000
## s7 1.000 1.000 1.000
## m1 1.000 1.000 1.000
## m2 1.000 1.000 1.000
## m3 1.000 1.000 1.000
## m4 1.000 1.000 1.000
## m5 1.000 1.000 1.000
## m6 1.000 1.000 1.000
## m7 1.000 1.000 1.000
## p1 1.000 1.000 1.000
## p2 1.000 1.000 1.000
## p3 1.000 1.000 1.000
## p4 1.000 1.000 1.000
## p5 1.000 1.000 1.000
## p6 1.000 1.000 1.000
## p7 1.000 1.000 1.000
##
##
## Group 2 [female]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## sex =~
## s1 1.000 0.757 0.757
## s2 0.852 0.048 17.863 0.000 0.645 0.645
## s3 1.010 0.044 22.880 0.000 0.764 0.764
## s4 0.943 0.045 21.054 0.000 0.714 0.714
## s5 0.968 0.048 20.323 0.000 0.733 0.733
## s6 0.913 0.062 14.672 0.000 0.691 0.691
## s7 0.917 0.049 18.651 0.000 0.694 0.694
## moral =~
## m1 1.000 0.776 0.776
## m2 1.129 0.027 42.187 0.000 0.876 0.876
## m3 1.059 0.025 42.398 0.000 0.822 0.822
## m4 1.115 0.027 41.760 0.000 0.865 0.865
## m5 1.154 0.026 44.076 0.000 0.896 0.896
## m6 1.056 0.026 40.312 0.000 0.819 0.819
## m7 1.151 0.026 43.467 0.000 0.893 0.893
## pathogen =~
## p1 1.000 0.673 0.673
## p2 0.952 0.060 15.924 0.000 0.641 0.641
## p3 1.068 0.054 19.840 0.000 0.719 0.719
## p4 0.888 0.050 17.831 0.000 0.598 0.598
## p5 1.248 0.061 20.583 0.000 0.839 0.839
## p6 0.943 0.060 15.598 0.000 0.634 0.634
## p7 1.131 0.060 18.712 0.000 0.761 0.761
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## sex ~~
## moral 0.227 0.027 8.340 0.000 0.387 0.387
## pathogen 0.329 0.026 12.546 0.000 0.646 0.646
## moral ~~
## pathogen 0.169 0.026 6.410 0.000 0.323 0.323
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .s1 0.000 0.000 0.000
## .s2 0.000 0.000 0.000
## .s3 0.000 0.000 0.000
## .s4 0.000 0.000 0.000
## .s5 0.000 0.000 0.000
## .s6 0.000 0.000 0.000
## .s7 0.000 0.000 0.000
## .m1 0.000 0.000 0.000
## .m2 0.000 0.000 0.000
## .m3 0.000 0.000 0.000
## .m4 0.000 0.000 0.000
## .m5 0.000 0.000 0.000
## .m6 0.000 0.000 0.000
## .m7 0.000 0.000 0.000
## .p1 0.000 0.000 0.000
## .p2 0.000 0.000 0.000
## .p3 0.000 0.000 0.000
## .p4 0.000 0.000 0.000
## .p5 0.000 0.000 0.000
## .p6 0.000 0.000 0.000
## .p7 0.000 0.000 0.000
## sex 0.000 0.000 0.000
## moral 0.000 0.000 0.000
## pathogen 0.000 0.000 0.000
##
## Thresholds:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## s1|t1 -1.200 -1.200 -1.200
## s1|t2 -0.795 -0.795 -0.795
## s1|t3 -0.404 -0.404 -0.404
## s1|t4 -0.024 -0.024 -0.024
## s1|t5 0.450 0.450 0.450
## s1|t6 0.944 0.944 0.944
## s2|t1 -0.273 -0.273 -0.273
## s2|t2 0.175 0.175 0.175
## s2|t3 0.392 0.392 0.392
## s2|t4 0.738 0.738 0.738
## s2|t5 1.049 1.049 1.049
## s2|t6 1.398 1.398 1.398
## s3|t1 -0.529 -0.529 -0.529
## s3|t2 -0.088 -0.088 -0.088
## s3|t3 0.207 0.207 0.207
## s3|t4 0.535 0.535 0.535
## s3|t5 0.766 0.766 0.766
## s3|t6 1.077 1.077 1.077
## s4|t1 -1.398 -1.398 -1.398
## s4|t2 -0.952 -0.952 -0.952
## s4|t3 -0.535 -0.535 -0.535
## s4|t4 -0.169 -0.169 -0.169
## s4|t5 0.279 0.279 0.279
## s4|t6 0.833 0.833 0.833
## s5|t1 -1.040 -1.040 -1.040
## s5|t2 -0.676 -0.676 -0.676
## s5|t3 -0.375 -0.375 -0.375
## s5|t4 0.024 0.024 0.024
## s5|t5 0.409 0.409 0.409
## s5|t6 0.848 0.848 0.848
## s6|t1 -2.071 -2.071 -2.071
## s6|t2 -1.770 -1.770 -1.770
## s6|t3 -1.357 -1.357 -1.357
## s6|t4 -0.952 -0.952 -0.952
## s6|t5 -0.553 -0.553 -0.553
## s6|t6 0.056 0.056 0.056
## s7|t1 -0.887 -0.887 -0.887
## s7|t2 -0.504 -0.504 -0.504
## s7|t3 -0.175 -0.175 -0.175
## s7|t4 0.062 0.062 0.062
## s7|t5 0.301 0.301 0.301
## s7|t6 0.547 0.547 0.547
## m1|t1 -0.952 -0.952 -0.952
## m1|t2 -0.516 -0.516 -0.516
## m1|t3 -0.035 -0.035 -0.035
## m1|t4 0.523 0.523 0.523
## m1|t5 0.960 0.960 0.960
## m1|t6 1.330 1.330 1.330
## m2|t1 -1.384 -1.384 -1.384
## m2|t2 -1.096 -1.096 -1.096
## m2|t3 -0.840 -0.840 -0.840
## m2|t4 -0.541 -0.541 -0.541
## m2|t5 -0.148 -0.148 -0.148
## m2|t6 0.456 0.456 0.456
## m3|t1 -1.157 -1.157 -1.157
## m3|t2 -0.848 -0.848 -0.848
## m3|t3 -0.444 -0.444 -0.444
## m3|t4 -0.003 -0.003 -0.003
## m3|t5 0.474 0.474 0.474
## m3|t6 1.021 1.021 1.021
## m4|t1 -1.413 -1.413 -1.413
## m4|t2 -1.167 -1.167 -1.167
## m4|t3 -0.855 -0.855 -0.855
## m4|t4 -0.427 -0.427 -0.427
## m4|t5 0.088 0.088 0.088
## m4|t6 0.759 0.759 0.759
## m5|t1 -1.256 -1.256 -1.256
## m5|t2 -0.927 -0.927 -0.927
## m5|t3 -0.604 -0.604 -0.604
## m5|t4 -0.246 -0.246 -0.246
## m5|t5 0.191 0.191 0.191
## m5|t6 0.752 0.752 0.752
## m6|t1 -1.233 -1.233 -1.233
## m6|t2 -0.810 -0.810 -0.810
## m6|t3 -0.486 -0.486 -0.486
## m6|t4 -0.040 -0.040 -0.040
## m6|t5 0.462 0.462 0.462
## m6|t6 1.012 1.012 1.012
## m7|t1 -1.293 -1.293 -1.293
## m7|t2 -0.952 -0.952 -0.952
## m7|t3 -0.636 -0.636 -0.636
## m7|t4 -0.158 -0.158 -0.158
## m7|t5 0.375 0.375 0.375
## m7|t6 0.960 0.960 0.960
## p1|t1 -2.385 -2.385 -2.385
## p1|t2 -1.823 -1.823 -1.823
## p1|t3 -1.330 -1.330 -1.330
## p1|t4 -0.879 -0.879 -0.879
## p1|t5 -0.307 -0.307 -0.307
## p1|t6 0.290 0.290 0.290
## p2|t1 -1.489 -1.489 -1.489
## p2|t2 -0.855 -0.855 -0.855
## p2|t3 -0.421 -0.421 -0.421
## p2|t4 0.051 0.051 0.051
## p2|t5 0.579 0.579 0.579
## p2|t6 1.222 1.222 1.222
## p3|t1 -1.614 -1.614 -1.614
## p3|t2 -0.969 -0.969 -0.969
## p3|t3 -0.421 -0.421 -0.421
## p3|t4 0.067 0.067 0.067
## p3|t5 0.803 0.803 0.803
## p3|t6 1.357 1.357 1.357
## p4|t1 -1.654 -1.654 -1.654
## p4|t2 -1.096 -1.096 -1.096
## p4|t3 -0.731 -0.731 -0.731
## p4|t4 -0.268 -0.268 -0.268
## p4|t5 0.268 0.268 0.268
## p4|t6 0.817 0.817 0.817
## p5|t1 -2.071 -2.071 -2.071
## p5|t2 -1.489 -1.489 -1.489
## p5|t3 -0.995 -0.995 -0.995
## p5|t4 -0.468 -0.468 -0.468
## p5|t5 0.137 0.137 0.137
## p5|t6 0.895 0.895 0.895
## p6|t1 -1.852 -1.852 -1.852
## p6|t2 -1.370 -1.370 -1.370
## p6|t3 -0.969 -0.969 -0.969
## p6|t4 -0.604 -0.604 -0.604
## p6|t5 -0.008 -0.008 -0.008
## p6|t6 0.498 0.498 0.498
## p7|t1 -1.852 -1.852 -1.852
## p7|t2 -1.318 -1.318 -1.318
## p7|t3 -1.004 -1.004 -1.004
## p7|t4 -0.468 -0.468 -0.468
## p7|t5 0.040 0.040 0.040
## p7|t6 0.610 0.610 0.610
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .s1 0.427 0.427 0.427
## .s2 0.585 0.585 0.585
## .s3 0.416 0.416 0.416
## .s4 0.491 0.491 0.491
## .s5 0.463 0.463 0.463
## .s6 0.522 0.522 0.522
## .s7 0.518 0.518 0.518
## .m1 0.398 0.398 0.398
## .m2 0.233 0.233 0.233
## .m3 0.324 0.324 0.324
## .m4 0.252 0.252 0.252
## .m5 0.198 0.198 0.198
## .m6 0.329 0.329 0.329
## .m7 0.202 0.202 0.202
## .p1 0.547 0.547 0.547
## .p2 0.590 0.590 0.590
## .p3 0.483 0.483 0.483
## .p4 0.643 0.643 0.643
## .p5 0.295 0.295 0.295
## .p6 0.598 0.598 0.598
## .p7 0.420 0.420 0.420
## sex 0.573 0.042 13.800 0.000 1.000 1.000
## moral 0.602 0.029 20.452 0.000 1.000 1.000
## pathogen 0.453 0.042 10.761 0.000 1.000 1.000
##
## Scales y*:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## s1 1.000 1.000 1.000
## s2 1.000 1.000 1.000
## s3 1.000 1.000 1.000
## s4 1.000 1.000 1.000
## s5 1.000 1.000 1.000
## s6 1.000 1.000 1.000
## s7 1.000 1.000 1.000
## m1 1.000 1.000 1.000
## m2 1.000 1.000 1.000
## m3 1.000 1.000 1.000
## m4 1.000 1.000 1.000
## m5 1.000 1.000 1.000
## m6 1.000 1.000 1.000
## m7 1.000 1.000 1.000
## p1 1.000 1.000 1.000
## p2 1.000 1.000 1.000
## p3 1.000 1.000 1.000
## p4 1.000 1.000 1.000
## p5 1.000 1.000 1.000
## p6 1.000 1.000 1.000
## p7 1.000 1.000 1.000
Fit Indices | Gender | Value |
---|---|---|
CFI | male | 0.918 |
CFI | female | 0.902 |
TLI | male | 0.907 |
TLI | female | 0.889 |
RMSEA | male | 0.068 |
RMSEA | female | 0.077 |
SRMR | male | 0.066 |
SRMR | female | 0.071 |
chisq | df | pvalue | cfi | rmsea | tli | gfi | |
---|---|---|---|---|---|---|---|
x | 1621.068 | 624 | 0 | 0.988 | 0.056 | 0.992 | 0.986 |
Note that the factor loadings differ between the graphs because those show standardized loadings; the summary below shows that the unstandardized loadings are constrained to be equal across groups.
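A sketch of the metric (weak invariance) model noted above, constraining the unstandardized loadings to equality across gender while keeping everything else as in the configural sketch:

```r
# Metric invariance: factor loadings constrained to be equal across groups.
metric_fit <- cfa(tdds_cfa_model, data = tdds, group = "gender",
                  estimator = "WLSMV", ordered = tdds_items,
                  group.equal = "loadings")
```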
## lavaan (0.5-23.1097) converged normally after 56 iterations
##
## Number of observations per group
## male 553
## female 469
##
## Estimator DWLS Robust
## Minimum Function Test Statistic 1778.762 1314.720
## Degrees of freedom 642 642
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.921
## Shift parameter for each group:
## male 210.394
## female 178.436
## for simple second-order correction (Mplus variant)
##
## Chi-square for each group:
##
## male 917.541 687.997
## female 861.222 626.723
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Robust.sem
##
##
## Group 1 [male]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## sex =~
## s1 1.000 0.741 0.741
## s2 (.p2.) 0.991 0.036 27.846 0.000 0.735 0.735
## s3 (.p3.) 1.129 0.033 34.728 0.000 0.837 0.837
## s4 (.p4.) 0.976 0.033 29.428 0.000 0.723 0.723
## s5 (.p5.) 1.029 0.034 30.507 0.000 0.763 0.763
## s6 (.p6.) 0.967 0.039 24.531 0.000 0.717 0.717
## s7 (.p7.) 0.954 0.034 28.486 0.000 0.708 0.708
## moral =~
## m1 1.000 0.742 0.742
## m2 (.p9.) 1.128 0.020 57.185 0.000 0.837 0.837
## m3 (.10.) 1.127 0.019 59.243 0.000 0.836 0.836
## m4 (.11.) 1.111 0.021 52.921 0.000 0.824 0.824
## m5 (.12.) 1.165 0.020 59.252 0.000 0.864 0.864
## m6 (.13.) 1.099 0.020 55.373 0.000 0.816 0.816
## m7 (.14.) 1.182 0.021 56.904 0.000 0.877 0.877
## pathogen =~
## p1 1.000 0.602 0.602
## p2 (.16.) 0.994 0.051 19.561 0.000 0.599 0.599
## p3 (.17.) 1.096 0.049 22.265 0.000 0.660 0.660
## p4 (.18.) 1.046 0.045 23.080 0.000 0.630 0.630
## p5 (.19.) 1.315 0.054 24.243 0.000 0.792 0.792
## p6 (.20.) 1.096 0.051 21.593 0.000 0.660 0.660
## p7 (.21.) 1.158 0.052 22.209 0.000 0.698 0.698
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## sex ~~
## moral 0.219 0.021 10.309 0.000 0.398 0.398
## pathogen 0.188 0.019 9.785 0.000 0.421 0.421
## moral ~~
## pathogen 0.225 0.020 11.395 0.000 0.503 0.503
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .s1 0.000 0.000 0.000
## .s2 0.000 0.000 0.000
## .s3 0.000 0.000 0.000
## .s4 0.000 0.000 0.000
## .s5 0.000 0.000 0.000
## .s6 0.000 0.000 0.000
## .s7 0.000 0.000 0.000
## .m1 0.000 0.000 0.000
## .m2 0.000 0.000 0.000
## .m3 0.000 0.000 0.000
## .m4 0.000 0.000 0.000
## .m5 0.000 0.000 0.000
## .m6 0.000 0.000 0.000
## .m7 0.000 0.000 0.000
## .p1 0.000 0.000 0.000
## .p2 0.000 0.000 0.000
## .p3 0.000 0.000 0.000
## .p4 0.000 0.000 0.000
## .p5 0.000 0.000 0.000
## .p6 0.000 0.000 0.000
## .p7 0.000 0.000 0.000
## sex 0.000 0.000 0.000
## moral 0.000 0.000 0.000
## pathogen 0.000 0.000 0.000
##
## Thresholds:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## s1|t1 -0.954 -0.954 -0.954
## s1|t2 -0.292 -0.292 -0.292
## s1|t3 0.111 0.111 0.111
## s1|t4 0.620 0.620 0.620
## s1|t5 1.143 1.143 1.143
## s1|t6 1.733 1.733 1.733
## s2|t1 0.052 0.052 0.052
## s2|t2 0.493 0.493 0.493
## s2|t3 0.859 0.859 0.859
## s2|t4 1.306 1.306 1.306
## s2|t5 1.622 1.622 1.622
## s2|t6 2.020 2.020 2.020
## s3|t1 -0.075 -0.075 -0.075
## s3|t2 0.452 0.452 0.452
## s3|t3 0.833 0.833 0.833
## s3|t4 1.215 1.215 1.215
## s3|t5 1.485 1.485 1.485
## s3|t6 1.753 1.753 1.753
## s4|t1 -0.833 -0.833 -0.833
## s4|t2 -0.268 -0.268 -0.268
## s4|t3 0.180 0.180 0.180
## s4|t4 0.642 0.642 0.642
## s4|t5 1.117 1.117 1.117
## s4|t6 1.733 1.733 1.733
## s5|t1 -0.437 -0.437 -0.437
## s5|t2 0.130 0.130 0.130
## s5|t3 0.524 0.524 0.524
## s5|t4 0.892 0.892 0.892
## s5|t5 1.264 1.264 1.264
## s5|t6 1.674 1.674 1.674
## s6|t1 -0.807 -0.807 -0.807
## s6|t2 -0.369 -0.369 -0.369
## s6|t3 0.011 0.011 0.011
## s6|t4 0.452 0.452 0.452
## s6|t5 0.846 0.846 0.846
## s6|t6 1.254 1.254 1.254
## s7|t1 -0.364 -0.364 -0.364
## s7|t2 -0.007 -0.007 -0.007
## s7|t3 0.273 0.273 0.273
## s7|t4 0.693 0.693 0.693
## s7|t5 1.109 1.109 1.109
## s7|t6 1.499 1.499 1.499
## m1|t1 -0.912 -0.912 -0.912
## m1|t2 -0.335 -0.335 -0.335
## m1|t3 0.130 0.130 0.130
## m1|t4 0.545 0.545 0.545
## m1|t5 1.084 1.084 1.084
## m1|t6 1.605 1.605 1.605
## m2|t1 -1.384 -1.384 -1.384
## m2|t2 -1.084 -1.084 -1.084
## m2|t3 -0.814 -0.814 -0.814
## m2|t4 -0.393 -0.393 -0.393
## m2|t5 0.093 0.093 0.093
## m2|t6 0.899 0.899 0.899
## m3|t1 -1.068 -1.068 -1.068
## m3|t2 -0.615 -0.615 -0.615
## m3|t3 -0.198 -0.198 -0.198
## m3|t4 0.301 0.301 0.301
## m3|t5 0.839 0.839 0.839
## m3|t6 1.472 1.472 1.472
## m4|t1 -1.446 -1.446 -1.446
## m4|t2 -1.117 -1.117 -1.117
## m4|t3 -0.625 -0.625 -0.625
## m4|t4 -0.185 -0.185 -0.185
## m4|t5 0.354 0.354 0.354
## m4|t6 1.117 1.117 1.117
## m5|t1 -1.160 -1.160 -1.160
## m5|t2 -0.801 -0.801 -0.801
## m5|t3 -0.566 -0.566 -0.566
## m5|t4 -0.079 -0.079 -0.079
## m5|t5 0.519 0.519 0.519
## m5|t6 1.028 1.028 1.028
## m6|t1 -1.215 -1.215 -1.215
## m6|t2 -0.820 -0.820 -0.820
## m6|t3 -0.427 -0.427 -0.427
## m6|t4 0.093 0.093 0.093
## m6|t5 0.699 0.699 0.699
## m6|t6 1.316 1.316 1.316
## m7|t1 -1.254 -1.254 -1.254
## m7|t2 -0.820 -0.820 -0.820
## m7|t3 -0.473 -0.473 -0.473
## m7|t4 -0.016 -0.016 -0.016
## m7|t5 0.665 0.665 0.665
## m7|t6 1.338 1.338 1.338
## p1|t1 -2.296 -2.296 -2.296
## p1|t2 -1.656 -1.656 -1.656
## p1|t3 -1.125 -1.125 -1.125
## p1|t4 -0.620 -0.620 -0.620
## p1|t5 -0.025 -0.025 -0.025
## p1|t6 0.710 0.710 0.710
## p2|t1 -1.472 -1.472 -1.472
## p2|t2 -0.865 -0.865 -0.865
## p2|t3 -0.320 -0.320 -0.320
## p2|t4 0.171 0.171 0.171
## p2|t5 0.905 0.905 0.905
## p2|t6 1.528 1.528 1.528
## p3|t1 -1.753 -1.753 -1.753
## p3|t2 -0.846 -0.846 -0.846
## p3|t3 -0.311 -0.311 -0.311
## p3|t4 0.282 0.282 0.282
## p3|t5 1.044 1.044 1.044
## p3|t6 1.797 1.797 1.797
## p4|t1 -1.820 -1.820 -1.820
## p4|t2 -1.134 -1.134 -1.134
## p4|t3 -0.598 -0.598 -0.598
## p4|t4 -0.148 -0.148 -0.148
## p4|t5 0.498 0.498 0.498
## p4|t6 1.100 1.100 1.100
## p5|t1 -2.056 -2.056 -2.056
## p5|t2 -1.264 -1.264 -1.264
## p5|t3 -0.693 -0.693 -0.693
## p5|t4 -0.185 -0.185 -0.185
## p5|t5 0.508 0.508 0.508
## p5|t6 1.361 1.361 1.361
## p6|t1 -1.656 -1.656 -1.656
## p6|t2 -0.919 -0.919 -0.919
## p6|t3 -0.508 -0.508 -0.508
## p6|t4 -0.029 -0.029 -0.029
## p6|t5 0.631 0.631 0.631
## p6|t6 1.274 1.274 1.274
## p7|t1 -1.775 -1.775 -1.775
## p7|t2 -1.264 -1.264 -1.264
## p7|t3 -0.865 -0.865 -0.865
## p7|t4 -0.354 -0.354 -0.354
## p7|t5 0.335 0.335 0.335
## p7|t6 0.983 0.983 0.983
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .s1 0.450 0.450 0.450
## .s2 0.460 0.460 0.460
## .s3 0.299 0.299 0.299
## .s4 0.477 0.477 0.477
## .s5 0.418 0.418 0.418
## .s6 0.486 0.486 0.486
## .s7 0.499 0.499 0.499
## .m1 0.449 0.449 0.449
## .m2 0.300 0.300 0.300
## .m3 0.301 0.301 0.301
## .m4 0.321 0.321 0.321
## .m5 0.253 0.253 0.253
## .m6 0.335 0.335 0.335
## .m7 0.231 0.231 0.231
## .p1 0.637 0.637 0.637
## .p2 0.641 0.641 0.641
## .p3 0.564 0.564 0.564
## .p4 0.603 0.603 0.603
## .p5 0.372 0.372 0.372
## .p6 0.564 0.564 0.564
## .p7 0.513 0.513 0.513
## sex 0.550 0.032 17.379 0.000 1.000 1.000
## moral 0.551 0.022 24.837 0.000 1.000 1.000
## pathogen 0.363 0.030 12.252 0.000 1.000 1.000
##
## Scales y*:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## s1 1.000 1.000 1.000
## s2 1.000 1.000 1.000
## s3 1.000 1.000 1.000
## s4 1.000 1.000 1.000
## s5 1.000 1.000 1.000
## s6 1.000 1.000 1.000
## s7 1.000 1.000 1.000
## m1 1.000 1.000 1.000
## m2 1.000 1.000 1.000
## m3 1.000 1.000 1.000
## m4 1.000 1.000 1.000
## m5 1.000 1.000 1.000
## m6 1.000 1.000 1.000
## m7 1.000 1.000 1.000
## p1 1.000 1.000 1.000
## p2 1.000 1.000 1.000
## p3 1.000 1.000 1.000
## p4 1.000 1.000 1.000
## p5 1.000 1.000 1.000
## p6 1.000 1.000 1.000
## p7 1.000 1.000 1.000
##
##
## Group 2 [female]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## sex =~
## s1 1.000 0.710 0.710
## s2 (.p2.) 0.991 0.036 27.846 0.000 0.703 0.703
## s3 (.p3.) 1.129 0.033 34.728 0.000 0.801 0.801
## s4 (.p4.) 0.976 0.033 29.428 0.000 0.692 0.692
## s5 (.p5.) 1.029 0.034 30.507 0.000 0.730 0.730
## s6 (.p6.) 0.967 0.039 24.531 0.000 0.686 0.686
## s7 (.p7.) 0.954 0.034 28.486 0.000 0.677 0.677
## moral =~
## m1 1.000 0.761 0.761
## m2 (.p9.) 1.128 0.020 57.185 0.000 0.858 0.858
## m3 (.10.) 1.127 0.019 59.243 0.000 0.858 0.858
## m4 (.11.) 1.111 0.021 52.921 0.000 0.846 0.846
## m5 (.12.) 1.165 0.020 59.252 0.000 0.887 0.887
## m6 (.13.) 1.099 0.020 55.373 0.000 0.837 0.837
## m7 (.14.) 1.182 0.021 56.904 0.000 0.900 0.900
## pathogen =~
## p1 1.000 0.634 0.634
## p2 (.16.) 0.994 0.051 19.561 0.000 0.630 0.630
## p3 (.17.) 1.096 0.049 22.265 0.000 0.695 0.695
## p4 (.18.) 1.046 0.045 23.080 0.000 0.663 0.663
## p5 (.19.) 1.315 0.054 24.243 0.000 0.834 0.834
## p6 (.20.) 1.096 0.051 21.593 0.000 0.694 0.694
## p7 (.21.) 1.158 0.052 22.209 0.000 0.734 0.734
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## sex ~~
## moral 0.208 0.025 8.342 0.000 0.385 0.385
## pathogen 0.289 0.023 12.419 0.000 0.642 0.642
## moral ~~
## pathogen 0.155 0.024 6.463 0.000 0.321 0.321
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .s1 0.000 0.000 0.000
## .s2 0.000 0.000 0.000
## .s3 0.000 0.000 0.000
## .s4 0.000 0.000 0.000
## .s5 0.000 0.000 0.000
## .s6 0.000 0.000 0.000
## .s7 0.000 0.000 0.000
## .m1 0.000 0.000 0.000
## .m2 0.000 0.000 0.000
## .m3 0.000 0.000 0.000
## .m4 0.000 0.000 0.000
## .m5 0.000 0.000 0.000
## .m6 0.000 0.000 0.000
## .m7 0.000 0.000 0.000
## .p1 0.000 0.000 0.000
## .p2 0.000 0.000 0.000
## .p3 0.000 0.000 0.000
## .p4 0.000 0.000 0.000
## .p5 0.000 0.000 0.000
## .p6 0.000 0.000 0.000
## .p7 0.000 0.000 0.000
## sex 0.000 0.000 0.000
## moral 0.000 0.000 0.000
## pathogen 0.000 0.000 0.000
##
## Thresholds:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## s1|t1 -1.200 -1.200 -1.200
## s1|t2 -0.795 -0.795 -0.795
## s1|t3 -0.404 -0.404 -0.404
## s1|t4 -0.024 -0.024 -0.024
## s1|t5 0.450 0.450 0.450
## s1|t6 0.944 0.944 0.944
## s2|t1 -0.273 -0.273 -0.273
## s2|t2 0.175 0.175 0.175
## s2|t3 0.392 0.392 0.392
## s2|t4 0.738 0.738 0.738
## s2|t5 1.049 1.049 1.049
## s2|t6 1.398 1.398 1.398
## s3|t1 -0.529 -0.529 -0.529
## s3|t2 -0.088 -0.088 -0.088
## s3|t3 0.207 0.207 0.207
## s3|t4 0.535 0.535 0.535
## s3|t5 0.766 0.766 0.766
## s3|t6 1.077 1.077 1.077
## s4|t1 -1.398 -1.398 -1.398
## s4|t2 -0.952 -0.952 -0.952
## s4|t3 -0.535 -0.535 -0.535
## s4|t4 -0.169 -0.169 -0.169
## s4|t5 0.279 0.279 0.279
## s4|t6 0.833 0.833 0.833
## s5|t1 -1.040 -1.040 -1.040
## s5|t2 -0.676 -0.676 -0.676
## s5|t3 -0.375 -0.375 -0.375
## s5|t4 0.024 0.024 0.024
## s5|t5 0.409 0.409 0.409
## s5|t6 0.848 0.848 0.848
## s6|t1 -2.071 -2.071 -2.071
## s6|t2 -1.770 -1.770 -1.770
## s6|t3 -1.357 -1.357 -1.357
## s6|t4 -0.952 -0.952 -0.952
## s6|t5 -0.553 -0.553 -0.553
## s6|t6 0.056 0.056 0.056
## s7|t1 -0.887 -0.887 -0.887
## s7|t2 -0.504 -0.504 -0.504
## s7|t3 -0.175 -0.175 -0.175
## s7|t4 0.062 0.062 0.062
## s7|t5 0.301 0.301 0.301
## s7|t6 0.547 0.547 0.547
## m1|t1 -0.952 -0.952 -0.952
## m1|t2 -0.516 -0.516 -0.516
## m1|t3 -0.035 -0.035 -0.035
## m1|t4 0.523 0.523 0.523
## m1|t5 0.960 0.960 0.960
## m1|t6 1.330 1.330 1.330
## m2|t1 -1.384 -1.384 -1.384
## m2|t2 -1.096 -1.096 -1.096
## m2|t3 -0.840 -0.840 -0.840
## m2|t4 -0.541 -0.541 -0.541
## m2|t5 -0.148 -0.148 -0.148
## m2|t6 0.456 0.456 0.456
## m3|t1 -1.157 -1.157 -1.157
## m3|t2 -0.848 -0.848 -0.848
## m3|t3 -0.444 -0.444 -0.444
## m3|t4 -0.003 -0.003 -0.003
## m3|t5 0.474 0.474 0.474
## m3|t6 1.021 1.021 1.021
## m4|t1 -1.413 -1.413 -1.413
## m4|t2 -1.167 -1.167 -1.167
## m4|t3 -0.855 -0.855 -0.855
## m4|t4 -0.427 -0.427 -0.427
## m4|t5 0.088 0.088 0.088
## m4|t6 0.759 0.759 0.759
## m5|t1 -1.256 -1.256 -1.256
## m5|t2 -0.927 -0.927 -0.927
## m5|t3 -0.604 -0.604 -0.604
## m5|t4 -0.246 -0.246 -0.246
## m5|t5 0.191 0.191 0.191
## m5|t6 0.752 0.752 0.752
## m6|t1 -1.233 -1.233 -1.233
## m6|t2 -0.810 -0.810 -0.810
## m6|t3 -0.486 -0.486 -0.486
## m6|t4 -0.040 -0.040 -0.040
## m6|t5 0.462 0.462 0.462
## m6|t6 1.012 1.012 1.012
## m7|t1 -1.293 -1.293 -1.293
## m7|t2 -0.952 -0.952 -0.952
## m7|t3 -0.636 -0.636 -0.636
## m7|t4 -0.158 -0.158 -0.158
## m7|t5 0.375 0.375 0.375
## m7|t6 0.960 0.960 0.960
## p1|t1 -2.385 -2.385 -2.385
## p1|t2 -1.823 -1.823 -1.823
## p1|t3 -1.330 -1.330 -1.330
## p1|t4 -0.879 -0.879 -0.879
## p1|t5 -0.307 -0.307 -0.307
## p1|t6 0.290 0.290 0.290
## p2|t1 -1.489 -1.489 -1.489
## p2|t2 -0.855 -0.855 -0.855
## p2|t3 -0.421 -0.421 -0.421
## p2|t4 0.051 0.051 0.051
## p2|t5 0.579 0.579 0.579
## p2|t6 1.222 1.222 1.222
## p3|t1 -1.614 -1.614 -1.614
## p3|t2 -0.969 -0.969 -0.969
## p3|t3 -0.421 -0.421 -0.421
## p3|t4 0.067 0.067 0.067
## p3|t5 0.803 0.803 0.803
## p3|t6 1.357 1.357 1.357
## p4|t1 -1.654 -1.654 -1.654
## p4|t2 -1.096 -1.096 -1.096
## p4|t3 -0.731 -0.731 -0.731
## p4|t4 -0.268 -0.268 -0.268
## p4|t5 0.268 0.268 0.268
## p4|t6 0.817 0.817 0.817
## p5|t1 -2.071 -2.071 -2.071
## p5|t2 -1.489 -1.489 -1.489
## p5|t3 -0.995 -0.995 -0.995
## p5|t4 -0.468 -0.468 -0.468
## p5|t5 0.137 0.137 0.137
## p5|t6 0.895 0.895 0.895
## p6|t1 -1.852 -1.852 -1.852
## p6|t2 -1.370 -1.370 -1.370
## p6|t3 -0.969 -0.969 -0.969
## p6|t4 -0.604 -0.604 -0.604
## p6|t5 -0.008 -0.008 -0.008
## p6|t6 0.498 0.498 0.498
## p7|t1 -1.852 -1.852 -1.852
## p7|t2 -1.318 -1.318 -1.318
## p7|t3 -1.004 -1.004 -1.004
## p7|t4 -0.468 -0.468 -0.468
## p7|t5 0.040 0.040 0.040
## p7|t6 0.610 0.610 0.610
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .s1 0.496 0.496 0.496
## .s2 0.505 0.505 0.505
## .s3 0.358 0.358 0.358
## .s4 0.521 0.521 0.521
## .s5 0.467 0.467 0.467
## .s6 0.529 0.529 0.529
## .s7 0.541 0.541 0.541
## .m1 0.420 0.420 0.420
## .m2 0.263 0.263 0.263
## .m3 0.264 0.264 0.264
## .m4 0.285 0.285 0.285
## .m5 0.214 0.214 0.214
## .m6 0.300 0.300 0.300
## .m7 0.190 0.190 0.190
## .p1 0.598 0.598 0.598
## .p2 0.603 0.603 0.603
## .p3 0.518 0.518 0.518
## .p4 0.560 0.560 0.560
## .p5 0.305 0.305 0.305
## .p6 0.518 0.518 0.518
## .p7 0.461 0.461 0.461
## sex 0.504 0.030 16.996 0.000 1.000 1.000
## moral 0.580 0.023 25.449 0.000 1.000 1.000
## pathogen 0.402 0.034 11.986 0.000 1.000 1.000
##
## Scales y*:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## s1 1.000 1.000 1.000
## s2 1.000 1.000 1.000
## s3 1.000 1.000 1.000
## s4 1.000 1.000 1.000
## s5 1.000 1.000 1.000
## s6 1.000 1.000 1.000
## s7 1.000 1.000 1.000
## m1 1.000 1.000 1.000
## m2 1.000 1.000 1.000
## m3 1.000 1.000 1.000
## m4 1.000 1.000 1.000
## m5 1.000 1.000 1.000
## m6 1.000 1.000 1.000
## m7 1.000 1.000 1.000
## p1 1.000 1.000 1.000
## p2 1.000 1.000 1.000
## p3 1.000 1.000 1.000
## p4 1.000 1.000 1.000
## p5 1.000 1.000 1.000
## p6 1.000 1.000 1.000
## p7 1.000 1.000 1.000
| | chisq | df | pvalue | cfi | rmsea | tli | gfi |
|---|---|---|---|---|---|---|---|
x | 1778.762 | 642 | 0 | 0.986 | 0.059 | 0.991 | 0.985 |
We test metric invariance by comparing the configural model with the metric model, and also by using an automated function from the semTools R package.
On both tests, metric invariance across gender is not attained. That is, we reject the null hypothesis that the metric model fits the data as well as the configural model.
This is also evident in the lower AIC and BIC of the configural model. What we should do now is examine whether there is partial invariance. We can do this with the lavTestScore() function in the lavaan package, which shows the effect of releasing each equality constraint across the groups. The modindices() function would not help here: it only shows modification indices for newly added parameters associated with new paths, whereas the parameters of interest are already estimated in the model and we are merely freeing them across groups.
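A minimal sketch of this comparison, assuming the items are stored as ordered variables s1-s7, m1-m7, p1-p7 in a data frame (here called tdds) with a grouping variable (here called gender); the estimator mirrors the DWLS/WLSMV output shown above. semTools (e.g., measEq.syntax()) can automate the full sequence.

```r
library(lavaan)

items <- c(paste0("s", 1:7), paste0("m", 1:7), paste0("p", 1:7))

model <- '
  sex      =~ s1 + s2 + s3 + s4 + s5 + s6 + s7
  moral    =~ m1 + m2 + m3 + m4 + m5 + m6 + m7
  pathogen =~ p1 + p2 + p3 + p4 + p5 + p6 + p7
'

fit.configural <- cfa(model, data = tdds, group = "gender",
                      ordered = items, estimator = "WLSMV")
fit.metric     <- cfa(model, data = tdds, group = "gender",
                      ordered = items, estimator = "WLSMV",
                      group.equal = "loadings")

anova(fit.configural, fit.metric)  # scaled chi-square difference test
lavTestScore(fit.metric)           # score tests for each across-group equality constraint
```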
Note that the “delta” in the results table below stands for the difference in goodness of fit between the configural and the metric model. If one abides by the \(\delta_{GOF}\) criterion, weak invariance (equality of loadings) could still be established.
We will proceed with partial invariance tests, and also test for measurement invariance using IRT with the mirt R package, but this is already a good indication.
## Scaled Chi Square Difference Test (method = "satorra.2000")
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## configural 624 1621.1
## metric 642 1778.8 57.476 12.907 1.372e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
via semTools
##
## Measurement invariance models:
##
## Model 1 : fit.configural
## Model 2 : fit.loadings
## Model 3 : fit.intercepts
## Model 4 : fit.means
##
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## fit.configural 372 75213 75863 1361.3
## fit.loadings 390 75230 75792 1414.6 53.303 18 2.355e-05 ***
## fit.intercepts 408 75499 75972 1719.3 304.758 18 < 2.2e-16 ***
## fit.means 411 75689 76147 1915.4 196.089 3 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Fit measures:
##
## cfi rmsea cfi.delta rmsea.delta
## fit.configural 0.910 0.072 NA NA
## fit.loadings 0.907 0.072 0.003 0.000
## fit.intercepts 0.881 0.079 0.026 0.008
## fit.means 0.863 0.085 0.018 0.005
We proceeded with partial measurement invariance. Results indicate that items P4, P6, S2, S3, M3, and M4 were the biggest contributors to non-invariance. With the exception of M4, the loadings of these items were freed across groups. The partial metric model achieved good enough fit without freeing M4, so that item was kept constrained (i.e., treated as invariant).
In addition, we also freed P1, S1, and M1 (the marker items) to check whether they are invariant, using the items that showed the least sign of non-invariance in the previous model (P5, S5, M7) as the new referents, and found no evidence that P1, S1, and M1 function differently across groups.
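A sketch of the partial-invariance model, freeing the flagged loadings across groups with lavaan's group.partial argument (object names as in the sketch above); the score-test and freed-parameter tables follow.

```r
fit.partial <- cfa(model, data = tdds, group = "gender",
                   ordered = items, estimator = "WLSMV",
                   group.equal   = "loadings",
                   group.partial = c("sex =~ s2", "sex =~ s3",
                                     "moral =~ m3",
                                     "pathogen =~ p4", "pathogen =~ p6"))

anova(fit.configural, fit.partial)  # compare against the configural model
```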
| | lhs | op | rhs | X2 | df | p.value |
|---|---|---|---|---|---|---|
15 | .p18. | == | .p237. | 31.658 | 1 | 0.000 |
1 | .p2. | == | .p221. | 24.470 | 1 | 0.000 |
17 | .p20. | == | .p239. | 23.313 | 1 | 0.000 |
8 | .p10. | == | .p229. | 22.719 | 1 | 0.000 |
2 | .p3. | == | .p222. | 10.908 | 1 | 0.001 |
9 | .p11. | == | .p230. | 8.529 | 1 | 0.003 |
7 | .p9. | == | .p228. | 7.048 | 1 | 0.008 |
18 | .p21. | == | .p240. | 6.397 | 1 | 0.011 |
11 | .p13. | == | .p232. | 6.214 | 1 | 0.013 |
14 | .p17. | == | .p236. | 6.076 | 1 | 0.014 |
3 | .p4. | == | .p223. | 3.561 | 1 | 0.059 |
6 | .p7. | == | .p226. | 2.227 | 1 | 0.136 |
10 | .p12. | == | .p231. | 1.769 | 1 | 0.184 |
16 | .p19. | == | .p238. | 0.942 | 1 | 0.332 |
13 | .p16. | == | .p235. | 0.914 | 1 | 0.339 |
12 | .p14. | == | .p233. | 0.883 | 1 | 0.347 |
5 | .p6. | == | .p225. | 0.276 | 1 | 0.600 |
4 | .p5. | == | .p224. | 0.135 | 1 | 0.713 |
| | lhs | op | rhs | label | plabel | start | est | se |
|---|---|---|---|---|---|---|---|---|
221 | sex | =~ | s2 | .p2. | .p221. | 0.999 | 0.991 | 0.036 |
222 | sex | =~ | s3 | .p3. | .p222. | 1.106 | 1.129 | 0.033 |
229 | moral | =~ | m3 | .p10. | .p229. | 1.035 | 1.127 | 0.019 |
230 | moral | =~ | m4 | .p11. | .p230. | 1.076 | 1.111 | 0.021 |
237 | pathogen | =~ | p4 | .p18. | .p237. | 0.968 | 1.046 | 0.045 |
239 | pathogen | =~ | p6 | .p20. | .p239. | 0.973 | 1.096 | 0.051 |
## Scaled Chi Square Difference Test (method = "satorra.2000")
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## configural 624 1621.1
## modified.metric 637 1653.2 12.061 9.376 0.2356
## lavaan (0.5-23.1097) converged normally after 62 iterations
##
## Number of observations per group
## male 553
## female 469
##
## Estimator DWLS Robust
## Minimum Function Test Statistic 1653.238 1252.552
## Degrees of freedom 637 637
## P-value (Chi-square) 0.000 0.000
## Scaling correction factor 1.912
## Shift parameter for each group:
## male 209.950
## female 178.059
## for simple second-order correction (Mplus variant)
##
## Chi-square for each group:
##
## male 860.063 659.711
## female 793.175 592.841
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Robust.sem
##
##
## Group 1 [male]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## sex =~
## s1 1.000
## s2 1.067 0.045 23.517 0.000
## s3 1.187 0.039 30.164 0.000
## s4 (.p4.) 0.977 0.033 29.392 0.000
## s5 (.p5.) 1.029 0.034 30.419 0.000
## s6 (.p6.) 0.970 0.040 24.402 0.000
## s7 (.p7.) 0.954 0.034 28.315 0.000
## moral =~
## m1 1.000
## m2 (.p9.) 1.127 0.020 57.143 0.000
## m3 1.177 0.022 54.258 0.000
## m4 (.11.) 1.110 0.021 52.928 0.000
## m5 (.12.) 1.163 0.020 59.261 0.000
## m6 (.13.) 1.099 0.020 55.306 0.000
## m7 (.14.) 1.181 0.021 56.918 0.000
## pathogen =~
## p1 1.000
## p2 (.16.) 0.992 0.051 19.635 0.000
## p3 (.17.) 1.092 0.049 22.442 0.000
## p4 1.178 0.061 19.395 0.000
## p5 (.19.) 1.310 0.054 24.361 0.000
## p6 1.214 0.065 18.747 0.000
## p7 (.21.) 1.157 0.052 22.370 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## sex ~~
## moral 0.213 0.021 10.132 0.000
## pathogen 0.178 0.019 9.499 0.000
## moral ~~
## pathogen 0.217 0.019 11.305 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .s1 0.000
## .s2 0.000
## .s3 0.000
## .s4 0.000
## .s5 0.000
## .s6 0.000
## .s7 0.000
## .m1 0.000
## .m2 0.000
## .m3 0.000
## .m4 0.000
## .m5 0.000
## .m6 0.000
## .m7 0.000
## .p1 0.000
## .p2 0.000
## .p3 0.000
## .p4 0.000
## .p5 0.000
## .p6 0.000
## .p7 0.000
## sex 0.000
## moral 0.000
## pathogen 0.000
##
## Thresholds:
## Estimate Std.Err z-value P(>|z|)
## s1|t1 -0.954
## s1|t2 -0.292
## s1|t3 0.111
## s1|t4 0.620
## s1|t5 1.143
## s1|t6 1.733
## s2|t1 0.052
## s2|t2 0.493
## s2|t3 0.859
## s2|t4 1.306
## s2|t5 1.622
## s2|t6 2.020
## s3|t1 -0.075
## s3|t2 0.452
## s3|t3 0.833
## s3|t4 1.215
## s3|t5 1.485
## s3|t6 1.753
## s4|t1 -0.833
## s4|t2 -0.268
## s4|t3 0.180
## s4|t4 0.642
## s4|t5 1.117
## s4|t6 1.733
## s5|t1 -0.437
## s5|t2 0.130
## s5|t3 0.524
## s5|t4 0.892
## s5|t5 1.264
## s5|t6 1.674
## s6|t1 -0.807
## s6|t2 -0.369
## s6|t3 0.011
## s6|t4 0.452
## s6|t5 0.846
## s6|t6 1.254
## s7|t1 -0.364
## s7|t2 -0.007
## s7|t3 0.273
## s7|t4 0.693
## s7|t5 1.109
## s7|t6 1.499
## m1|t1 -0.912
## m1|t2 -0.335
## m1|t3 0.130
## m1|t4 0.545
## m1|t5 1.084
## m1|t6 1.605
## m2|t1 -1.384
## m2|t2 -1.084
## m2|t3 -0.814
## m2|t4 -0.393
## m2|t5 0.093
## m2|t6 0.899
## m3|t1 -1.068
## m3|t2 -0.615
## m3|t3 -0.198
## m3|t4 0.301
## m3|t5 0.839
## m3|t6 1.472
## m4|t1 -1.446
## m4|t2 -1.117
## m4|t3 -0.625
## m4|t4 -0.185
## m4|t5 0.354
## m4|t6 1.117
## m5|t1 -1.160
## m5|t2 -0.801
## m5|t3 -0.566
## m5|t4 -0.079
## m5|t5 0.519
## m5|t6 1.028
## m6|t1 -1.215
## m6|t2 -0.820
## m6|t3 -0.427
## m6|t4 0.093
## m6|t5 0.699
## m6|t6 1.316
## m7|t1 -1.254
## m7|t2 -0.820
## m7|t3 -0.473
## m7|t4 -0.016
## m7|t5 0.665
## m7|t6 1.338
## p1|t1 -2.296
## p1|t2 -1.656
## p1|t3 -1.125
## p1|t4 -0.620
## p1|t5 -0.025
## p1|t6 0.710
## p2|t1 -1.472
## p2|t2 -0.865
## p2|t3 -0.320
## p2|t4 0.171
## p2|t5 0.905
## p2|t6 1.528
## p3|t1 -1.753
## p3|t2 -0.846
## p3|t3 -0.311
## p3|t4 0.282
## p3|t5 1.044
## p3|t6 1.797
## p4|t1 -1.820
## p4|t2 -1.134
## p4|t3 -0.598
## p4|t4 -0.148
## p4|t5 0.498
## p4|t6 1.100
## p5|t1 -2.056
## p5|t2 -1.264
## p5|t3 -0.693
## p5|t4 -0.185
## p5|t5 0.508
## p5|t6 1.361
## p6|t1 -1.656
## p6|t2 -0.919
## p6|t3 -0.508
## p6|t4 -0.029
## p6|t5 0.631
## p6|t6 1.274
## p7|t1 -1.775
## p7|t2 -1.264
## p7|t3 -0.865
## p7|t4 -0.354
## p7|t5 0.335
## p7|t6 0.983
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .s1 0.474
## .s2 0.401
## .s3 0.258
## .s4 0.497
## .s5 0.443
## .s6 0.505
## .s7 0.521
## .m1 0.456
## .m2 0.309
## .m3 0.247
## .m4 0.330
## .m5 0.264
## .m6 0.343
## .m7 0.241
## .p1 0.658
## .p2 0.664
## .p3 0.593
## .p4 0.526
## .p5 0.414
## .p6 0.496
## .p7 0.543
## sex 0.526 0.032 16.384 0.000
## moral 0.544 0.022 24.571 0.000
## pathogen 0.342 0.029 11.909 0.000
##
## Scales y*:
## Estimate Std.Err z-value P(>|z|)
## s1 1.000
## s2 1.000
## s3 1.000
## s4 1.000
## s5 1.000
## s6 1.000
## s7 1.000
## m1 1.000
## m2 1.000
## m3 1.000
## m4 1.000
## m5 1.000
## m6 1.000
## m7 1.000
## p1 1.000
## p2 1.000
## p3 1.000
## p4 1.000
## p5 1.000
## p6 1.000
## p7 1.000
##
##
## Group 2 [female]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## sex =~
## s1 1.000
## s2 0.885 0.044 20.006 0.000
## s3 1.049 0.041 25.563 0.000
## s4 (.p4.) 0.977 0.033 29.392 0.000
## s5 (.p5.) 1.029 0.034 30.419 0.000
## s6 (.p6.) 0.970 0.040 24.402 0.000
## s7 (.p7.) 0.954 0.034 28.315 0.000
## moral =~
## m1 1.000
## m2 (.p9.) 1.127 0.020 57.143 0.000
## m3 1.071 0.022 47.654 0.000
## m4 (.11.) 1.110 0.021 52.928 0.000
## m5 (.12.) 1.163 0.020 59.261 0.000
## m6 (.13.) 1.099 0.020 55.306 0.000
## m7 (.14.) 1.181 0.021 56.918 0.000
## pathogen =~
## p1 1.000
## p2 (.16.) 0.992 0.051 19.635 0.000
## p3 (.17.) 1.092 0.049 22.442 0.000
## p4 0.914 0.051 17.790 0.000
## p5 (.19.) 1.310 0.054 24.361 0.000
## p6 0.971 0.060 16.235 0.000
## p7 (.21.) 1.157 0.052 22.370 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## sex ~~
## moral 0.216 0.026 8.452 0.000
## pathogen 0.307 0.024 12.693 0.000
## moral ~~
## pathogen 0.162 0.025 6.536 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## .s1 0.000
## .s2 0.000
## .s3 0.000
## .s4 0.000
## .s5 0.000
## .s6 0.000
## .s7 0.000
## .m1 0.000
## .m2 0.000
## .m3 0.000
## .m4 0.000
## .m5 0.000
## .m6 0.000
## .m7 0.000
## .p1 0.000
## .p2 0.000
## .p3 0.000
## .p4 0.000
## .p5 0.000
## .p6 0.000
## .p7 0.000
## sex 0.000
## moral 0.000
## pathogen 0.000
##
## Thresholds:
## Estimate Std.Err z-value P(>|z|)
## s1|t1 -1.200
## s1|t2 -0.795
## s1|t3 -0.404
## s1|t4 -0.024
## s1|t5 0.450
## s1|t6 0.944
## s2|t1 -0.273
## s2|t2 0.175
## s2|t3 0.392
## s2|t4 0.738
## s2|t5 1.049
## s2|t6 1.398
## s3|t1 -0.529
## s3|t2 -0.088
## s3|t3 0.207
## s3|t4 0.535
## s3|t5 0.766
## s3|t6 1.077
## s4|t1 -1.398
## s4|t2 -0.952
## s4|t3 -0.535
## s4|t4 -0.169
## s4|t5 0.279
## s4|t6 0.833
## s5|t1 -1.040
## s5|t2 -0.676
## s5|t3 -0.375
## s5|t4 0.024
## s5|t5 0.409
## s5|t6 0.848
## s6|t1 -2.071
## s6|t2 -1.770
## s6|t3 -1.357
## s6|t4 -0.952
## s6|t5 -0.553
## s6|t6 0.056
## s7|t1 -0.887
## s7|t2 -0.504
## s7|t3 -0.175
## s7|t4 0.062
## s7|t5 0.301
## s7|t6 0.547
## m1|t1 -0.952
## m1|t2 -0.516
## m1|t3 -0.035
## m1|t4 0.523
## m1|t5 0.960
## m1|t6 1.330
## m2|t1 -1.384
## m2|t2 -1.096
## m2|t3 -0.840
## m2|t4 -0.541
## m2|t5 -0.148
## m2|t6 0.456
## m3|t1 -1.157
## m3|t2 -0.848
## m3|t3 -0.444
## m3|t4 -0.003
## m3|t5 0.474
## m3|t6 1.021
## m4|t1 -1.413
## m4|t2 -1.167
## m4|t3 -0.855
## m4|t4 -0.427
## m4|t5 0.088
## m4|t6 0.759
## m5|t1 -1.256
## m5|t2 -0.927
## m5|t3 -0.604
## m5|t4 -0.246
## m5|t5 0.191
## m5|t6 0.752
## m6|t1 -1.233
## m6|t2 -0.810
## m6|t3 -0.486
## m6|t4 -0.040
## m6|t5 0.462
## m6|t6 1.012
## m7|t1 -1.293
## m7|t2 -0.952
## m7|t3 -0.636
## m7|t4 -0.158
## m7|t5 0.375
## m7|t6 0.960
## p1|t1 -2.385
## p1|t2 -1.823
## p1|t3 -1.330
## p1|t4 -0.879
## p1|t5 -0.307
## p1|t6 0.290
## p2|t1 -1.489
## p2|t2 -0.855
## p2|t3 -0.421
## p2|t4 0.051
## p2|t5 0.579
## p2|t6 1.222
## p3|t1 -1.614
## p3|t2 -0.969
## p3|t3 -0.421
## p3|t4 0.067
## p3|t5 0.803
## p3|t6 1.357
## p4|t1 -1.654
## p4|t2 -1.096
## p4|t3 -0.731
## p4|t4 -0.268
## p4|t5 0.268
## p4|t6 0.817
## p5|t1 -2.071
## p5|t2 -1.489
## p5|t3 -0.995
## p5|t4 -0.468
## p5|t5 0.137
## p5|t6 0.895
## p6|t1 -1.852
## p6|t2 -1.370
## p6|t3 -0.969
## p6|t4 -0.604
## p6|t5 -0.008
## p6|t6 0.498
## p7|t1 -1.852
## p7|t2 -1.318
## p7|t3 -1.004
## p7|t4 -0.468
## p7|t5 0.040
## p7|t6 0.610
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .s1 0.469
## .s2 0.584
## .s3 0.416
## .s4 0.493
## .s5 0.438
## .s6 0.500
## .s7 0.516
## .m1 0.412
## .m2 0.254
## .m3 0.325
## .m4 0.276
## .m5 0.205
## .m6 0.290
## .m7 0.180
## .p1 0.573
## .p2 0.580
## .p3 0.491
## .p4 0.643
## .p5 0.267
## .p6 0.597
## .p7 0.429
## sex 0.531 0.032 16.851 0.000
## moral 0.588 0.023 25.670 0.000
## pathogen 0.427 0.035 12.221 0.000
##
## Scales y*:
## Estimate Std.Err z-value P(>|z|)
## s1 1.000
## s2 1.000
## s3 1.000
## s4 1.000
## s5 1.000
## s6 1.000
## s7 1.000
## m1 1.000
## m2 1.000
## m3 1.000
## m4 1.000
## m5 1.000
## m6 1.000
## m7 1.000
## p1 1.000
## p2 1.000
## p3 1.000
## p4 1.000
## p5 1.000
## p6 1.000
## p7 1.000
IRT is a collection of latent ability/trait models. These techniques model the non-linear relationship between the probability of endorsing a given answer on a survey question and the underlying latent trait assumed to drive the choices. “Latent” models are based on the assumption that the probability of a response to a given question can be linked to item and person parameters via a non-linear mathematical function (most commonly the logit function, but the probit is not uncommon). The aim of such models is to measure the underlying trait (or ability) producing the test results (or performance); the expected test result at a given trait level, \(T(\theta)\), is known as the test score.
Often, IRT models are classified by the number of parameters to be estimated (item difficulty, discrimination, guessing, and others) and by the nature of the data (dichotomous vs. polytomous items). For ordered polytomous data, such as the TDDS items, the Graded Response Model, the Nominal Model, the Rating Scale Model, and the Partial Credit Model are most commonly used. There are several aspects to consider before choosing the appropriate model, but since we are interested in how well each item discriminates across trait levels, we rely on the Graded Response Model (GRM). Here is a minimalist account of the GRM and its statistical properties. The GRM is based on a cumulative log-odds principle, in which the probabilities of choosing a given response category within each item are modelled as differences between cumulative probabilities. For example, for each Pathogen item there are seven possible response categories, going from “not disgusted at all” [category 1] to “extremely disgusted” [category 7]. Differences in cumulative probabilities can thus be understood as modelling the probability of choosing a particular response category (or a lower one) compared to the probability of choosing a higher response category. For the items of the Pathogen scale this means comparing {1 vs. 2,3,4,5,6,7}, {1,2 vs. 3,4,5,6,7}, {1,2,3 vs. 4,5,6,7}, {1,2,3,4 vs. 5,6,7}, and so on. These cut-points are often termed category thresholds. To represent them graphically, we use item trace lines, which show the probability of choosing a response category (y-axis) given a latent trait level (x-axis); the latent trait is mapped on a continuum and is (usually) invariant and independent of the sample.
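In the usual GRM parameterization, writing \(P^{*}_{ik}(\theta)\) for the cumulative probability that a person at trait level \(\theta\) responds in category \(k\) or higher of item \(i\), the category probabilities are differences of adjacent cumulative curves:

\[
P^{*}_{ik}(\theta) = \frac{1}{1 + \exp\left[-a_i\left(\theta - b_{ik}\right)\right]},
\qquad
P_{ik}(\theta) = P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta),
\]

with \(P^{*}_{i1}(\theta) = 1\) and \(P^{*}_{i,m+1}(\theta) = 0\) for an item with \(m\) categories, where \(a_i\) is the item’s discrimination and the \(b_{ik}\) are its category thresholds. The trace lines discussed below are the \(P_{ik}(\theta)\) curves.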
Broadly speaking, the IRT analyses we perform here have three levels: the scale (test) level, the item level, and the response-category level.
We start at the scale level and dig deeper as we go along.
Below you will find three graphs depicting test scores \(T(\theta)\) for each subscale of the TDDS. These graphs can be interpreted as the estimated relationship between a person’s test score \(T(\theta)\) and the latent trait being measured, \(\theta\).
Looking at the test scores, results suggest that for the Pathogen subscale a score of about 10 on the subscale corresponds to a \(\theta\) level of about \(-3\), while a score of about 33 is associated with an average \(\theta\) of zero, and so on. In this sense, these graphs contain the same information as classical approaches (as in Table 6).
An important observation: although most of the following IRT graphs show the x-axis running from \(\theta = -6\) to \(\theta = +6\), the latent continuum actually extends to infinity. The R package being used (“mirt”) displays \(-6\) to \(+6\) by default, while others (like “ltm”) tend to display \(-4\) to \(+4\). This is immaterial, because IRT maps total scores onto a continuum of the latent trait; it is just a question of the tool being used and/or preference.
Another important aspect of IRT is the concept of Information, which is related to the accuracy with which we can estimate ability. “The concept of information is used in IRT to reflect how precisely an item or scale can measure the underlying trait. Greater information is associated with greater measurement precision. Information is inversely related to the standard error of the estimate, so, at any theta, greater information will result in a smaller standard error associated with the estimated theta score. Over the range of the underlying trait, an information function curve can be derived for each item to reveal how measurement precision can vary across different levels of the trait [@nguyen2014introduction].”
Below, there are 3 sets of graphs relevant for the TDDS: The test information functions (TIF), the Test Reliability Function, and the Test Information Functions with the Standard Error of Measurement.
Below, for each subscale, the test information is plotted, providing an indication of the test’s ability to yield information (i.e., to measure accurately) across the latent construct \(\theta\). In other words, test information indicates the instrument’s ability to differentiate among respondents. Information is one of the major contributions of item response theory to psychometrics, and it is a de facto extension of the concept of reliability.
Traditionally, reliability is measured using a single index for all possible test scores, i.e., the ratio of true-score to observed-score variance. While single indices are helpful in characterizing a test’s average reliability, IRT advances the idea that reliability (or information) is actually not uniform across the entire range of test scores. For example, it is common to see scores at the edges of the test’s range carry more error than scores closer to the middle of the range, because most test takers score near the average of the latent trait (which we see for the TDDS as well).
Specifically for the TDDS, we see that the Pathogen subscale conveys information across a larger “range” than its Sexual and Moral counterparts. We also see that the Moral subscale is very informative in the middle of the latent construct it attempts to measure (i.e., where most of the variance is). If, in constructing this scale, the objective was to separate persons into two groups, this subscale does its job; however, the sub-scales could do a better job of measuring the full extent of the latent variable. We also see, especially comparatively, that the Pathogen and Sexual subscales are less informative in the middle, while gauging information across a wider stretch of the latent construct.
The test reliability is estimated per \(\theta\) level. It is nonetheless more a measure of internal consistency than of ‘reliability’ as understood within classical approaches. If you are interested, please see this post for more information on the subject; the first slides of Templin are also a very interesting source. In any case, to summarize: the idea of reliability is highly dependent on dimensionality and the number of items, whereas information isn’t.
In these plots we see that the standard error of estimation is the reciprocal of the square root of the test information at a given trait level, which is to say: the more information, the less measurement error (a notion that is absent in classical approaches). This is because the model estimates person parameters at each ability level, and each of these estimates carries sampling error, which is captured by \(SE(\theta)\). The plots show that as information increases, \(SE(\theta)\) decreases.
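A minimal sketch of how these scale-level plots can be produced with mirt, assuming the Pathogen items sit in columns p1-p7 of a data frame tdds (object names hypothetical):

```r
library(mirt)

# Unidimensional graded response model for one sub-scale
fit.pathogen <- mirt(tdds[, paste0("p", 1:7)], model = 1, itemtype = "graded")

plot(fit.pathogen, type = "score")   # expected test score T(theta)
plot(fit.pathogen, type = "info")    # test information function (TIF)
plot(fit.pathogen, type = "rxx")     # reliability across theta
plot(fit.pathogen, type = "infoSE")  # information together with SE(theta)
```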
Section takeaway
My suggestion would be to improve the information yielded by each subscale across a wider range of the latent construct. This could be achieved by developing items that are discriminating in non-central regions of \(\theta\) while simultaneously dropping non-informative and redundant items. In order to identify problematic items, we analyse the item-level characteristics further down.
Item Information Curves (IIC)
In IRT we can examine the information of a set of items (the instrument/scale) as well as of each item within a scale. This is what is done in the next two pages: an in-depth analysis at the item level.
The item information curve shows how well and how precisely each item measures the latent trait at various levels of the attribute. Certain items may provide more information at low levels of the attribute, while others may provide more information at higher levels of the latent trait. On the first page we see, for each subscale, the information each item is able to yield and the region of the latent construct each item is able to gauge.
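A sketch of the corresponding item-level plots (object names as in the sketch above; item indices hypothetical):

```r
plot(fit.pathogen, type = "infotrace")   # item information curves for every item
plot(fit.pathogen, type = "infotrace",
     which.items = c(3, 5))              # a subset of items, for easier comparison
```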
Results
Results show that for the Pathogen subscale, item 15 is the most informative, followed by items 21, 12, and 9. For the Moral subscale, items 19 and 13 are the most informative (followed by 4, 7, and 10), while for the Sexual subscale item 8 is clearly the most informative item, followed by 5 and 14. Note that for the first set of graphs (the colourful ones) the y-axes differ; this was done to better visualize the item curves. Immediately below you will find a fairer comparison between items, plotted on the same scale (in blue).
While comparisons of the ‘quantity’ of information yielded are best visualized in the first part, comparisons in terms of the ranges of the construct covered are (in my opinion) best depicted on the second page. In any case, both figures depict the same information. I find the first page particularly interesting if we want to know which items measure the extremes of the latent construct: it tells us, for instance, that item 9 measures levels of \(\theta\) up to \(+3\) while item 3 measures as low as \(-4\). Ideally, one wants to measure across a larger range of the latent construct, with higher precision (information).
Section takeaway
Based on these analyses, we see that some items likely measure similar regions/ranges of the latent construct. In this sense, within IRT, these items may be considered redundant, for example items 13 & 19 and 4 & 10 of the Moral subscale. This means that responses to these items tend to be (largely) the same, given a respondent’s trait level. While this increases a scale’s consistency, one may want to test/re-test items tapping into areas the other included items don’t. Ideally, in scale development, these items would be substituted by items that are informative at other ranges of the latent construct.
In order to have an accurate idea of how each item fared in this sample, we investigate the third level below - the response-category level - which can give many insights into the results discussed above.
Now using the same scale (y-axis) for all Disgust Sensitivity Instruments.
Category response curves: items 15 and 17 as contrasting examples
On the next page, we give special attention to item 15 of the Pathogen subscale as a means to illustrate some important concepts.
On the next page, the probabilities associated with each response category of [P] TDDS 15 are displayed across latent levels \(\theta\). In the first figure, each curve represents a response category; note that each curve has its own peak in a given region of \(\theta\), which is why this item was chosen: its response categories are well distributed. Each of those peaks can be understood as the latent trait \(\theta\) region for which that response category is the most probable.
This means the response categories of item 15 can discriminate well across the continuum of the latent construct. A person whose estimated value is about \(\theta = -2\) would likely choose response category 2 (“Slightly Disgusted”) rather than 1 (“Not Disgusted”) or 3 (“Somewhat Disgusted”). This is an ad-hoc explanation of the concepts of discrimination and information, which are intimately related: at the lowest level, discrimination and information translate into non-overlapping response categories or, put differently, into there being a latent trait region in which each individual response category is the most probable.
The relationship between the discrimination of the response categories and the information yielded is shown pedagogically in the bottom figure, entitled “Trace line & information for item 17”. It shows why item 17 yields so little information: its response categories overlap completely, which is to say that none of them is able to discriminate among the respondents’ latent trait levels.
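A sketch of how such category response curves (trace lines) can be drawn with mirt (object names as above; the within-sub-scale item index is hypothetical):

```r
plot(fit.pathogen, type = "trace")                # trace lines for all items of the sub-scale
itemplot(fit.pathogen, item = 5, type = "trace")  # trace lines for a single item (e.g., TDDS_15)
```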
Category response curves
Starting with the Sex sub-scale, we see why item 8 is the ‘best’ of that scale: its response categories discriminate between the Sex latent levels better than those of the other items. Items 5 and 14 perform in a similar way. The other items, unfortunately, do not perform as well in terms of discrimination or range. It seems that for this scale, the items are felt/perceived as dichotomous rather than graded or ordinal.
This is not the case for the Moral sub-scale, where all items perform better than those of the Sex sub-scale: they are better at discriminating between individuals, as the curves are peaked and dispersed across all levels of the latent trait. Item 1 is out-performed by all other items and might profit from changes, or from removal in favour of a better item. The items of this sub-scale, however, all measure the same range - they would have about the same difficulty, if this were an ability test. That is to say, this sub-scale’s information would profit from items that are hard to endorse as well as items that are really easy to endorse. It is counterintuitive, but items specifically designed with their ‘difficulty’ in mind could add to this scale’s accuracy in measuring its latent construct.
As for the Pathogen sub-scale, item 15 stands out, while the other items are similarly discriminating and cover about the same range. I would suggest that items 6 and 9 be changed or removed in favour of better items, as their curves are completely encompassed by the curves of other items.
Model fit of U-IRT
All uni-dimensional IRT models were identified.
The model fit of IRT models is assessed via the M2 statistic (or rather M2*, the appropriate variant for polytomous data). M2 has the unfortunate property that it cannot be computed if the number of items is too small: here, 7 items probably do not give enough degrees of freedom relative to the number of estimated parameters. So, M2 cannot be calculated for any of the uni-dimensional IRT models (U-IRT or UIRT).
To assess overall fit, we will use the Root Mean Square Error of Approximation (RMSEA) as a proxy statistic for the fit of our U-IRTs. Caution must be heeded, however, because this ad-hoc approach does not carry exactly the same meaning: the ‘parsimony’ aspect of the model is not included (i.e., the df are fixed, unlike in M2) [see discussion here]. In any case, we find the following RMSEA for each UIRT.
Based on this ad-hoc proxy for fit, we see that the overall fit of the Sex sub-scale is not ideal. We investigate further by looking at item fit.
| | Pathogen UIRT | Sex UIRT | Moral UIRT |
|---|---|---|---|
RMSEA | 0.024 | 0.072 | 0.018 |
Item fit
We proceed with item fit indices to check whether there are any divergent items in the UIRT models according to the S-X2 statistic. Item fit is determined by investigating the differences between the observed and expected proportions of item scores, wherein large residuals indicate misfit. The S-X2 test, which adjusts for the model-dependent observed proportions, was used to assess the goodness of fit of each item (i.e., the discrepancy between the model’s prediction for each item and the observed data).
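Before the results table, a minimal sketch of how these statistics can be obtained (object names hypothetical; S-X2 is mirt's default item-fit statistic):

```r
sx2 <- itemfit(fit.pathogen)                            # S_X2, df.S_X2 and p.S_X2 per item
sx2$adj.p.S_X2 <- p.adjust(sx2$p.S_X2, method = "fdr")  # false-discovery-rate adjustment
sx2
```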
Item | S-X2 Statistic | df of S-X2 | p-value of S-X2 | adjusted p-value of S-X2 |
---|---|---|---|---|
[P] TDDS_3 | 116.213 | 103 | 0.176 | 0.206 |
[P] TDDS_6 | 140.148 | 116 | 0.063 | 0.110 |
[P] TDDS_9 | 120.928 | 105 | 0.137 | 0.169 |
[P] TDDS_12 | 158.780 | 114 | 0.004 | 0.019 |
[P] TDDS_15 | 101.835 | 82 | 0.068 | 0.110 |
[P] TDDS_18 | 133.030 | 112 | 0.085 | 0.128 |
[P] TDDS_21 | 106.315 | 108 | 0.528 | 0.545 |
[M] TDDS_1 | 172.123 | 117 | 0.001 | 0.006 |
[M] TDDS_4 | 143.894 | 95 | 0.001 | 0.006 |
[M] TDDS_7 | 136.351 | 98 | 0.006 | 0.022 |
[M] TDDS_10 | 119.796 | 92 | 0.027 | 0.072 |
[M] TDDS_13 | 113.704 | 88 | 0.034 | 0.079 |
[M] TDDS_16 | 145.482 | 105 | 0.005 | 0.022 |
[M] TDDS_19 | 103.394 | 81 | 0.047 | 0.093 |
[S] TDDS_2 | 150.409 | 132 | 0.130 | 0.169 |
[S] TDDS_5 | 156.223 | 121 | 0.017 | 0.051 |
[S] TDDS_8 | 181.746 | 107 | 0.000 | 0.000 |
[S] TDDS_11 | 134.493 | 137 | 0.545 | 0.545 |
[S] TDDS_14 | 160.092 | 132 | 0.048 | 0.093 |
[S] TDDS_17 | 162.254 | 141 | 0.106 | 0.149 |
[S] TDDS_20 | 135.983 | 136 | 0.484 | 0.535 |
Results show that item 4 from the Pathogen sub-scale (TDDS_12), items 1, 2, 3, and 6 from the Moral sub-scale (TDDS_1, TDDS_4, TDDS_7, TDDS_16), and item 3 from the Sex sub-scale (TDDS_8) present fit issues, even after controlling for the false discovery rate (adjusted p-values).
It is quite puzzling that four items of the Moral sub-scale do not fit. This can stem from several factors (violations of uni-dimensionality, response categories being estimated out of their expected order, discrepancies between expected and observed responses, DIF, correlated residuals).
We investigated these factors and found that the main problem with the lack of fit stemmed from large residuals (discrepancies between the expected and observed frequencies of responses). When inspecting these, we found that the largest discrepancies were associated with responses that could be linked to ‘patterners’ - e.g., sum scores equal to 7, or equal to 49, etc. After filtering out (excluding) the observations flagged as patterners in Section 1, fit improved considerably, as shown in the table below.
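A sketch of that filtering step, assuming the flag from Section 1 is reproduced by counting unique response categories per respondent (object and column names hypothetical):

```r
# Respondents using only 1 or 2 unique response categories are treated as patterners
n.unique      <- apply(tdds[, items], 1, function(x) length(unique(na.omit(x))))
tdds.filtered <- tdds[n.unique > 2, ]
```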
Item | S-X2 Statistic | df of S-X2 | p-value of S-X2 | adjusted p-value of S-X2 | [Filtered] adj. p-val S-X2 |
---|---|---|---|---|---|
[P] TDDS_3 | 116.213 | 103 | 0.176 | 0.206 | 0.168 |
[P] TDDS_6 | 140.148 | 116 | 0.063 | 0.110 | 0.323 |
[P] TDDS_9 | 120.928 | 105 | 0.137 | 0.169 | 0.394 |
[P] TDDS_12 | 158.780 | 114 | 0.004 | 0.019 | 0.087 |
[P] TDDS_15 | 101.835 | 82 | 0.068 | 0.110 | 0.417 |
[P] TDDS_18 | 133.030 | 112 | 0.085 | 0.128 | 0.268 |
[P] TDDS_21 | 106.315 | 108 | 0.528 | 0.545 | 0.763 |
[M] TDDS_1 | 172.123 | 117 | 0.001 | 0.006 | 0.168 |
[M] TDDS_4 | 143.894 | 95 | 0.001 | 0.006 | 0.031 |
[M] TDDS_7 | 136.351 | 98 | 0.006 | 0.022 | 0.146 |
[M] TDDS_10 | 119.796 | 92 | 0.027 | 0.072 | 0.394 |
[M] TDDS_13 | 113.704 | 88 | 0.034 | 0.079 | 0.183 |
[M] TDDS_16 | 145.482 | 105 | 0.005 | 0.022 | 0.146 |
[M] TDDS_19 | 103.394 | 81 | 0.047 | 0.093 | 0.268 |
[S] TDDS_2 | 150.409 | 132 | 0.130 | 0.169 | 0.412 |
[S] TDDS_5 | 156.223 | 121 | 0.017 | 0.051 | 0.168 |
[S] TDDS_8 | 181.746 | 107 | 0.000 | 0.000 | 0.019 |
[S] TDDS_11 | 134.493 | 137 | 0.545 | 0.545 | 0.656 |
[S] TDDS_14 | 160.092 | 132 | 0.048 | 0.093 | 0.181 |
[S] TDDS_17 | 162.254 | 141 | 0.106 | 0.149 | 0.531 |
[S] TDDS_20 | 135.983 | 136 | 0.484 | 0.535 | 0.763 |
Below is a short description of the results for the other aspects investigated.
Out-of-order response categories. Unlike the other polytomous IRT models, the responses in the Nominal Response Model (NRM) are unordered (or at least not assumed to be ordered). Even though responses are often coded numerically (for example, 0, 1, 2, …, m), the values do not represent scores on the items, but merely nominal indicators of the response categories. The NRM is typically applied to multiple-choice items, but here we use it to check whether any response categories are being estimated out of their expected order. So we fit a nominal IRT model - which does not assume the response categories are ordered - and plot the trace lines below. Results show that there are indeed some issues in this respect, especially for the Pathogen Disgust subscale.
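A sketch of this check (object names hypothetical):

```r
# Nominal response model: category order is estimated, not assumed
fit.nominal <- mirt(tdds.filtered[, paste0("p", 1:7)], model = 1, itemtype = "nominal")
plot(fit.nominal, type = "trace")
```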
Violations of uni-dimensionality. I investigated whether the Moral sub-scale could be two-dimensional. It does not seem to be the case, but if one forces a two-factor solution, items 1 and 7 load on the second dimension of the Moral sub-scale [post-analysis note: these items show DIF]. Out of curiosity, I performed the same analysis on the Pathogen sub-scale, which yields items 6, 9, 15, and 21 on one factor and items 3, 12, and 18 on another. As for the Sex sub-scale, items 5, 8, and 20 form factor 1, while 2, 11, 14, and 17 form the second. Importantly, these results do not suggest - by any stretch of the imagination - that these constructs are two-dimensional. I am just leaving this here because you may want to check the content of these items and whether these solutions inform you theoretically in any way, especially in conjunction with the results of the bootstrapped EGA above.
DIF. Another possible explanation for these fit results might be DTF (differential test functioning) and/or DIF (differential item functioning). DIF refers to any situation in which an item within a test measures the intended construct differently for one subgroup of a population than it does for another. In short, DIF flags potential item bias: it occurs when two respondents from distinct groups who have equal levels of the latent trait (\(\theta\)) do not have the same probability of endorsing a given response on that item. As we investigate DIF below for each of the sub-scales with uni-dimensional IRTs, we find that it is a contributor to the lack of model fit. Below, we try to answer the following two questions: given that DIF is a property of an individual item, to what extent does DIF lead to biased total scores (differential test functioning, DTF)? And if so, by how much?
An important aspect of scale development is to ensure that an instrument measures the same construct across groups of interest [@stegmueller2011apples]. This is often called measurement invariance or measurement equivalence. Here, measurement invariance will be assessed inferentially with differential item functioning (DIF) and differential test functioning (DTF), which are the IRT methods for the evaluation of construct equivalence.
DTF
Differential test functioning (DTF) is present when individuals who have the same standing on the latent construct or attribute, but belong to different groups, obtain different scores on the test. The presence of DIF may lead to DTF, but not always. Some DIF items favor the focal group, whereas others may favor the reference group, which may produce a cancelling effect. Ideally, we want a test with no DIF and no DTF.
DIF
Differential item functioning (DIF) is a concept from psychometrics emerging from the application of psychological instruments in different cultures or subcultures. DIF is a statistical characteristic of an item that shows the extent to which items might be biased in measuring the latent construct or ability between sub-groups. DIF occurs when examinees from different groups show differing probabilities of success on (or endorsing) the item after matching on the construct that the item is intended to measure.
Item bias & DIF
Item bias occurs when examinees of one group are less likely to answer an item correctly (or endorse an item) than examinees of another group because of some characteristic of the test item that is not relevant to the construct being measured. The difference with DIF is that analyses of item bias are qualitative in nature, via reconstruction of meaning and contextualization, whereas analyses of DIF are statistical in nature: they test whether differences in response probabilities remain when respondents are matched on trait level. Importantly, DIF is required, but not sufficient, for item bias. If no DIF is apparent, there is no item bias. If DIF is apparent, additional investigation is necessary: content analysis by subject-matter experts.
DTF, DIF, and item bias analyses share a vocabulary that is intertwined with that of measurement invariance via CFA/SEM.
Results
A. Pathogen Disgust
Before demonstrating the DTF and DIF effects, we plot the test and item functioning for each gender group. This is achieved by fitting a configural model, which estimates all model parameters separately for each group. Below we plot, for each gender, (a) the expected test scores, (b) the test information, and (c) the individual expected item scores.
DIF
From the above graphs, the Pathogen items appear to behave differently for each group in the completely independent model. This indicates that there are either population differences between the groups, or that the items show DIF, or both. In order to disentangle these, we conduct DIF analyses for the Pathogen Disgust items using the multiple-group GRM. All DIF analyses were conducted using the mirt package (Chalmers, 2012) with marginal ML estimation. To establish a set of potential anchor items, we used a multigroup GRM with no across-group equality constraints as the reference model. Next, likelihood-ratio tests were used to compare the reference model to models that added across-group equality constraints on the a parameter (discrimination/slope) and the d parameters (difficulty/intercepts), one item at a time. This is known as the bottom-up approach to DIF. We also used the top-down approach, wherein we fit a fully constrained model (all parameters equal across groups) and attempt to detect DIF with a slightly less restrictive model (the constrained model with the focal-group hyper-parameters estimated). In this slightly less constrained model we fix the mean and variance of the reference group to N(0,1) and estimate the mean and variance of the focal group (i.e., the hyper-parameters). In other words, we fit a completely equal model (ignoring group membership by constraining all parameters to be equal across groups) alongside an adjusted model that can account for group differences (where the mean and variance of the second group are freely estimated, but the intercepts/difficulty and slopes/discrimination are kept equal across groups).
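A sketch of the multiple-group GRM and the two DIF strategies described above (data, column, and group names hypothetical; the parameter names a1 and d1-d6 follow mirt's labels for a 7-category graded item):

```r
pathogen <- tdds.filtered[, paste0("p", 1:7)]
grp      <- tdds.filtered$gender

# Bottom-up: free baseline model, then add equality constraints one item at a time
mg.free <- multipleGroup(pathogen, model = 1, group = grp, itemtype = "graded")
DIF(mg.free, which.par = "a1", scheme = "add")              # discrimination/slopes
DIF(mg.free, which.par = paste0("d", 1:6), scheme = "add")  # difficulty/intercepts

# Top-down: constrained baseline with the focal group's hyper-parameters freed,
# then drop equality constraints one item at a time
mg.constrained <- multipleGroup(pathogen, model = 1, group = grp,
                                itemtype = "graded",
                                invariance = c("slopes", "intercepts",
                                               "free_means", "free_var"))
DIF(mg.constrained, which.par = c("a1", paste0("d", 1:6)), scheme = "drop")
```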
For these tests, we use the filtered data (N = 914) as a means to reduce noise that could otherwise inflate the false discovery rate.
In searching for DIF, results show that the first, second, third, fifth, and seventh items ([P] TDDS_3, [P] TDDS_6, [P] TDDS_9, [P] TDDS_15, [P] TDDS_21) were invariant across gender, but items 4 and 6 were not. Specifically, item 6 was found not to be invariant in both its discrimination/slope parameter, \(\chi^{2}\)(1, N=914) = 7.158, p = .007, and its intercept/difficulty parameters, \(\chi^{2}\)(6, N=914) = 51.39, p < .001. Item 4, on the other hand, only showed differential functioning on its intercept/difficulty parameters, \(\chi^{2}\)(6, N=914) = 15.289, p = .018.
Measurement Invariance within SEM framework
When looking at measurement invariance within the SEM framework, one typically compares the fit of the model with specific parameters fixed to be equal across groups to the model with those parameters free to vary. If the free-to-vary model fits significantly better than the fixed-to-be-equal model, that parameter vector is not invariant across groups. The first level to check is the invariance of the factor loadings, wherein the researcher wants to know whether the factor loadings are equal across groups: do the latent traits have the same loadings on the indicators they are associated with, and are these relationships equal across groups? This is a test of the equality of structure coefficients. If factor-loading invariance is established, the groups can be said to have the same unit of measurement. This is considered weak measurement invariance.
Similar to the results above, we find evidence that the Pathogen subscale does not attain metric invariance, \(\chi^{2}\)(7, N=914) = 16.423, p = .022. That is to say, when comparing the completely independent model with a model whose discrimination/slope parameters are constrained to be equal (7 fewer parameters), the goodness of fit significantly decreases.
DTF
DTF is a consequence of DIF. Therefore, the items that were found to be invariant (all but items 4 and 6) were constrained to be equal in the final multigroup IRT model, in which DTF is checked. In this model, the latent mean and variance of the reference group (males) were fixed to N(0,1) so that these hyper-parameters could be estimated for the focal group (females). The females’ latent mean and variance were estimated at 0.389 and 1.461, respectively; that is, females have a larger mean and variance than males in the population. Furthermore, after controlling for these population differences, we quantify the effect of differential functioning on total test scores. Results show that DTF was not found to be present (\(P_{DTF = 0} = 0.31\)).
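A sketch of the final model and the DTF check (object and column names hypothetical; SE = TRUE is needed so that DTF() can draw plausible parameter values):

```r
anchors  <- c("p1", "p2", "p3", "p5", "p7")   # all Pathogen items except items 4 and 6
mg.final <- multipleGroup(pathogen, model = 1, group = grp,
                          itemtype = "graded", SE = TRUE,
                          invariance = c(anchors, "free_means", "free_var"))
DTF(mg.final, draws = 1000)   # DTF statistics with imputed confidence intervals
```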
On the right, we plot the final model’s expected total score functions and their imputed confidence intervals for both gender groups. We see a substantial amount of overlap in the confidence regions of males and females in their expected total scores at most levels of theta. On the left, we have the effect of DTF across the levels of \(\theta\).
Fit
To assess overall fit to the data, we would need to do this for a multidimensional IRT model (MIRT), because 7 items do not allow many degrees of freedom for fit indices in the current UIRT we are testing. However, we can assess item fit, which in this context is more valid and pertinent. Results indicate that only item 4 seems to be causing problems, for females.
Item | S-X2 Statistic | df of S-X2 | p-value of S-X2 | adjusted p-value of S-X2 | Group |
---|---|---|---|---|---|
[P] TDDS_3 | 77.609 | 73 | 0.334 | 0.520 | male |
[P] TDDS_6 | 98.500 | 93 | 0.328 | 0.520 | male |
[P] TDDS_9 | 70.309 | 77 | 0.692 | 0.755 | male |
[P] TDDS_12 | 85.063 | 83 | 0.417 | 0.583 | male |
[P] TDDS_15 | 57.407 | 62 | 0.642 | 0.755 | male |
[P] TDDS_18 | 101.715 | 83 | 0.080 | 0.279 | male |
[P] TDDS_21 | 75.751 | 83 | 0.701 | 0.755 | male |
[P] TDDS_3 | 90.737 | 70 | 0.049 | 0.226 | female |
[P] TDDS_6 | 96.844 | 87 | 0.221 | 0.502 | female |
[P] TDDS_9 | 61.246 | 71 | 0.789 | 0.789 | female |
[P] TDDS_12 | 145.715 | 86 | 0.000 | 0.001 | female |
[P] TDDS_15 | 69.060 | 62 | 0.251 | 0.502 | female |
[P] TDDS_18 | 111.939 | 88 | 0.043 | 0.226 | female |
[P] TDDS_21 | 82.553 | 70 | 0.145 | 0.405 | female |
Results
B. Moral Disgust
As we have shown above, we first plot the Moral Disgust test scores and information function along with each item’s expected scores. You find these plots below, wherein we plot a completely independent model while controlling for population differences in means and variances.
DIF
In proceeding with testing for DIF, we again conduct both bottom-up and top-down approaches to DIF, using the filtered data (N=914), while controlling for population differences.
In searching for DIF, results show that items 4 to 7 ([M] TDDS_10, [M] TDDS_13, [M] TDDS_16, [M] TDDS_19) were invariant across gender. Item 2 ([M] TDDS_4) showed differential functioning on its discrimination/slope parameter, \(\chi^{2}\)(1, N=914) = 4.191, p = .041. Moreover, item 3 ([M] TDDS_7) was found to be non-invariant in both its difficulty/intercept parameters, \(\chi^{2}\)(6, N=914) = 15.782, p = .015, and its discrimination/slope parameter, \(\chi^{2}\)(1, N=914) = 5.882, p = .015. Item 1 ([M] TDDS_1) was found to be non-invariant on its intercept/difficulty parameters, \(\chi^{2}\)(6, N=914) = 13.94, p = .003.
Measurement Invariance within SEM framework
When looking at measurement invariance within the SEM framework, we find evidence that the Moral subscale attains metric invariance, \(\chi^{2}\)(7, N=914) = 10.344, p = .17. That is to say, when comparing the completely independent model with a model whose discrimination/slope parameters are constrained to be equal (7 fewer parameters), the goodness of fit does not decrease significantly.
DTF
DTF is a consequence of DIF. Therefore, the items that were found to be invariant were constrained to be equal in the final multigroup IRT model. The females’ latent mean and variance were estimated at 0.193 and 1.341, respectively, similar to what was found above for the Pathogen sub-scale. Furthermore, after controlling for population differences, we quantify the effect of differential functioning on total test scores, and find again that DTF was not present (\(P_{DTF = 0} = .99\)). Again, we plot the final model’s expected total score functions and their imputed confidence intervals; on the left, we have the effect of DTF across the levels of \(\theta\).
Fit
As the output below shows, all items seem to fit the data.
Results
C. Sexual Disgust
As we have shown above, we first plot the Sexual Disgust test scores and information function along with each item’s expected scores. You find these plots below, wherein we plot a completely independent model while controlling for population differences in means and variances.
DIF
In proceeding with testing for DIF, we again conduct both bottom-up and top-down approaches to DIF, using the filtered data (N=914), while controlling for population differences.
The Sexual sub-scale shows strong DIF. In searching for DIF, results show that items 1, 3, 6, and 7 are non-invariant with respect to their discrimination/slope parameters, while all items were found to be non-invariant with respect to their difficulty/intercept parameters.
Measurement Invariance within SEM framework
When looking at measurement invariance within the SEM framework, we find evidence that the Sexual subscale does not attain metric invariance, \(\chi^{2}\)(7, N=914) = 20.531, p = .005.
DTF
Given the DIF results, the final model is similar to the configural model, except that it constrains only the discrimination/slope parameters of items 2, 4, and 5.
Results show that the females’ latent mean and variance were estimated at 0.502 and 1.227, respectively. This is similar to what was found above for the Pathogen and Moral sub-scales, in the sense that women have larger means and variances.
Furthermore, after controlling for population differences, we quantify the effect of differential functioning on total test scores, and this time find that DTF was present (\(P_{DTF = 0} < .05\)). We plot the final model’s expected total score functions and their imputed confidence intervals; on the left, we have the effect of DTF across the levels of \(\theta\).
Fit
While results indicate that the items seem to fit the data, note that the final model is practically a configural model (different parameters for each group).
First, we look at the multidimensional IRT factor solution for the TDDS items. As we have seen before, all items load on their respective factors, with the exception of item 11.
In IRT, the discrimination parameters are directly related to the communality (1 - uniqueness): larger discrimination parameters indicate a stronger correlation with the latent trait, which translates into larger communalities and hence smaller uniqueness values. The difficulty parameter, however, is just an intercept and has little to do with uniqueness; high or low difficulty tells you nothing about the correlation of the item with the latent trait.
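Under the usual normal-ogive conversion (dividing logistic slopes by the scaling constant 1.702), the slopes map onto standardized loadings and communalities roughly as

\[
\lambda_{jk} = \frac{a_{jk}/1.702}{\sqrt{1 + \sum_{k'}\left(a_{jk'}/1.702\right)^{2}}},
\qquad
h^{2}_{j} = \sum_{k}\lambda_{jk}^{2},
\qquad
u_{j} = 1 - h^{2}_{j},
\]

where \(a_{jk}\) is the slope of item \(j\) on factor \(k\), \(\lambda_{jk}\) the corresponding standardized loading, \(h^{2}_{j}\) the communality, and \(u_{j}\) the uniqueness; the loadings and \(h^{2}\) values in the table below are of this kind.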
| | F1 | F2 | F3 | Communality (h2) |
|---|---|---|---|---|
[M] TDDS_1 | NA | NA | -0.7061 | 0.536 |
[S] TDDS_2 | NA | -0.5785 | NA | 0.523 |
[P] TDDS_3 | -0.6448 | NA | NA | 0.404 |
[M] TDDS_4 | NA | NA | -0.8347 | 0.728 |
[S] TDDS_5 | NA | -0.8249 | NA | 0.624 |
[P] TDDS_6 | -0.5245 | NA | NA | 0.301 |
[M] TDDS_7 | NA | NA | -0.8240 | 0.714 |
[S] TDDS_8 | NA | -0.9230 | NA | 0.815 |
[P] TDDS_9 | -0.5772 | NA | NA | 0.398 |
[M] TDDS_10 | NA | NA | -0.8412 | 0.707 |
[S] TDDS_11 | -0.3155 | -0.5046 | NA | 0.468 |
[P] TDDS_12 | -0.6820 | NA | NA | 0.424 |
[M] TDDS_13 | NA | NA | -0.9014 | 0.787 |
[S] TDDS_14 | NA | -0.6663 | NA | 0.547 |
[P] TDDS_15 | -0.7987 | NA | NA | 0.658 |
[M] TDDS_16 | NA | NA | -0.8085 | 0.672 |
[S] TDDS_17 | NA | -0.4101 | NA | 0.375 |
[P] TDDS_18 | -0.6116 | NA | NA | 0.422 |
[M] TDDS_19 | NA | NA | -0.9027 | 0.798 |
[S] TDDS_20 | NA | -0.7285 | NA | 0.567 |
[P] TDDS_21 | -0.6615 | NA | NA | 0.452 |
Second, we look at the item fit to the data. The output shows good fit for all items. Item fit is the most valid check in this context of assessing an instrument’s properties.
Item | S-X2 Statistic | df of S-X2 | p-value of S-X2 | adjusted p-value of S-X2 |
---|---|---|---|---|
[M] TDDS_1 | 265.471 | 261 | 0.411 | 0.550 |
[S] TDDS_2 | 283.994 | 267 | 0.227 | 0.515 |
[P] TDDS_3 | 232.453 | 223 | 0.318 | 0.550 |
[M] TDDS_4 | 268.664 | 241 | 0.107 | 0.399 |
[S] TDDS_5 | 232.491 | 209 | 0.127 | 0.399 |
[P] TDDS_6 | 285.175 | 269 | 0.238 | 0.515 |
[M] TDDS_7 | 253.492 | 257 | 0.550 | 0.614 |
[S] TDDS_8 | 250.714 | 214 | 0.043 | 0.303 |
[P] TDDS_9 | 261.827 | 236 | 0.119 | 0.399 |
[M] TDDS_10 | 273.429 | 260 | 0.272 | 0.518 |
[S] TDDS_11 | 266.359 | 272 | 0.585 | 0.614 |
[P] TDDS_12 | 324.656 | 280 | 0.034 | 0.303 |
[M] TDDS_13 | 313.573 | 269 | 0.032 | 0.303 |
[S] TDDS_14 | 271.161 | 267 | 0.418 | 0.550 |
[P] TDDS_15 | 222.099 | 229 | 0.616 | 0.616 |
[M] TDDS_16 | 289.675 | 264 | 0.133 | 0.399 |
[S] TDDS_17 | 254.143 | 246 | 0.347 | 0.550 |
[P] TDDS_18 | 262.317 | 262 | 0.483 | 0.596 |
[M] TDDS_19 | 277.409 | 262 | 0.245 | 0.515 |
[S] TDDS_20 | 268.454 | 273 | 0.566 | 0.614 |
[P] TDDS_21 | 249.874 | 246 | 0.419 | 0.550 |
Third, we look at the overall goodness of fit of the model. Several indices indicate good fit; however, the model is ‘rejected’ by the exact-fit test (p-value = 0). This is akin to getting a significant \(\chi^{2}\) in a CFA in lavaan, which is essentially boilerplate.
| | M2 | df | p | RMSEA | RMSEA 5% | RMSEA 95% | SRMSR | TLI | CFI |
|---|---|---|---|---|---|---|---|---|---|
stats | 143.2501 | 45 | 0 | 0.0489 | 0.04 | 0.058 | 0.0367 | 0.9208 | 0.9661 |
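A sketch of the three-factor graded model behind these results and the calls that produce the fit indices above (column order, column names, and object names hypothetical):

```r
spec <- mirt.model("
  pathogen = 1-7
  sexual   = 8-14
  moral    = 15-21
  COV      = pathogen*sexual, pathogen*moral, sexual*moral
")

tdds.items <- tdds.filtered[, c(paste0("p", 1:7), paste0("s", 1:7), paste0("m", 1:7))]
fit.mirt3  <- mirt(tdds.items, spec, itemtype = "graded")

M2(fit.mirt3)        # M2, RMSEA (with CI), SRMSR, TLI, CFI
summary(fit.mirt3)   # standardized loadings (F1-F3) and communalities (h2)
itemfit(fit.mirt3)   # S-X2 per item
```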
Fourth, we test for DIF (DTF cannot yet be assessed with M-IRT). We find that all items are flagged for DIF. [Personal opinion: my guess is that something is ‘wrong’ with the algorithm or method, so I opened a ticket concerning this high rate of p-values equal to 0.] However, when looking at item fit, all items fit the data; I did not print those results for the sake of simplicity.
General Remarks
Multi-dimensional vs. Uni-dimensional IRT models
I believe there is only one aspect of these analyses that depends on your conceptualization of the TDDS, and it has the potential to change a few aspects of it: whether you consider the three domains of disgust as (a) separate components, which may or may not be correlated, or (b) three components of a general disgust domain (or factor). The difference lies in whether you conceive of the TDDS as three components not underpinned by the same latent construct (option a) or underpinned by the same construct (option b). It would be great to know this also for the other two disgust instruments: the PVD and the DS-R.
Tested models and models specifications
Multi-dimensional vs. Uni-dimensional IRT models
For the results above, we estimated the IRT models individually, that is, one IRT model with its own parameter estimates per sub-scale. Note that there is a substantive difference between estimating each sub-factor separately and estimating all factors together: the latter implies that the estimated parameters reflect a general factor of disgust, whereas the former implies estimating the parameters of each sub-scale’s latent construct.