Psychometric Description of the True Compatibility Test™: A Proprietary Online Matchmaking System
Rense Lange (Illinois State Board of Education / Integrated Knowledge Systems)
Ilona Jerabek (Plumeus, Inc. / Queendom.com)
James Houran (TRUE.com)
Abstract.
Compatibility tests are the foundation of many online
matchmaking services, but psychometric support for their use is ambiguous or
unavailable. This paper summarizes the unprecedented application of Rasch
scaling to assess the structure and validity of an online proprietary
compatibility test, termed the TRUE Compatibility Test (TCT). Contrary to
previous efforts at quantifying long-term, romantic compatibility, the TCT
integrates both the principles of similarity and complementarity of partners’
characteristics. The theoretical rationale of the measured constructs,
technical quality and validity of the test are outlined. Furthermore, the
findings are discussed with respect to the conceptualization and measurement of
relationship quality. It is argued that models of assortative mating in terms
of gross similarity or complementarity are oversimplifications and that a
schema or “couple-centered” approach is a more valid predictor of relationship
satisfaction and stability. Specifically, it appears that men and women have
different definitions of relationship quality. Thus, irrespective of the
necessary but unquantifiable element of “romantic chemistry,” our research suggests
that couples with satisfying and stable relationships are distinguished by
their ability to integrate qualitatively different issues into the relationship
via complex mental processes.
Introduction
The Internet has become a standard fixture in our society, with communication being one of its most popular uses. Through extended communication on the Internet, many users have formed relationships with others online. However, research has only recently begun to address the subject of online relationship development (see e.g., Bonebrake, 2002; Wolak et al., 2003). Kiesler and Kraut (1999) discussed how the nature of online relationships varies, but it is clear that flirting and dating (Whitty, 2003, 2004) and other forms of social networking (Ahuvia & Adelman, 1992) constitute an important aspect of the Internet phenomenon.
In light of these trends, it is not surprising that there is an apparent increase in the use and societal acceptance of so-called “compatibility tests” offered by online matchmaking services (Houran et al., 2004). Compatibility testing typically refers to a method of pairing unfamiliar people for long-term, romantic relationships based on the demographics, stated personal preferences, and personality profiling of individuals within a candidate pool. This type of testing is somewhat different from, and arguably more difficult than, programs such as PREPARE and ENRICH that assess existing couples on the critical tasks related to early marital adjustment (see e.g., Fowers & Olson, 1986). Unfortunately, evidence for many advertised compatibility tests is either missing altogether or falls short of scientific standards (Houran, 2004; Houran et al., 2004; cf. Thompson et al., 2005). However, we are aware of two notable exceptions. First, Wilson and Cousins (2003b) published in a peer-reviewed journal their Wilson Relationship Compatibility Indicator (WRCI) test, which yields “compatibility quotients” for couples. This test is the foundation of the matchmaking service and website of Cybersuitors.com. Wilson and Cousins have shown that heterosexual couples’ scores on this test correlate significantly and positively (average r = .31, p < .01) with scores on the Marital Adjustment Test (Locke & Wallace, 1959).
The WRCI is based on the principle of homogamy (similarity of partners’ characteristics), as opposed to the principle of complementarity (dissimilarity) of partners’ characteristics. As reviewed by Wilson and Cousins (2003a, 2003b), and recently echoed in new research by Luo and Klohnen (2005), cross-sectional and longitudinal research both suggest that similarity (“birds of a feather flock together”) is a better predictor of relationship quality than complementarity (“opposites attract”). It is important to note that this conclusion is also a gross oversimplification. The degree of similarity observed depends on the particular individual-difference domain studied, with romantic partners showing strong similarity in age, political, and religious attitudes; moderate similarity in education, general intelligence, and values; and little or no similarity in personality characteristics (for reviews, see Klohnen & Mendelson, 1998; Watson et al., 2004).
The second example of an evidence-based compatibility test concerns the matchmaking service and website of TRUE.com (formerly TRUEBeginnings). TRUE commissioned the development of a comprehensive, online compatibility test designed to be broader in scope than the WRCI or the PREPARE marital preparation inventory via an integration of the mixed literature on similarity and complementarity. Furthermore, it was desired that the test be applicable to both heterosexual and same-sex matching. The resulting product was called the TRUE Compatibility Test™ (TCT).
The Present Paper
This report provides technical information concerning the
TCT – summarized from the comprehensive Technical
Manual for the TRUE Compatibility Test (TRUE & Jerabek, 2004). This
online Manual spans nearly 100 pages
and is available to any interested party for full scrutiny at: http://www.true.com/images/tctmanual.pdf?svw=footer.
We show that, unlike the WRCI and other advertised compatibility tests, the TCT
is the first compatibility test, offline or online, with reliability and validity
as stringently defined in the 1999 edition of the Standards for Educational
and Psychological Testing as issued jointly by the American Educational
Research Association, the American Psychological Association, and the National
Council on Measurement in Education (AERA, APA, NCME, 2002).[1]
As such, this paper is part of an active research program initiated by TRUE to encourage the sharing of evidence and findings among professional
and academic psychologists. Moreover, this report is intended to support the
perception of TRUE and its contractors that there is an
increasing need to enhance and maintain the status of online relationship
research and cyberpsychology in general.
Accordingly, in a
first section we provide a detailed theoretical rationale behind the
construction of the TCT. This section describes the psychological and
interpersonal variables that are assessed by this instrument, while providing
sample questions and references to the academic literature that formed the
basis for the construction of the TCT. Additional information regarding the
contents of the TCT can be found in Appendix A of the online Manual. A following section describes
the psychometric rationale for the TCT provided by Rasch scaling techniques.
This section includes an overview of the mathematical and statistical
considerations that are involved in “scoring” the
TCT. Next, we present a section that details Rasch analyses of the seventeen
major factors of the TCT, based on data from 11,576 individuals.[2]
These analyses include items’ model fit, the results of bias tests, and
validity evidence. Throughout, pertinent technical results are listed in
the Appendices to the online Manual.
A final section discusses the theoretical import of the TCT’s development and
validation for refining issues in conceptualizing and measuring relationship
quality.
Although great effort was made to be as complete as possible in describing the TCT, including its reliability, validity, matching algorithms, theoretical rationale, and scoring methods, the TCT is a proprietary product of TRUE.com. To protect this product, the text of most items is omitted, and the analyses refer to the TCT items by number only. Similarly, to protect the identity of the items and factors, the latter are identified only by a number – i.e., it is not possible to connect items and factors. Finally, “adaptive”[3] (i.e., answer- or group-specific) items that are not answered by all TCT test takers were omitted from all psychometric analyses. This left a total of 218 items for analysis.
Overview of the TCT
The TCT was
constructed by the second author’s online psychological testing firm, Plumeus,
Inc. (www.Queendom.com). Drawing on a thorough literature review, the test was
designed to address 99 key variables that determine each test-taker’s
long-term, romantic compatibility with potential partners. For some of the
variables, complementary or opposite matches are deemed better; for others, a
similar match is deemed preferable (see e.g., Dryer & Horowitz, 1997; Klohnen & Luo, 2003). The goal of the TCT is to pair
people appropriately with potential partners across as many relationship
variables as possible. The test is arranged by top-level areas (factors), 13
second-level, more specific factors (subfactors), and 65 third-level, narrow
characteristics (subscales) that make up these factors.
The TCT
contains a pool of 616 items, some of which are core (administered to everyone)
and some of which are adaptive (presented dynamically only if the test-taker is
inconsistent in his/her responses to the core questions in that factor).[4] On the basis of his/her responses, the
test-taker receives a Feedback Report
(a profile of his/her personality, habits, and attitudes, and how they can
affect his/her romantic relationships), an Ideal
Partner Report (a description of who the most complementary partner would
be for him or her), and a Compatibility
Report that reveals in detail to what degree potential partners are
compatible with the test-taker. The
test-taker also receives advice and tips that are tailored to his/her
particular issues.
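The adaptive mechanism described above can be illustrated with a minimal sketch. The TCT's actual consistency rule is proprietary and is not documented here, so everything in this example is an assumption: responses are taken to be coded 1 to 5 after reverse-keying, inconsistency is operationalized as a population standard deviation above an arbitrary threshold, and the function names and item identifiers are hypothetical.

```python
from statistics import pstdev
from typing import Dict, List


def is_inconsistent(core_responses: List[int], threshold: float = 1.5) -> bool:
    """Flag a factor whose core responses spread widely (assumed rule).

    Responses are assumed to be coded 1-5 after reverse-keying, so that
    a consistent respondent gives similar values within one factor.
    """
    return len(core_responses) >= 2 and pstdev(core_responses) > threshold


def select_adaptive_items(
    core_by_factor: Dict[str, List[int]],
    adaptive_pool: Dict[str, List[str]],
    threshold: float = 1.5,
) -> List[str]:
    """Return the adaptive item IDs to present, factor by factor."""
    selected: List[str] = []
    for factor, responses in core_by_factor.items():
        if is_inconsistent(responses, threshold):
            # Only factors with inconsistent core answers get follow-up items.
            selected.extend(adaptive_pool.get(factor, []))
    return selected


if __name__ == "__main__":
    # Hypothetical respondent: erratic on factor F1, steady on factor F2.
    core = {"F1": [1, 5, 1, 5], "F2": [4, 4, 5, 4]}
    pool = {"F1": ["F1_a1", "F1_a2"], "F2": ["F2_a1"]}
    print(select_adaptive_items(core, pool))
```

Under this assumed rule, only the factor with widely scattered core answers triggers follow-up items; any other consistency criterion could be substituted in `is_inconsistent` without changing the rest of the sketch.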
Overall Structure
The top-level
factors are the largest, most general variables. These factors cover a spectrum
of areas relevant to relationships, ranging from conflicts to sex life and
communication.
The subfactors formed the initial building blocks of the TCT – the main elements that were thought to be most important in the development of the test. Like the subscales, all subfactors are based on and supported by empirical research. In particular, to meet the criteria for inclusion, there needed to be strong evidence of the importance of each issue in relationship satisfaction or relationship stability (longevity). To ensure face validity, they also needed to be relevant in a common-sense fashion. For an explanation of what the more targeted subscales measure, refer to Appendix A of the online Manual.
Although the TCT matching algorithm must
necessarily remain proprietary, we note that it uses responses to certain
specific questions to match people with particular tastes. For instance, test-takers who indicate
that romantic people are unattractive to them are not paired with hopeless
romantics. Throughout, the matching process is governed mainly by the larger
factors, i.e., item combinations having the greatest reliability.
In general, the TCT matching algorithms use a compatibility matrix that includes:
· compatibility levels of individual traits using similarity, dissimilarity or complementarity algorithms depending on the issue
· interactions between specific traits
· gender-specific weighting of traits
· relative importance weighting of traits
· bidirectional algorithms for computing the final TRUE Compatibility Index™ (TCI).
The TCI is a metric akin to Wilson and Cousins’ (2003a, 2003b) “Compatibility
Quotient.”
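To make the listed ingredients concrete, the following is a minimal sketch of how a weighted, bidirectional index of this general kind could be computed. It is not the proprietary TCT algorithm. All names (`TraitRule`, `trait_score`, `directional_score`, `compatibility_index`), the scoring formulas, the use of per-person weight multipliers as a stand-in for gender-specific weighting, and the choice of the minimum to combine the two directions are assumptions made for illustration. Interactions between traits are omitted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TraitRule:
    mode: str          # "similarity", "dissimilarity", or "complementarity"
    importance: float  # hypothetical relative importance weight


def trait_score(a: float, b: float, mode: str) -> float:
    """Return a score in [0, 1] for two trait levels scaled to [0, 1]."""
    if mode == "similarity":
        return 1.0 - abs(a - b)
    if mode == "dissimilarity":
        return abs(a - b)
    if mode == "complementarity":
        # Assumed rule: best when the two levels balance out (a + b == 1),
        # e.g. a highly dominant person paired with a submissive one.
        return 1.0 - abs(a + b - 1.0)
    raise ValueError(f"unknown mode: {mode}")


def directional_score(
    rater: Dict[str, float],
    target: Dict[str, float],
    rules: Dict[str, TraitRule],
    rater_weights: Dict[str, float],
) -> float:
    """Weighted mean of trait scores from the rater's point of view."""
    total = 0.0
    weight_sum = 0.0
    for trait, rule in rules.items():
        # The rater-specific multiplier stands in for gender-specific weighting.
        weight = rule.importance * rater_weights.get(trait, 1.0)
        total += weight * trait_score(rater[trait], target[trait], rule.mode)
        weight_sum += weight
    return total / weight_sum


def compatibility_index(
    a: Dict[str, float],
    b: Dict[str, float],
    rules: Dict[str, TraitRule],
    weights_a: Dict[str, float],
    weights_b: Dict[str, float],
) -> float:
    """Combine both directions; the minimum is an assumed, conservative choice."""
    return min(
        directional_score(a, b, rules, weights_a),
        directional_score(b, a, rules, weights_b),
    )


if __name__ == "__main__":
    rules = {
        "values": TraitRule("similarity", 2.0),
        "dominance": TraitRule("complementarity", 1.0),
    }
    person_a = {"values": 0.8, "dominance": 0.9}
    person_b = {"values": 0.6, "dominance": 0.1}
    print(compatibility_index(person_a, person_b, rules, {}, {"dominance": 3.0}))
```

Taking the minimum of the two directional scores means a pair is rated only as compatible as its less satisfied member would find it; an average would be an equally defensible assumption.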
Table 1 lists some illustrative questions used in the TCT. Additional details can be found in the text, and the remaining factors are presented in Appendix A of the online Manual.
| | Name | Examples of questions |
|---|---|---|
| 1) | Comfort with Vulnerability | I find it ________ to say things like “I love you”, and “I am happy I found you”. a) Very easy b) Easy c) Awkward d) Difficult e) Very difficult |
| | | I give people the benefit of the doubt. a) Completely true b) Sort of true c) Somewhat true/somewhat false d) Sort of false e) Completely false |
| 2) | Open-mindedness | I appreciate the fact that my partner and I have differences of opinion, because discussing them helps us grow as people. a) Completely agree b) Mostly agree c) Somewhat agree d) Mostly disagree e) Completely disagree |
| | | I encourage my partner to work towards his/her goals, even if I disagree with them. a) Completely agree b) Mostly agree c) Somewhat agree d) Mostly disagree e) Completely disagree |
| 3) | Ability to Communicate | I become absorbed in what I’m saying, and fail to notice if others are bored or offended. a) Almost never b) Rarely c) Sometimes d) Quite often e) Most of the time |
| | | I try to be sensitive to the needs of others and anticipate their reactions to my words and actions. a) Always true b) Often true c) Sometimes true d) Rarely true e) Never true |
| 4) | Sexual Prowess | Sexual fantasies are natural. a) Completely agree b) Mostly agree c) Somewhat agree d) Mostly disagree e) Completely disagree |
| | | Ideally, how often would you like to have sex? a) At least once a day. b) Two to three times a week c) Once a week d) Two to three times a month e) Once a month or less |
| 5) | Interaction with Others | You are one of three equally deserving employees eligible for a big promotion at work. How likely are you to think that you will be the one chosen for the promotion? a) Completely unlikely b) Unlikely c) Somewhat likely/somewhat unlikely d) Likely e) Highly likely |
| | | Spending time with others wears me out. a) Completely untrue b) Mostly untrue c) Somewhat true/somewhat false d) Mostly true e) Completely true |
| 6) | Social Network | I am very close with my family. a) Completely true b) Mostly true c) Somewhat true d) Mostly untrue e) Completely untrue |
| | | Without my loved ones, I would be lost. a) Completely true b) Mostly true c) Somewhat true d) Mostly untrue e) Completely untrue |
| 7) | Conscientiousness | When I commit to doing something: a) I forget about it. b) I do it when/if I get around to it. c) I get it done, unless something more important comes up. d) I get it done. |
| | | When it comes to orderliness, I’m: a) A complete slob. b) A bit of a slob. c) Average. d) A bit of a neat freak. e) An utter neat freak. |
| 8) | Integrity | My friends would tell you: a) I’m honest to a fault. b) I’m generally honest. c) I sometimes stretch the truth. d) Not to believe a word I say |
| | | If I am running late for an appointment, I: a) Contact the person/people I’m meeting to let them know. b) Rush but don’t call. c) Get there when I get there. |
| 9) | Adventurousness | I enjoy going with the flow and being playful. a) Completely true b) Mostly true c) Somewhat true d) Mostly untrue e) Completely untrue |
| | | I am a creature of habit. a) Completely true b) Mostly true c) Somewhat true d) Mostly untrue e) Completely untrue |
| 10) | Rigidity | I have a tendency to resist changing how I am used to doing things. a) Completely agree b) Mostly agree c) Somewhat agree d) Mostly disagree e) Completely disagree |
| | | You and your partner have a strong difference of opinion. Are you willing to let it go? a) Yes, I’ll drop it. b) It depends on how important the issue is. c) No, I’ll try to convince him/her to accept my point of view. d) No, I’ll insist that my partner accept my point of view. |
| 11) | Dominance | Even when I’m quite upset with my partner, it’s hard for me to bring it up. a) Most of the time b) Often c) Sometimes d) Rarely e) Never |
| | | I make sure that my partner hears my point of view. a) Most of the time b) Often c) Sometimes d) Rarely e) Never |
| 12) | Healthy Attachment | I get more attached to others than they seem to get to me. a) Completely true b) Mostly true c) Somewhat true d) Mostly untrue e) Completely untrue |
| | | I need ______ reassurance from my partner about his/her feelings for me. a) Constant b) Frequent c) Occasional d) Rare e) No |
| 13) | Psychological Strength | People tell me that my moods are unpredictable. a) Quite often b) Often c) Sometimes d) Rarely e) Never |
| | | When I’m presented with a problem, I’m able to develop an effective solution. a) Almost never b) Rarely c) Sometimes d) Quite often e) Most of the time |
Relationship Variables
Communication Style. Several studies have shown that an effective and compatible communication style is one of the pillars of relationships. Couples with ineffective or unconstructive communication are more likely to report relationship dissatisfaction and distress (Christensen & Shenk, 1991; Rogge & Bradbury, 1999). Having compatible communication skills improves a couple’s chance at happiness. Many potential stumbling blocks in relationships can be overcome by communication; it is the greatest key to intimacy. In fact, lack of emotional closeness and feelings of alienation are the best predictors of depression in both men and women (Heim & Snyder, 1991). Reported relationship quality has been shown to be influenced by positive communication behaviors, such as spousal support, companionship, intimacy and friendship (Jerabek, 2003; Julien et al., 2003; Pasch et al., 1997; Pasch & Bradbury, 1998; Prager, 1995).
An important aspect of the Communication Style factor is open-mindedness, which comprises tolerance of mood instability, tolerance for differences in opinion, tolerance for goal differences, and need for control. This construct measures how amenable the test-taker is to differing viewpoints, along with how willing s/he is to relinquish control. According to a study by Shackelford and Buss (1997), lack of openness in both men and women results in lower esteem for them on the part of their spouses.
In addition, self-disclosure has been shown to be a good predictor of relationship satisfaction, in both men and women (Hendrick et al., 1988). Self-disclosure requires comfort with vulnerability, which is based on Bowlby’s defensively separate construct in attachment theory (Bowlby, 1969, 1973). According to his theory, the defensively separate have a harder time becoming closer to others. Their relationships are characterized by less overall satisfaction, not to mention lower quality – they lack trust, and they experience more unpleasant emotions than positive ones (Meyer & Pilkonis, 2001; Simpson, 1990).
However, the amount of communication people desire in a romantic relationship differs greatly, both within a couple and between couples. Jerabek (2003) has demonstrated that the degree to which a relationship meets the person’s need to communicate is strongly correlated with self-reported relationship satisfaction. The TCT takes this issue into account and matches partners based on their preferences for connectedness in a relationship.
Accordingly, the Communication Style factor
encompasses all the above-mentioned factors, including ideas about how much
communication is needed in a relationship, which issues are worth discussing,
and to what extent emotions should be shared.
Other factors that contribute to overall communication style include
level of comfort with displays of emotion, self-disclosure, need for intimacy,
comfort with expressing and witnessing emotions, and willingness to be
vulnerable in a romantic relationship. Also included are the need for
intellectual discussions, general communication skills, sensitivity and tact,
open-mindedness in terms of the communication process, and tolerance for
differences in opinion and goals (Meyer & Pilkonis, 2001; Rogge &
Bradbury, 1999; Simpson, 1990).
Conflict Resolution. The ways that individuals approach and resolve conflict situations can greatly influence their likelihood of establishing successful relationships. Conflict resolution skills are essential to overall relationship satisfaction, and working on developing skills in this area can produce a significant improvement in the couple’s happiness (Jerabek, 2003; Markman et al., 1993). By the same token, a problematic conflict resolution style (competitive, dominating, passive-aggressive, withdrawn or submissive) in one or both partners leads to marital distress (Goeke-Morey et al., 2003; Gottman et al., 1998; Gottman & Krokoff, 1989; Kurdek, 1993, 1996; Kurdek & Schmitt, 1986).
Shackelford and Buss (1997) showed that when couples
experience conflict in a variety of areas (specifically affection and
attention, jealousy, finances, sex, chores and control and dominance), spouses
feel less esteem for one another. As
conflicts are inevitable in long-term relationships, the ability to
negotiate solutions to a variety of issues in a non-threatening way is
essential for the very survival of the romantic bond (Bradbury, 1998).
Moreover, parental conflict has a negative impact on children’s
adjustment and can lead to their maladaptive behaviors (Cummings & Davies,
1994; Grych et al., 2000), and this is especially true for girls (Davies &
Lindsay, 2004). There are many
different ways of dealing with conflict, some more productive than others –
whether or not respondents have what it takes to resolve conflict is essential
to consider. In addition, some
individuals are more likely to get into conflicts than others; hence, this is
important to consider when pairing subjects.
Therefore, in addition to respondents’ conflict style,
the TCT matching algorithms further take into account their proneness to
conflict and willingness to resolve it, while assessing the impact
of the interactions of these factors.
Sex Life. While sexual satisfaction is far from being the single most important factor in relationship satisfaction (according to a Queendom.com poll, only 35% of respondents claim that a great sex life is absolutely necessary), sexual intimacy is important to a certain degree for most couples (Jerabek, 2003; Kelly & Conley, 1987; Perrone & Worthington, 2001). Intimacy brings partners closer, and allows them to express their love for one another. However, people have different attitudes, experiences, and preferences related to sex, all of which can contribute to discord if either party is unwilling or unable to compromise.
Sexual attitudes and behaviors are fundamental to both sexual compatibility and feeling fulfilled in this arena (Kelly & Conley, 1987). Since expectations and desires with respect to frequency of sexual encounters vary widely across all age groups, the test-taker’s libido is assessed and matched with potential partners. Moreover, attitudes about what is acceptable sexual behavior and willingness to experiment have been shown to vary substantially throughout the life span, within age groups, in both genders and within all sexual orientation categories (Queendom.com, 1999, 2000, 2001).
Kelly and Conley (1987) reported that sexual history influenced marital outcome in their study. Terman (1938) and Burgess and Wallin (1953) have shown that high levels of premarital sexual activity are associated with marital instability in men. In addition, sexual faithfulness helps build a sense of trust between partners, and by the same token, unfaithfulness can significantly tarnish it.
To ensure compatibility in all these areas, the Sex
Life factor covers libido and desired frequency of sexual encounters,
faithfulness, sexual experience and history, sexual attitudes, and sexual
behavior.
Social Life. The amount of time a person likes to spend socializing, with whom they prefer to socialize, and their chosen social activities are all important when it comes to choosing a satisfactory partner (Asendorpf & Wilpers, 1998; Jerabek, 2003; Shackelford & Buss, 1997).
Social skills are important for relationship success, romantic and otherwise. Lack of people skills in one partner can lead to awkwardness and uncomfortable social situations. In fact, Shackelford and Buss (1997) demonstrated that both men and women who are married to disagreeable partners have less esteem for their partners. In more general terms, studies have shown that agreeableness prevents conflict with opposite-sex peers (Asendorpf & Wilpers, 1998; Graziano et al., 1996). Similarly, negative and pessimistic attitudes and approach to life can be very taxing for couples (Gottman et al., 1998; Julien et al., 2003; Pasch et al., 1997; Schulz et al., 2004).
The amount of couple and individual social life outside of the relationship is an important consideration. Interpersonal differences in this area can be great, causing clashes between couples about how much time to spend on their own, by themselves, and socially with others. Disagreement about how often and with whom the partners should or should not socialize can lead to recurring conflicts, jealousy, resentment, pent-up frustration, and feelings of abandonment, rejection and injustice. Jerabek (2003) found a strong and positive correlation between satisfaction with couples’ social life and self-reported relationship satisfaction. Extroversion is an equally important consideration as this trait influences how much time one wants to spend in the company of others. An introvert’s need for time for oneself can clash with an extrovert’s need for company, leading to feelings of suffocation in one and rejection in the other. In addition, Asendorpf and Wilpers (1998) reported that sociability predicts falling in love, while shyness prevents it, mainly by limiting the shy person’s exposure to potential partners.
Social support from sources other than the romantic partner is an important factor in a model predicting marital satisfaction (Perrone & Worthington, 2001). In addition, relying solely on one’s romantic partner for fulfillment of all emotional needs places a lot of pressure on this partner and can lead to unhealthy attachment and co-dependent behavior – all of which in turn result in relationship distress (Jerabek, 2000).
In sum, the Social Life factor includes factors such
as extroversion, social skills, agreeableness, positive attitude, sense of
humor, selflessness, support network of friends and family, and desire for
friendships outside of the relationship (both couple and individual).
Personal Characteristics. Questions about one’s identity, how one interacts with people, and the way of looking at the world are all important factors to take into consideration when it comes to finding a mate. A suitable partner will truly complement the other’s personality, and have a similar outlook on life. Couples with similar personalities tend to have more satisfying relationships (Robins et al., 2000). The personal characteristics factor measures a variety of personality factors, along with attitudes about a variety of issues.
Conscientiousness: The Conscientiousness subfactor includes the following subscales: adherence to routine, self-discipline, organization/planning, orderliness, and dependability / reliability. Conscientiousness plays out in every area of relationships. People who are conscientious are likely to be frustrated by a lower level of conscientiousness in their romantic partners. Also, high conscientiousness in men is related to increased esteem towards them by their wives (Shackelford & Buss, 1997).
Integrity: The integrity subfactor measures overall honesty of the subject, along with whether their behavior is trustworthy. It consists of dependability/reliability, loyalty, and honesty. Honesty and trustworthiness are two extremely important attributes that people desire in romantic partners (Fletcher et al., 1999). Establishing and maintaining trust is essential for continued commitment in a relationship (Wieselquist et al., 1999).
Adventurousness: Adventurousness is another trait that can have a negative impact on a mismatched couple. People scoring low on this trait are reluctant to try new things; they tend to be sedentary and prefer a routine. Highly adventurous people, on the other hand, are ready to take off at the drop of a hat, love to experience new things, thrive on change and resent routine. Research has shown that sharing novel and exciting activities prevents boredom and stagnancy in a relationship. Being adventurous together is a good thing for a relationship (Aron et al., 2000); however, pairing two people far apart on adventurousness can cause serious personality clashes in the relationship. The Adventurousness subfactor includes the following subscales: flexibility, open-mindedness, energy level and zest, spontaneity, and adherence to routine.
Rigidity: The rigidity construct is assessed by the following subscales: need for control, standards (other-oriented perfectionism), flexibility, open-mindedness, and adherence to routine. It was included because being able to adjust in order to achieve compromise with a partner is essential to relationships, and this ability is nearly absent in people high in rigidity. According to Wieselquist et al. (1999), pro-relationship acts, such as sacrificing needs and preferences and making accommodations for a partner, help build relationship trust.
Dominance: The dominance subfactor consists of assertiveness, tolerance for differences in opinion, tolerance for goal differences – support, and need for control. A number of studies have demonstrated that the complementarity theory holds for this factor: people who are generally dominant tend to get along better with people who are more submissive, and vice versa (Dryer & Horowitz, 1997). However, dominant / submissive couples are at risk of developing co-dependency problems, and the submissive partners may fail to achieve successful differentiation of their selves, which is fundamental to long-term intimacy (Bowen, 1978; Guerin et al., 1996; Kerr, 1985; Skowron, 2000; Titelman, 1998).
Attitudes and worldview: This complex subfactor includes assessment of a variety of issues that frequently cause problems in romantic relationships.
· Gender roles: The TCT includes an assessment of the test-taker’s beliefs and attitudes regarding this potentially explosive issue. Perceived equality between the partners factors into relationship satisfaction (Perrone & Worthington, 2001). One’s gender role attitudes are reflected in numerous areas, but all of them essentially boil down to power balance in the relationship, from decision making, child care, chores and errands to division of financial resources.
· Money attitudes: This subfactor measures the participant’s approach and attitude towards money. Arguments about whether to spend versus save money can be a big stumbling block. If one partner is unable to curb his or her spending habits, the other partner may be resentful. Importance of money is also included in this measure; how driven one is, how many hours one is willing to work, and how much one desires spending money on the trappings of wealth are all affected by the importance one places on money. Conflicts about money can be a major indicator of problems in marriage; couples that argue about money feel much less esteem for one another than couples that do not argue about money (Shackelford & Buss, 1997).
· Political and social attitudes: While standing on the opposite sides of left / right spectrum does not necessarily prevent partners from forming a successful bond, these attitudes, especially in their extreme form, and even more so if combined with intolerance, can cause major opinion clashes in the couple.
· Parenting style: Similar strategies and opinions about parenting are important when a couple decides to raise a child together. Parenting styles – measured by attitudes, approaches, amount of affection shown to the children and level of discipline used – are varied, and must be taken into consideration. Being a team when it comes to parenting is imperative, as inconsistencies will typically lead to adjustment problems and maladaptive behavior in children (Brody et al., 2003; Kim et al., 2003; Ruiz et al., 2002). Style differences can also cause difficulties in marital relations, specifically in terms of couple intimacy (O’Brien & Peyton, 2002).
· Relationship attitudes and dating philosophy: There are some issues that can actually make or break a relationship, such as differing levels of readiness to commit, different relationship values, and lack of consensus about whether the relationship will end in marriage or is just a short fling. Each individual enters into relationships with their own preferences and expectations. If fundamental differences do exist, a couple may find that they have to either compromise on important issues or look elsewhere (Gray-Little et al., 1996).
· Romantic attitudes: While some might argue that being romantic, attentive or chivalrous has little to do with long-lasting relationship satisfaction, the fact is that these attitudes are not just skin-deep. Surely, romantic gestures and passionate attentiveness are more frequently observed in the early stages of courtship. Nevertheless, there is no denying that remembering anniversaries, breakfast in bed and little affectionate gifts can contribute to a lasting romance, maintenance of passion and feeling of being loved, which in turn have a major impact on relationship satisfaction (Bradbury et al., 2000; Jerabek, 2003).
Attachment Style. Some people prefer complete independence from their partner, while others rely on their partner for almost everything, from self-worth to personal identity to decision making. There are those who need independence, and others who prefer to be attached at the hip most of the time. It is crucial to know what a potential partner’s relationship style is like from the very beginning. Discrepancies in terms of attachment needs can lead to disappointment and conflict in the relationship (Christensen & Shenk, 1991; Simpson, 1990).
One
aspect of a problematic attachment style is dependency, an inability to
differentiate one’s self from partner.
Differentiation of self
in intense emotional bonds is essential for development of a healthy
relationship. It allows for greater
role flexibility and deeper intimate contact.
Partners who maintain their sense of self can tolerate differences of
opinion and are less emotionally reactive (Bowen, 1978; Kerr & Bowen,
1988). Conversely, partners in poorly differentiated marriages are less
emotionally mature, have a limited capacity for closeness and separateness, and
tend to sacrifice self-development and sense of personal identity to maintain
stability in the relationship (Bowen, 1978; Kerr & Bowen, 1988; Schnarch,
1997).
The TCT also assesses other issues that are related to attachment style and dependency problems, such as security in a relationship and jealousy. Fear of rejection and abandonment are commonplace in couples with attachment problems. Studies on sensitivity to rejection demonstrate that people who anxiously expect rejection tend to readily perceive its presence in ambiguous or insensitive behavior of others (Downey & Feldman, 1996). Research by Downey and Feldman (1996) shows that rejection-sensitive people and their partners experience dissatisfaction with their relationship.
In addition, the TCT includes several subscales that assess need for personal space (i.e., having a life separate from partner), need for privacy (i.e., understanding and respecting each other’s domain), expectations in terms of the amount of social life of the couple (couple friendships) and need to maintain one’s individual friendships (socializing with others without the partner present). People with problematic attachment styles frequently hold dysfunctional relationship cognitions, which are linked to relationship dissatisfaction (Baucom & Epstein, 1990; Fincham et al., 1990; Kurdek, 1992). Unrealistic expectations and idealistic assumptions about how relationships should work set the stage for disappointment and a sense of failure when things don’t go as smoothly as one may wish. For example, people with dysfunctional relationship cognitions think that successful couples should never have any disagreements, should want to spend all their free time together or should never be attracted to another person. They feel that people who love each other should not have any secrets, should not need any personal space, should share anything and everything and should not need any friends other than their partner. In other words, they want to be “one body, one soul.” Once they realize that this is not the case in their relationship, they may panic and overreact to minor problems.
Stress Reaction. A relationship has the potential to be a great source of support in stressful times, yet for people who deal poorly with stress the potential deterioration of the relationship can only add more stress. The relationships of couples that have less productive reactions to stressful life events may suffer when such events occur (Cohan & Bradbury, 1997). In addition, negative stressful events during the workday contribute to angry marital behavior in women and withdrawal in men (Schulz et al., 2004) and in general to negative marital interactions (Crouter et al., 1989; Gottman & Levenson, 1988). According to Larson and Richards (1994), minor daily stressors, such as chores, childcare, and errands, have a major effect on the emotional lives of the partners and the nature of the family relationship. While some authors argue that gender differences are somewhat overrated when it comes to dealing with stress (Aries, 1996; Brody, 1999), several studies have demonstrated that men tend to use withdrawal (both emotional and behavioral) as a coping mechanism, whereas women are more likely to be critical, verbally confront their partners, and initiate conflict (Christensen & Heavey, 1990; Gottman & Levenson, 1988). Brody (1999) argued that marriage is one context in which women are likely to express more anger than men. These gender differences appear to be more pronounced under stress.
Since reaction to stress is highly subjective and depends on the person’s coping skills, sense of self-efficacy and ability to deal with adversity on an emotional level, the TCT includes a number of factors assessing these characteristics. In a study by Waldinger et al. (2004), the ability to correctly read emotions was linked with concurrent marital satisfaction as well as interviewer’s assessment of long-term relationship stability and adjustment. It has also been demonstrated that emotional intelligence has a profound impact on one’s level of functioning, social success and happiness in general (Jerabek, 1999). Therefore, the TCT includes an assessment of emotional intelligence.
The Psychological Strength subfactor includes
several aspects that address these issues: security in a relationship,
dependency, need for control, self-esteem and self-confidence, mood stability
(including anxiety, depression, anger control and moodiness), optimism and
positive attitude. This construct is
most similar to the “big five” notion of neuroticism. There is a consensus in the literature that neuroticism is a
negative predictor of marital satisfaction (Kelly & Conley, 1987;
Shackelford & Buss, 1997).
There is no doubt that dealing with unstable emotions
in a partner is difficult, often leading to marital problems. However, the relationship between depression
and marital distress is bi-directional.
For example, 50% of women who are experiencing relationship problems
report significant depressive symptoms (Weissman, 1987). In fact, studies have shown that emotional
distance and alienation predict depression for both sexes (Cano & O’Leary,
2000; Heim & Snyder, 1991). In
addition, neuroticism in one of the partners has been shown to be one of the
best predictors of marital distress and dissolution of the couple (Kurdek,
1997).
Negative emotional behavior (e.g., expressed anger,
sadness, or contempt) has also been shown to differentiate satisfied from
dissatisfied couples (Schaap et al., 1988).
Likewise, self-esteem has been shown to be a good predictor of
relationship satisfaction, especially in men (Bailey et al., 1987; Hendrick
& Hendrick, 1988).
Basic Measurement Issues: Technical Quality of the TCT
Rasch Scaling
The Rasch model is increasingly used for other purposes as
well (for an overview, see, e.g., Bond & Fox, 2001), including applications
in clinical psychology (McCutcheon et al., 2002; Lange, Thalbourne et al.,
2000), psychiatry (Lange, Greyson et al., 2004; Lange, Thalbourne et al.,
2002), medicine (Lange, Donathan et al., 2002; Lange & Hughes, 2004), and
artificial intelligence (Lange, Greiff et al., 2004).
The major differences between successful Rasch modeling and the classical scaling approaches can be summarized by four “rules” (Embretson, 1999, p. 12, cf., Embretson, 1995):
1. The standard error of measurement differs between persons with different response patterns but generalizes across populations.
2. Shorter tests can be more reliable than longer tests.
3. Comparing test scores across multiple forms is optimal when test difficulty levels vary across persons.
4. Unbiased estimates of item properties may be obtained from unrepresentative samples.
In other
words, the classical notion that all test scores are equally reliable is
abandoned in favor of local (i.e., level-specific) standard errors of estimate
(SE) – no longer is there a single index of score reliability. Also, longer
tests are not necessarily “better,” as – depending on the distribution of
respondents’ trait levels – many questions are guaranteed to be redundant.
Thus, by using items that best address respondents’ different trait levels
(i.e., by purposely using non-parallel forms) greater measurement
precision is obtained. In the extreme, items can be selected specifically
to optimize reliability (or, equivalently, minimize SE). When this is done in
an interactive computerized fashion, one speaks of Computer Adaptive Testing,
or CAT (see e.g., Wainer, 2000).
Basics. The Rasch scaling of binary (i.e., dichotomous) items assumes the form of a logistic regression model where each person and item is individually parameterized to derive the log odds of the probability P of observing an answer indicative of the trait under consideration. For binary items (i) and persons (n):
$$\ln\left(\frac{P_{ni}}{1 - P_{ni}}\right) = B_n - D_i \qquad (1)$$
In the above, Pni reflects the
probability that person n will answer item i affirmatively, where
person n has trait level Bn and item i reflects
the trait amount Di. Note that the item and person parameters
share a common metric as defined by the left-hand side of Equation 1 – i.e.,
the log-odds of the probability Pni. Accordingly, all
quantities in the Rasch model are said to be expressed in logits.
Equation 1 shows that the Rasch model is additive in the
parameters (Bn) and (-Di). Thus, in
contrast to related models such as the two- and three-parameter logistic (cf.,
Fisher, 1995), the Rasch model meets the first requirement for interval
measurement – i.e., additivity (Michel, 1990).[5]
It follows from Fisher's principle of statistical sufficiency (see Wright &
Stone, 1979) that the maximum-likelihood estimate of each
parameter occurs when the expected raw score corresponding to the parameter
estimate equals the observed raw score. Accordingly, raw scores are sufficient
statistics for the parameters B and D – indeed, these quantities
can be estimated independently of each other.
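The following minimal Python sketch illustrates the binary model of Equation 1. The function names and the example values (a person at 1.5 logits, an item at 0.5 logits) are hypothetical and chosen for illustration only; they are not taken from the TCT.

```python
import math

def rasch_probability(b_n: float, d_i: float) -> float:
    """Probability of an affirmative answer under Equation 1 (arguments in logits)."""
    return 1.0 / (1.0 + math.exp(-(b_n - d_i)))

def log_odds(p: float) -> float:
    """Log-odds (logit) of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Hypothetical person and item locations, for illustration only.
p = rasch_probability(b_n=1.5, d_i=0.5)
print(round(p, 3))            # 0.731
print(round(log_odds(p), 3))  # 1.0, i.e., B_n - D_i
```

The second printed value recovers the difference between the person and item locations, which is the sense in which both parameters share the logit metric.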
Rating Scales. The Rasch model has been extended to rating scales (Andrich, 1978) and partial-credit observations (Masters, 1982) for polytomous items, i.e., response formats where respondents select from two or more presumably ordered response categories. The rating scale and partial credit formulations both introduce “step” values {Fk} representing the boundaries between two adjacent rating categories k and k-1. To be precise, each Fk reflects the point at which the choices of categories k and k-1 are modeled to occur with equal probability. However, they differ with respect to the assumptions made concerning the item-dependency of the step values. In particular, the rating scale model assumes that the {Fk} are the same for all items under consideration, whereas the partial credit model allows the {Fk} to vary across items (see, e.g., Wright & Masters, 1982).

The TCT as described in the online Manual uses a hybrid of these two formulations that allows items’ step values to vary across different sub-groups (g) of items. In other words, it is assumed that items share the same step values within a particular sub-group, but these values are allowed to differ from the step values for other item sets. Accordingly, the group specific step values will be denoted as {Fgk}. Like the item and person parameters, the step values are additive, thus yielding the hybrid model:
$$\ln\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = B_n - D_i - F_{gk} \qquad (2)$$
In the above:
· Pnik is the probability of observing category k for person n encountering item i.
· Pni(k-1) is the probability of observing category k-1.
· Fgk is the difficulty of being observed in category k relative to category k-1, for an item in group g.
Solving for Pnik in Equation 2 (not shown, see, e.g., Wright & Masters, 1982) yields an explicit equation whose plot serves to illustrate the quantities defined above. For instance, Figure 1 above shows the Pnik (Y-axis) for –5 < Bn < 8, with Di = 1 and Fg1 = -2, Fg2 = -1, and Fg3 = 3. In this figure, Fg1 and Fg2 are shown at –1 and 0, respectively, as their values (i.e., -2, and –1) are relative to the item’s overall location (B, X-axis) – in this case, 1. Going from left to right, the curves in this figure reflect the probability of observing a particular rating 0, 1, 2, 3, given B. The reader can verify that the {Fgk} are located at the point where the probability of finding a response in two adjacent categories is identical (i.e., at the intersections of the curves). Thus, the {Fgk} reflect the categories’ interior boundaries.
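As a minimal sketch of the explicit form, the Python function below computes the category probabilities implied by the adjacent-category log-odds B - D - F (the usual rating scale derivation; see Wright & Masters, 1982). It uses the parameter values quoted above for Figure 1 (Di = 1; step values -2, -1, and 3); the function name is hypothetical.

```python
import math

def category_probabilities(b, d, steps):
    """Return [P_0, ..., P_m] for one item.

    b: person measure B_n; d: item location D_i;
    steps: step values [F_g1, ..., F_gm] of the item's group (all in logits).
    """
    # Log-numerators are cumulative sums of (B - D - F_k); category 0 has 0.
    log_num = [0.0]
    for f in steps:
        log_num.append(log_num[-1] + (b - d - f))
    top = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

D_I = 1.0
STEPS = [-2.0, -1.0, 3.0]

# At B = D_i + F_gk the adjacent categories k-1 and k are equally probable.
for k, f in enumerate(STEPS, start=1):
    probs = category_probabilities(D_I + f, D_I, STEPS)
    print(k, D_I + f, round(probs[k - 1], 4), round(probs[k], 4))
```

The loop evaluates the curves at B = -1, 0, and 4, the locations of the three category boundaries on the person metric, and prints the two adjacent probabilities, which coincide at each boundary.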
It is noted that the rating-scale and the
partial-credit formulations are both special cases of Equation 2. The former
obtains when all items are in the same group, and the latter obtains when each
item defines its own separate group. Also, Equation 1 for binary items obtains
when rating scales with just two categories are used.
Further Generalization. Linacre (1989) generalized Equations 1 and 2 to a Many-Facet Rasch Model by allowing the left-hand side to be affected by the effects of additional independent variables (or, “Facets”) as well. In the simplest case respondents’ trait levels can be thought of as being affected by a single variable C (e.g., respondents’ age or gender) with levels j:
$$\ln\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_{gk} \qquad (3)$$
Note
that the additive properties of the model are maintained. In particular, raw
scores are sufficient statistics for the Bn, Di
and Cj, and the counts of observations in each category are
sufficient statistics for estimating the {Fk}.
Model Fit. All Rasch formulations support powerful quality-control fit
statistics for assessing the conformance of the data to the model (see Wright
& Stone, 1979). Practice indicates that the model is robust against many
forms of misfit, and typical perturbations in data tend to have little
influence on the measure estimates. Thus, while a few misfitting items may
introduce noise, the quality of measurement provided by the other items is
thereby little affected. A further feature of the model is its robustness
against missing data. Since the model is parameterized at the individual
observation level, estimates are obtained only from the data that has been
observed (assuming that “missing” is not in fact a response option). There is
no need to impute missing data, or to assume a particular form of the
distribution of parameters. Of course, missing data decrease the precision with
which parameters can be estimated.
In estimating the measures, the model acts as though the randomness in the data
is well behaved in accordance to the particular Rasch model being used. This is
not a blind assumption, however, because the quality control fit statistics can
be computed to report where, and to what extent, this requirement has not been
exactly met. For instance, for each response to item i by person n,
a standardized residual zni can be computed as the
difference between an observed datum and the probability estimate P of
its occurrence (e.g., as derived via Equations 1, 2, or 3) after
division by its standard deviation. Since such zs are
approximately normally distributed, unexpected results (e.g., observations with
|z| > 3) are easily identified.
The preceding forms the basis for computing the overall fit of the questions across respondents as quantified by their Outfit. For instance, the Outfit of item i over respondents n is:
$$\text{Outfit}_i = \frac{1}{N}\sum_{n=1}^{N} z_{ni}^{2} \qquad (4)$$
Here N denotes the number of respondents. Since the summed z² in Equation 4 define an approximate χ² statistic with expected value N – 1, the Outfit statistic ranges from 0 to ∞, with an expected value of about 1. Additionally, items’ Infit can be computed by weighting the terms in Equation 4 by the modeled variance of each observation, which is largest when the item and person locations are close together (see Wright & Masters, 1982). Thus, the items’ Outfit is sensitive to deviations across the entire range of the latent Rasch variable, whereas their Infit mainly reflects localized inconsistencies.
Although the ideal Infit and Outfit values are 1, consistent with prevailing practice (see e.g., Bond & Fox, 2001) values in the range 0.6 to 1.4 will be considered acceptable. Note that fit values exceeding 1 indicate the presence of unmodeled variation (i.e., the data are too noisy), whereas values smaller than 1 reflect the absence of modeled noise (i.e., the responses show greater determinism than is entailed by the model). The former is a more serious threat to model fit than the latter.
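The fit statistics can be illustrated with a small Python sketch for a single binary item. It assumes the common mean-square definitions (Outfit as the unweighted mean of the squared standardized residuals, Infit as the variance-weighted counterpart; cf. Wright & Masters, 1982). The person measures and response strings are invented for illustration and are not TCT data.

```python
import math

def rasch_probability(b, d):
    """Probability of an affirmative answer for person measure b and item location d."""
    return 1.0 / (1.0 + math.exp(-(b - d)))

def item_fit(responses, person_measures, d_i):
    """Return (outfit, infit) mean-squares for one binary item.

    responses: observed 0/1 answers; person_measures: the B_n of the same persons.
    """
    z_squared_sum = 0.0
    squared_residual_sum = 0.0
    variance_sum = 0.0
    for x, b in zip(responses, person_measures):
        p = rasch_probability(b, d_i)
        variance = p * (1.0 - p)             # modeled variance of the observation
        z_squared_sum += (x - p) ** 2 / variance
        squared_residual_sum += (x - p) ** 2
        variance_sum += variance
    outfit = z_squared_sum / len(responses)  # unweighted mean square
    infit = squared_residual_sum / variance_sum  # variance-weighted mean square
    return outfit, infit

MEASURES = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical person measures
ORDERLY = [0, 0, 1, 1, 1]                # deterministic (Guttman-like) pattern
NOISY = [1, 0, 1, 1, 0]                  # unexpected answers at both extremes

print([round(v, 2) for v in item_fit(ORDERLY, MEASURES, 0.0)])
print([round(v, 2) for v in item_fit(NOISY, MEASURES, 0.0)])
```

In this toy example the orderly pattern yields mean-squares below 1 (too deterministic), whereas the noisy pattern yields values above 1, with Outfit reacting more strongly than Infit because the surprising answers come from persons far from the item.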
Differential Item and Test Functioning. Embretson’s (1999, p. 12, emphasis added) statement that “unbiased estimates of item properties may be obtained from unrepresentative samples” implies that item locations D should be invariant across sub-populations of the respondents. This assumption can be checked by recomputing the item locations Di in sub-samples from this population. When such checks reveal that the items’ locations systematically differ across sub-groups, we say that these items show Differential Item Functioning, or DIF. In the present context, age and gender are of particular interest because Lange, Houran et al. (2004) found that these variables yielded statistically significant DIF effects in a relationship-related context.
The finding of DIF
threatens construct validity since this implies that different sub-groups
assign different semantics to the underlying variable (for a discussion, see
Lange et al., 2001). The presence of DIF does not imply however that the
measurement of the latent variable is thereby seriously compromised – i.e.,
there need not be Differential Test Functioning (DTF). In particular, DIF
in some items may cancel that in others, thereby having little or no effect on
the estimated person parameters (for examples see e.g., Lange, Irwin et al.,
2000; McCutcheon et al., 2002). Unfortunately, DIF cancellation – and
hence the absence of DTF - is by no means guaranteed (cf., Lange,
Thalbourne et al., 2002; Lange, Houran et al., 2004) and should thus not be
taken for granted.
An effective means to
establish the absence of DTF is to determine whether the raw-score to
Rasch (R-to-R) measure conversions differ by more than these measures’
standard errors of measurement. In the present research this is done
graphically by (a) plotting the R-to-R translation, together with the
local SEB (i.e., B ± SEB) and
then (b) checking whether the sub-group specific R-to-R fall inside this
interval, except perhaps for the most extreme measures.[6]
If so, it has been established that sub-groups’ estimated measures show no
meaningful variation.
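The graphical check can also be expressed numerically. The Python sketch below is a simplified stand-in for the procedure, restricted to binary items: it converts each non-extreme raw score to a Rasch measure by Newton-Raphson iteration and flags raw scores at which a sub-group's conversion leaves the overall B ± SEB band. The function names and all item locations are hypothetical, and the actual analyses reported here were run in Winsteps, not with this code.

```python
import math

def raw_to_measure(raw_score, item_locations, tol=1e-8):
    """Return (B, SE) for a non-extreme raw score on binary Rasch items."""
    n_items = len(item_locations)
    if not 0 < raw_score < n_items:
        raise ValueError("extreme raw scores have no finite estimate")

    def expected_and_info(b):
        probs = [1.0 / (1.0 + math.exp(-(b - d))) for d in item_locations]
        return sum(probs), sum(p * (1.0 - p) for p in probs)

    b = math.log(raw_score / (n_items - raw_score))  # starting value
    for _ in range(200):
        expected, info = expected_and_info(b)
        # Newton-Raphson step, limited to one logit to avoid overshooting.
        step = max(-1.0, min(1.0, (raw_score - expected) / info))
        b += step
        if abs(step) < tol:
            break
    _, info = expected_and_info(b)
    return b, 1.0 / math.sqrt(info)

def dtf_flags(overall_items, subgroup_items):
    """Raw scores whose sub-group measure falls outside the overall B +/- SE band."""
    flags = []
    for raw in range(1, len(overall_items)):
        b_all, se_all = raw_to_measure(raw, overall_items)
        b_sub, _ = raw_to_measure(raw, subgroup_items)
        if abs(b_sub - b_all) > se_all:
            flags.append(raw)
    return flags

# Hypothetical item locations (logits), for illustration only.
OVERALL = [-1.0, -0.5, 0.0, 0.5, 1.0]
CANCELLING = [-0.8, -0.7, 0.0, 0.7, 0.8]   # item shifts that sum to zero
SHIFTED = [d + 2.0 for d in OVERALL]       # every item harder by 2 logits

print(dtf_flags(OVERALL, CANCELLING))
print(dtf_flags(OVERALL, SHIFTED))
```

In this toy case the item-level shifts that cancel leave every conversion inside the band (no DTF), whereas the uniform shift moves every conversion outside it.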
In the present context, we
focus on DTF related to respondents’ age and their own versus preferred
partner gender.
· Age DTF is assessed by comparing the R-to-R transformation for younger (age < 35 years) vs. older (age > 35 years) respondents.[7]
· All four own vs. preferred partner gender groups are considered. Thus, denoting male as M and female as F, the four gender preference groups are FF, FM, MM, MF.
Parameter Estimation. The parameters of the Rasch
models used here will be estimated using the versatile Winsteps software
(Linacre, 2004). This software produces estimates of all model parameters in Equations 1
through 3 using Joint Maximum Likelihood Estimation (JMLE) procedures. These
procedures are sufficiently efficient to analyze thousands of respondents and
items simultaneously, while allowing group-specific rating scale
parameterizations of the items. Winsteps also computes the item-total
correlations and the frequency of the ratings obtained for each item, as well
as the Infit and Outfit statistics discussed above.
Dimensionality.
A basic assumption underlying all of the preceding is that the items under
consideration define a single latent dimension. Unfortunately, it has long been
known (cf. Comrey, 1978; Panter et al., 1997) that standard item-level
factor analysis cannot conclusively establish unidimensionality (or
multi-dimensionality, for that matter).[8]
To make matters worse, it can be derived from statistical theory (cf. Stout,
1987, 2002) how multidimensionality may result from DIF – a finding that was
confirmed by computer simulations (Lange, Irwin et al., 2000).
The approach followed here to investigate items’
dimensionality is to analyze their residuals (see Equation 4) using
principal-component analysis because this addresses multidimensionality and DIF
simultaneously (cf. Linacre, 2004). The Winsteps software referred to above
incorporates such factor analyses as well.
Reliability. Within classical test theory "The reliability of any set of measurements is logically defined as the proportion of their variance that is true variance... We think of the total variance of a set of measures as being made up of two kinds of variance: true variance and error variance... The true measure is assumed to be the genuine value of whatever is being measured" (Guilford, 1965, p. 488). In other words,
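$$\text{Reliability} = \frac{\text{true variance}}{\text{observed variance}} = \frac{\text{true variance}}{\text{true variance} + \text{error variance}} \qquad (5)$$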
Thus, reliability (as embodied for instance in the KR-20 or coefficient alpha) is not an index of quality of the instrument over which it is computed, but this index rather quantifies the extent to which scores can be reproduced. The major problem with the preceding definition is that it:
However, by explicitly modeling the stochastic nature of
each data point Xni Rasch scaling can identify
the source of the error variance. For instance, for the binary case,
$$\mathrm{Var}(X_{ni}) = P_{ni}\,(1 - P_{ni}) \qquad (6)$$
The
error variance of Rasch measures can thus be estimated by taking into account
the sum of the modeled variance of observations. Of course, this
"model" error variance requires the data to conform stochastically to
the Rasch model. Since there is always additional noise in the data,
simulations (Linacre, 1997) indicate that a more appropriate estimate of the
“real” error variance is:
$$\text{“Real” error variance} = \text{model variance} \times \max(1.0,\ \text{Infit mean-square}) \qquad (7)$$
Accordingly, Rasch reliability indices tend to be lower than KR-20 and coefficient alpha. Equation 7 further implies that these classical indices always exceed the maximum reliability, thus suggesting that a test has better measurement characteristics than it actually has. To be sure, KR-20 and coefficient alpha accurately reflect the reliability of raw scores. However, raw scores are not trait measures, but rather local, test-dependent rankings, and treating raw scores as test-independent, generalizable measures is simply not justified. This difference is increasingly recognized, and the AERA/APA/NCME Standards recommend that the “error of measurement based on one approach should not be interpreted as interchangeable with another derived by a different technique” (Standard 2.5).
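A minimal Python sketch of how the “model” and “real” variants differ is given below. It assumes that reliability is computed as true variance over observed variance, with the error variance taken as the mean squared standard error of the measures, inflated per Equation 7 for the “real” variant. The function name and all numbers are hypothetical illustrations, not TCT results.

```python
def rasch_reliability(measures, standard_errors, infit_mean_squares=None):
    """Reliability = true variance / observed variance of a set of measures.

    Without infit values the "model" error variance is used; with them,
    each error variance is inflated by MAX(1.0, Infit mean-square).
    """
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((m - mean) ** 2 for m in measures) / n
    if infit_mean_squares is None:
        infit_mean_squares = [1.0] * n
    error_var = sum(
        se ** 2 * max(1.0, fit)
        for se, fit in zip(standard_errors, infit_mean_squares)
    ) / n
    true_var = max(0.0, observed_var - error_var)
    return true_var / observed_var

# Hypothetical person measures, standard errors, and Infit mean-squares.
MEASURES = [-1.5, -0.5, 0.0, 0.5, 1.5]
ERRORS = [0.5, 0.5, 0.5, 0.5, 0.5]
INFITS = [0.8, 1.0, 1.2, 1.5, 0.9]

model_r = rasch_reliability(MEASURES, ERRORS)
real_r = rasch_reliability(MEASURES, ERRORS, INFITS)
print(round(model_r, 3), round(real_r, 3))  # 0.75 0.715
```

Because Infit values below 1 are not allowed to shrink the error variance, the “real” reliability can never exceed the “model” reliability.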
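Item and Person Reliability. Although this is rarely done within the framework of classical test theory, the above applies equally to items and respondents. Thus, two types of reliability can be distinguished: the reliability of the person measures (person reliability) and the reliability of the item locations (item reliability).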
In addition to providing an
impression of the adequacy of the size of the calibration sample, the latter is
important in situations where items are selected based on their locations on
the latent Rasch dimension (e.g., in test equating and computer adaptive
testing).
Item and Person Separation. While reliability indices are widely used, their interpretation is hindered by the fact that reproducibility is not a direct function of their magnitude. For instance, the difference between the two reliability coefficients 0.55 and 0.65 is far less than that between 0.85 and 0.95. For this reason, in Rasch scaling contexts the item and person reliability coefficients (R) are often expressed as separability indices (G):
$$G = \sqrt{\frac{R}{1 - R}} \qquad (8)$$
The separation index corresponds directly to the value of
Equation 5 above, and thus G ranges from 0 to ∞. The advantage of
using G rather than reliability indices is that they directly reflect
the number of statistically different performance strata that the test can
successfully identify within a particular sample. Thus, when G = 2.5
this indicates that the test succeeds in distinguishing at most
⌊G⌋ = 2 different strata
of individuals.
Fisher (1992) takes a
slightly more liberal approach by defining the number of Discernible Strata
as (4 G + 1) / 3. The rationale for this definition is that the
functional range of typical measures is around 4 True SD. In most cases,
it is reasonable to inflate this by 1 RMSE to allow for the error in the
observed measures. If we then define a
significant difference between two measures as requiring a difference of at
least three RMSE, then there are
(4 True SD + RMSE) / (3 RMSE) = (4 G + 1) / 3 significantly
different levels in the functional measurement range.
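The conversions between reliability, separation, and strata are easy to tabulate. The Python sketch below assumes the standard relation G = sqrt(R / (1 - R)) between reliability and separation, together with the (4 G + 1) / 3 strata formula quoted above; the function names are hypothetical.

```python
import math

def separation(reliability):
    """Separation index G for a reliability coefficient R, with 0 <= R < 1."""
    return math.sqrt(reliability / (1.0 - reliability))

def discernible_strata(g):
    """Number of discernible strata, (4 G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0

for r in (0.55, 0.65, 0.85, 0.95):
    g = separation(r)
    print(r, round(g, 2), round(discernible_strata(g), 2))

print(round(discernible_strata(2.5), 2))  # 3.67
```

The loop shows why equal steps in R are not equal steps in reproducibility: moving from 0.55 to 0.65 raises G by roughly 0.26, whereas moving from 0.85 to 0.95 raises it by almost 2.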
Figure 2

Local SEB. It has long been known that “reliability depends
upon the population measured as well as the measuring instrument...[hence one]
should speak of the reliability of a certain instrument applied to a certain
population under certain conditions” (Guilford, 1965, p. 439). Regardless of
whether reliability or separation indices are used, the fact remains that the
standard error in estimating respondents’ trait levels varies across the latent
dimension – hence, measurement reliability cannot be captured adequately by a
single index, not even within a single
sample. In this context we note that Standard 2.1 states that “For each
total score, subscore, or combination of scores that is to be interpreted,
estimates of relevant reliabilities and standard errors of test measurement or
test information should be reported.”
Taken literally, this means that such information should be available for
each possible TCT measure, and this is the approach taken here.
In this context we note
that according to the Rasch rating scale model[9]
the reliability with which a person’s trait level can be assessed varies
directly with the number of step values Fgk that lie near
this person’s location Bn on the latent dimension (cf.,
Wright & Masters, 1982). As measures become increasingly extreme, then the
density of the step values must eventually decrease. Hence, the standard error SEB
associated with extreme (i.e., relative to the available Fgk)
person measures Bn is greater than the SEB
for Bn closer to the bulk of the Fgk. This
fact is illustrated in Figure 2, which shows a plot of the SEB
against the person measures Bn derived from a “test” consisting
of seven hypothetical rating scale items (Note: Additional plots based on
actual data will be given in Section 4 below).
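The dependence of SEB on the person location can be sketched in a few lines of Python. The example assumes that the standard error equals the inverse square root of the summed modeled variances of the ratings (the test information), and it uses seven invented rating scale items with three shared step values; these are not the items behind Figure 2.

```python
import math

def category_probabilities(b, d, steps):
    """Category probabilities [P_0, ..., P_m] for one rating scale item."""
    log_num = [0.0]
    for f in steps:
        log_num.append(log_num[-1] + (b - d - f))
    top = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

def standard_error(b, item_locations, steps):
    """Local standard error of a person measure b: 1 / sqrt(test information)."""
    information = 0.0
    for d in item_locations:
        probs = category_probabilities(b, d, steps)
        expected = sum(k * p for k, p in enumerate(probs))
        # The modeled variance of the rating is this item's information at b.
        information += sum((k - expected) ** 2 * p for k, p in enumerate(probs))
    return 1.0 / math.sqrt(information)

# Seven hypothetical items sharing one set of step values (logits).
ITEMS = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
STEPS = [-1.0, 0.0, 1.0]

for b in (-6.0, -3.0, 0.0, 3.0, 6.0):
    print(b, round(standard_error(b, ITEMS, STEPS), 2))
```

The printed values are smallest near the center of the item and step locations and grow toward both extremes, which is the U-shaped pattern described above.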
4. Scaling Results
Respondents. The scaling of the TCT™
reported here is based on the responses of 11,576 users of TRUE.com’s online dating service. This
sample comprised 5769 men and 5807 women with a mean age of 35.3 years (Median
= 34, Range = 17 to 84 years). The distribution of respondents’ sexual
preferences – as inferred from their own gender and the preferred gender of
their possible partners [both M(ale) vs. F(emale)] – was: MM = 213, MF
= 5556, FM = 5508, FF = 299. Regardless of their fit to the Rasch
model (or lack thereof), no respondents were excluded from the analyses. The
frequencies of the responses to the 218 selected questions’ various options are
listed in Appendix B in the online Manual.
Item Fit. To obtain a baseline, a series of
analyses were performed to determine the items’ fit to Equation 3 shown earlier
in Section 3 by treating the 218 active non-adaptive items as a single scale.
Similar analyses were then performed over the items in the seventeen most
important subscales. For reasons that were discussed in the introduction, we
identify the items as well as the subscales by numeric tags only.
Appendix C
shows the locations Dgi of all 218 items, together with the
standard errors of estimate SEDi, as well as these items’ Infit,
Outfit, and Item-Total correlations.[10]
Rather surprisingly, and indicative of low dimensionality, the fit of the items
to a single Rasch dimension is quite good. Except for one item (Item 65),
all Outfit values fall within the standard acceptable range (i.e., 0.6
< Outfit < 1.4). Also, just 5 of the 218 items show negative Item-Total
correlations. However, the results of a principal-component analysis of the
items’ residuals (not shown) revealed substantial loadings on the first
residual factor. Accordingly, it is meaningful to consider additional factors.
Subfactors. The seventeen factors studied next were
labeled as Factors 10, 18, 19, 29, 35, 42, 52, 71, 72, 73, 75, 76, 82, 84, 85,
88, and 90.[11] The results
of the Rasch analyses of these factors are reported in Tables 2 through 18
below. It can be observed that the items show excellent fit to the Rasch model,
as indicated by the acceptable Outfit values and positive Item-Total
correlations (with very few exceptions, as is indicated by boldface entries).
Accordingly, the internal structure of these factors supports the assumption
that the items indeed define a latent dimension in accordance with the scaling
assumptions of the Rasch model.
Table 2: Factor 10
| Item | Di | SED | Infit | Outfit | r(item-total) |
|---|---|---|---|---|---|
| I0038 | -0.93 | 0.01 | 0.92 | 0.92 | 0.36 |
| I0072 | -0.90 | 0.01 | 0.89 | 0.85 | 0.44 |
| I0083 | -0.81 | 0.01 | 0.91 | 0.88 | 0.43 |
| I0077 | -0.71 | 0.01 | 0.97 | 0.98 | 0.36 |
| I0041 | -0.60 | 0.01 | 0.88 | 0.86 | 0.46 |
| I0058 | -0.43 | 0.01 | 0.95 | 0.97 | 0.40 |
| I0234 | -0.16 | 0.02 | 1.02 | 0.99 | 0.14 |
| I0201 | 0.06 | 0.01 | 0.96 | 0.96 | 0.49 |
| I0170 | 0.53 | 0.01 | 1.02 | 1.51 | 0.29 |
| I0090 | 0.62 | 0.01 | 1.14 | 1.16 | 0.30 |
| I0049 | 1.01 | 0.01 | 1.16 | 1.33 | 0.26 |
| I0048 | 2.31 | 0.02 | 1.10 | 1.24 | 0.00 |
Table 3: Factor 18
| Item | Di | SED | Infit | Outfit | r(item-total) |
|---|---|---|---|---|---|
| I0180 | -1.07 | 0.01 | 1.00 | 1.02 | 0.25 |
| I0203 | -1.03 | 0.01 | 1.00 | 1.01 | 0.25 |
| I0043 | -0.87 | 0.01 | 1.06 | 1.26 | 0.17 |
| I0053 | -0.82 | 0.01 | 0.97 | 1.01 | 0.30 |
| I0054 | -0.82 | 0.01 | 0.98 | 1.03 | 0.29 |
| I0130 | -0.31 | 0.02 | 1.07 | 1.12 | 0.07 |
| I0005 | 0.04 | 0.01 | 1.00 | 1.04 | 0.39 |
| I0074 | 0.22 | 0.01 | 0.89 | 0.89 | 0.49 |
| I0172 | 0.35 | 0.01 | 1.05 | 1.05 | 0.21 |
| I0115 | 0.41 | 0.01 | 1.06 | 1.05 | 0.12 |
| I0060 | 0.92 | 0.01 | 0.98 | 1.02 | 0.41 |
| I0163 | 0.93 | 0.01 | 0.92 | 0.94 | 0.45 |
| I0113 | 0.97 | 0.01 | 0.96 | 0.99 | 0.41 |
| I0096 | 1.07 | 0.01 | 0.99 | 0.99 | 0.37 |
Table 4: Factor 19
| Item | Di | SED | Infit | Outfit | r(item-total) |
|---|---|---|---|---|---|
| I0053 | -0.71 | 0.01 | 0.92 | 0.89 | 0.45 |
| I0054 | -0.71 | 0.01 | 0.94 | 0.91 | 0.43 |
| I0040 | -0.70 | 0.01 | 0.81 | 0.76 | 0.54 |
| I0006 | -0.52 | 0.01 | 0.85 | 0.83 | 0.52 |
| I0129 | -0.41 | 0.01 | 0.89 | 0.88 | 0.50 |
| I0058 | -0.36 | 0.01 | 0.83 | 0.83 | 0.55 |
| I0151 | -0.01 | 0.01 | 0.93 | 0.93 | 0.52 |
| I0193 | 0.89 | 0.01 | 1.16 | 1.20 | 0.09 |
| I0163 | 1.23 | 0.01 | 1.25 | 1.48 | 0.23 |
| I0113 | 1.29 | 0.01 | 1.30 | 1.72 | 0.17 |
Table 5: Factor 29
| Item | Di | SED | Infit | Outfit | r(item-total) |
|---|---|---|---|---|---|
| I0138 | -1.15 | 0.01 | 0.99 | 1.02 | 0.23 |
| I0038 | -1.13 | 0.01 | 1.00 | 1.08 | 0.21 |
| I0147 | -1.12 | 0.01 | 0.99 | 1.10 | 0.21 |
| I0076 | -1.01 | 0.01 | 1.02 | 1.07 | 0.22 |
| I0088 | -0.46 | 0.01 | 0.99 | 0.99 | 0.34 |
| I0150 | -0.39 | 0.01 | 1.07 | 1.11 | 0.19 |
| I0201 | -0.17 | 0.01 | 0.91 | 0.91 | 0.47 |
| I0090 | 0.35 | 0.01 | 0.94 | 0.94 | 0.43 |
| I0194 | 0.42 | 0.01 | 0.99 | 1.00 | 0.40 |
| I0164 | 0.44 | 0.01 | 1.06 | 1.07 | 0.13 |
| I0190 | 0.50 | 0.01 | 0.96 | 0.97 | 0.40 |
| I0049 | 0.70 | 0.01 | 0.97 | 1.00 | 0.38 |
| I0188 | 1.06 | 0.01 | 1.05 | 1.07 | 0.21 |
| I0048 | 1.99 | 0.02 | 1.02 | 1.06 | 0.11 |
Table 6: Factor 35
| Item | Di | SED | Infit | Outfit | r(item-total) |
|---|---|---|---|---|---|
| I0064 | -1.57 | 0.01 | 1.18 | 1.22 | 0.36 |
| I0185 | -0.85 | 0.01 | 0.93 | 0.96 | 0.50 |
| I0057 | 0.11 | 0.11 | 1.11 | 1.16 | 0.54 |
| I0194 | 0.11 | 0.01 | 0.84 | 0.80 | 0.65 |
| I0066 | 0.40 | 0.13 | 0.72 | 0.68 | 0.69 |
| I0104 | 0.50 | 0.01 | 0.82 | 0.76 | 0.63 |
| I0143 | 0.52 | 0.01 | 0.79 | 0.71 | 0.64 |
| I0091 | 0.79 | 0.01 | 0.90 | 0.78 | 0.57 |
Table 7: Factor 42
| Item | Di | SED | Infit | Outfit | ritem-tot |
| I0155 | -0.76 | 0.01 | 0.99 | 1.00 | 0.43 |
| I0020 | -0.45 | 0.01 | 0.99 | 0.99 | 0.49 |
| I0199 | -0.45 | 0.01 | 0.98 | 0.98 | 0.50 |
| I0102 | -0.27 | 0.01 | 0.92 | 0.93 | 0.53 |
| I0080 | -0.01 | 0.01 | 1.00 | 1.01 | 0.50 |
| I0193 | 0.38 | 0.01 | 1.15 | 1.19 | 0.16 |
| I0163 | 0.75 | 0.01 | 0.94 | 0.93 | 0.51 |
| I0113 | 0.81 | 0.01 | 0.97 | 0.95 | 0.49 |
Table 8: Factor 52
| Item | Di | SED | Infit | Outfit | ritem-tot |
| I0192 | -0.82 | 0.02 | 0.96 | 0.96 | 0.30 |
| I0039 | -0.54 | 0.01 | 0.98 | 0.98 | 0.39 |
| I0144 | -0.35 | 0.01 | 0.96 | 0.96 | 0.34 |
| I0155 | -0.26 | 0.01 | 0.91 | 0.89 | 0.47 |
| I0199 | 0.06 | 0.01 | 0.89 | 0.88 | 0.52 |
| I0240 | 0.39 | 0.01 | 1.06 | 1.07 | 0.38 |
| I0035 | 0.71 | 0.01 | 1.09 | 1.11 | 0.21 |
| I0070 | 0.82 | 0.01 | 1.15 | 1.19 | 0.27 |
Table 9: Factor 71
| Item | Di | SED | Infit | Outfit | ritem-tot |
| I0162 | -0.90 | 0.01 | 1.10 | 1.40 | 0.26 |
| I0197 | -0.72 | 0.01 | 1.01 | 0.99 | 0.25 |
| I0119 | -0.27 | 0.02 | 1.03 | 0.99 | 0.21 |
| I0169 | -0.15 | 0.10 | 1.11 | 1.07 | 0.46 |
| I0135 | 0.23 | 0.01 | 0.83 | 0.82 | 0.58 |
| I0196 | 0.24 | 0.02 | 1.02 | 1.08 | 0.13 |
| I0105 | 0.25 | 0.01 | 1.01 | 1.04 | 0.44 |
| I0002 | 0.63 | 0.01 | 1.04 | 1.12 | 0.39 |
| I0136 | 0.69 | 0.01 | 0.87 | 0.86 | 0.52 |
Table 10: Factor 72
| Item | Di | SED | Infit | Outfit | ritem-tot |
| I0138 | -1.09 | 0.01 | 0.97 | 0.97 | 0.25 |
| I0038 | -1.06 | 0.01 | 0.98 | 1.00 | 0.24 |
| I0076 | -0.95 | 0.01 | 0.99 | 1.01 | 0.25 |
| I0237 | -0.57 | 0.01 | 0.99 | 1.18 | 0.33 |
| I0088 | -0.41 | 0.01 | 1.00 | 0.99 | 0.32 |
| I0110 | -0.16 | 0.01 | 0.97 | 0.98 | 0.37 |
| I0228 | -0.14 | 0.02 | 0.99 | 1.00 | 0.15 |
| I0238 | 0.23 | 0.01 | 1.00 | 1.08 | 0.19 |
| I0170 | 0.36 | 0.01 | 0.98 | 1.40 | 0.26 |
| I0164 | 0.47 | 0.01 | 1.00 | 1.00 | 0.21 |
|
I0049 |
0.72 |