Reliability, Validity and Utility
Understanding the value and usefulness of any
psychological or behavioral screening instrument is not a simple task.
Ideally one should be an expert in statistics and research design in
order to understand various graphs, values, quotients and ratings.
Understanding calculation results and analysis procedures can be difficult
even for licensed and trained health care professionals.
There are three ways to understand any questionnaire that is being used
to evaluate and understand human behavior: its reliability, its validity
and its utility. The following are general
definitions of the terms necessary to appreciate the value
of any instrument.
The reliability of any questionnaire is defined as the
consistency with which the same results are achieved. Practically
speaking, that means a person completing the questionnaire would produce the
same responses and results if he or she completed the questionnaire a second
time. It also means that two or more people completing the questionnaire
about the same person would have similar results, provided
they have similar knowledge of and experience with the person being evaluated. Asking
good questions is important. The
reliability of any questionnaire is not necessarily constant. The person
being evaluated can change over time. The person completing the
questionnaire may be biased or change over time. The evaluators may have different
experiences and levels of knowledge regarding the person being evaluated. For these
reasons, the results can be different. Any questionnaire that provides
significantly different results for no reason other than the evaluators' bias
may not provide a valid result. The reliability of a questionnaire depends
on the questionnaire and the person answering the questions.
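
Agreement between two people completing the same questionnaire is often summarized with a chance-corrected statistic such as Cohen's kappa. The following is a minimal sketch in Python, not the developers' procedure. The function name and the yes/no ratings are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of items on which the two raters gave the same answer.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical answers from two parents to the same ten yes/no items.
first_parent = ["yes", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no"]
second_parent = ["yes", "yes", "no", "no", "no", "no", "no", "no", "yes", "no"]

kappa = cohen_kappa(first_parent, second_parent)
print(f"kappa = {kappa:.2f}")
```

In this invented example the parents agree on 9 of 10 items, but because some agreement is expected by chance alone, kappa is about 0.78 rather than 0.90.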
The validity of a questionnaire is defined as its ability
to measure and describe what it is supposed to measure and describe. In
other words, if the questionnaire is supposed to measure depression, then the
results should correctly identify actual depression. Ideally, it should identify
depression and not other problems, including problems that are like depression
but not the same. The ability of a questionnaire to identify what it is supposed
to identify depends on asking and getting answers to questions that
collectively identify the attribute, state or quality we want to identify.
There are 6 primary types of validity. Following is a
brief description of each.
Face Validity. This form
of validity is based on commonly accepted opinion or
consensus of opinion. Face validity is normally established by qualified professional
observation, investigation or experience with an instrument, test or a
computer-based test interpretation system. Face validity is based on how the
Content Validity. This
form of validity is based on the content (actual questions) used in a survey
or questionnaire. Content validity is established by a professional
or professionals selecting appropriate content for questions and statements. The
results of a questionnaire or survey are considered valid if the questions
are appropriate and necessary to identify a specific
attribute, state or quality.
Predictive Validity. This
form of validity is based on a questionnaire's ability to predict what it is
supposed to predict, that is, its ability to predict some future state, result or event.
Concurrent Validity. This
form of validity means a questionnaire or survey is capable of
identifying a state, attribute, quality or result that is already known. An
instrument is valid if it correctly identifies a state or result
that is already known, by some other means, to exist.
Construct Validity. This form of validity is the
most difficult to establish. It is normally based on demonstrating
meaningful relationships among elements of states, attributes, results,
problems or disorders. For example, there are different symptoms of
depression. There are many other disorders that do not have the symptoms
associated with depression. For instance, depression is in many ways the
opposite of mania. A measure of depression would
have construct validity if results showed a positive relationship between
depression and low energy and
no relationship between depression and the high energy
seen in mania. The results would diverge.
Incremental Validity. This form of validity can help determine whether or not a particular
instrument or method provides a significant improvement when used in addition to
another approach. For instance, does a screening instrument used alongside
a 50-minute interview provide a significantly better result than
the interview alone? A particular approach is said to
have incremental validity if using it actually helps more than not using it.
The StepOne Online web site has been reviewed by qualified
professionals in order to ensure the program meets ethical and
professional standards for development and operation. A standardization
and administration manual is available from the developers.
The initial normative sample for the general population
consists of 211 families with an n equal to 432 children. The
questionnaires can be used for all children age 11 to 17. The normative
sample for the profile summary scales consisted of 8 separate stratified
samples that were combined into a total normative sample with an n of 87.
Diagnostic and agreement statistics were established based on 9 mental
health criterion groups. StepOne for parents is a good predictor of mental
health problems and high risk behaviors.
The StepOne for Parents Program was developed in a primary
medical care and a private practice mental health care setting. The
program can be used as a formative research tool in private practice
mental health, primary medical care, foster care and education.
Reliability was established based on a test-retest strategy with a kappa
of r equal to 0.94. Validity has been established through convergence with
other scales (other constructs) based on correlations that range from
r = 0.54 to r = 0.84. Validity has also been established based on significant
agreement statistics for criterion groups covering categories of broad
mental health concerns including risk factors. The validity of the
clinical screening report should be good provided the prevalence of mental
health and addictive disorders is not less than 10% or more than 50%.
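
The reason prevalence matters can be shown with a short calculation. The sketch below, in Python, computes the share of positive screens that are correct (positive predictive value) and the share of negative screens that are correct (negative predictive value) at several prevalence levels. The sensitivity and specificity of 0.85 are assumed values chosen for illustration only; they are not published StepOne figures, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screen applied at a given prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # correct among positive screens
    npv = true_neg / (true_neg + false_neg)  # correct among negative screens
    return ppv, npv


# Assumed accuracy values, for illustration only (not StepOne data).
SENSITIVITY = 0.85
SPECIFICITY = 0.85

for prevalence in (0.02, 0.10, 0.30, 0.50):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under these assumptions, most positive screens at very low prevalence are false alarms, while the positive predictive value rises steadily as prevalence increases. This is consistent with the caution that results are most dependable within a middle range of prevalence.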
Validity estimates are improved by gathering information
from multiple informants such as two parents or more than one caregiver.
The use of StepOne by more than one parent appears to create interaction
between parents and measures how well parents know their child. The system
can provide parents with community specific results and guidance based on
The usefulness of a questionnaire is often referred to as
its utility. The utility of a questionnaire is defined as the value or
cost of using the questionnaire to identify the attribute, state,
quality or event we want to identify. There is more than one way to identify a
state, event, attribute or quality. Some methods require less effort or
fewer resources than others. The idea is to use surveys and questionnaires
that are efficient, have a low risk of harm and are cost effective. A questionnaire with high utility is
one where the cost of identifying an attribute or quality is low and the
cost of being wrong is not high. Another term for utility is the
"usefulness" of an instrument, although "usefulness" does not have
the precise definition of "utility" within the field of statistics.
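
One simple way to make this idea concrete is to add the cost of administering a questionnaire to the expected cost of its errors. The Python sketch below does this per person screened. Every number in it is hypothetical, as are the function and parameter names; it illustrates the reasoning and is not a costing of StepOne.

```python
def expected_cost_per_person(admin_cost, sensitivity, specificity,
                             prevalence, cost_missed_case, cost_false_alarm):
    """Administration cost plus the expected cost of screening errors."""
    # Probability that a screened person has the problem and is missed.
    p_missed = prevalence * (1.0 - sensitivity)
    # Probability that a screened person is flagged without having the problem.
    p_false_alarm = (1.0 - prevalence) * (1.0 - specificity)
    return (admin_cost
            + p_missed * cost_missed_case
            + p_false_alarm * cost_false_alarm)


# Hypothetical inputs, in arbitrary cost units.
cost = expected_cost_per_person(
    admin_cost=20.0,
    sensitivity=0.85,
    specificity=0.85,
    prevalence=0.20,
    cost_missed_case=1000.0,
    cost_false_alarm=100.0,
)
print(f"expected cost per person screened: {cost:.2f}")
```

Comparing this figure across alternative methods, each with its own administration cost and accuracy, gives a rough sense of which has the higher utility under the stated assumptions.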
Parent and professional feedback has been gathered
regarding the usefulness of StepOne. Parents and professionals reviewing
this report may find a few inconsistencies. More than one problem or
concern may be raised. In this case, parents and counselors should discuss
these concerns and evaluate these issues further. Serious inconsistencies
may indicate that a parent does not know their child very well.
Inconsistencies can also surface because this is a screening report and
NOT a psychological evaluation that provides a diagnosis. As such, only a
qualified professional can perform an evaluation, resolve inconsistencies
and make a diagnosis.
A given report may raise issues and questions for parents
to discuss with professionals. No screening process or result is always
correct. The results may raise potential problems that seem unlikely to
parents and professionals. This occurs most often when eating disorders or
the risk of violent, suicidal or self-injuring behavior is identified.
In some cases, the identified problems may be the result
of other underlying problems that parents may not see or be aware of. More
than one problem, concern or issue may be raised in the screening report.
Discussing these results and further evaluation as indicated by a health
or mental health care professional is always appropriate. In a sample of
over 592 parents, no less than 97% felt StepOne for Parents was very well
organized and useful. Approximately 99% felt they could share this report
with a health care professional.
Parent reactions and responses to screening reports may
vary a great deal. The vast majority of parents find StepOne reports
useful and helpful, and they feel the report reflects their concerns. In
effect, reports are designed to help clarify what parents suspect, fear or
are concerned about. Other parents will find that the report raises issues
that they were not aware of. This is also very common because parents can
know a great deal about their child but they do not always understand the
significance or meaning of certain behaviors and patterns of behavior. As
a result, some parents will experience a higher level of concern after
completing their screening than before. Parents should remember that this
is a screening - not an evaluation. As such, the results are not always
correct. In some cases, the screening will reveal a problem that is the
result of an underlying problem that a parent, family member or teacher
could not see.
Preliminary research suggests that StepOne may reduce the
number of appointments by three. StepOne appears to create a more
efficient and effective use of time. StepOne is also altering the
prescribing practices of physicians. StepOne appears to foster further
physician-patient interactions and appears to change, reduce or stop
certain medications. The developers believe that it would take between 3
and 6 hours for a person to screen and write an equivalent report.
February 28, 2012