Research and Practice

Spring 1999, v15-2

Using Girl Scouting as an example group, this article first discusses issues of concern in measuring outcomes of programs for children and youth, and then provides a common set of guidelines for evaluators and program directors to use in their evaluations. This is a revised version of papers presented to the Independent Sector, Spring Research Forum, Arlington, VA, March 1997, and the American Sociological Association, San Francisco, CA, August, 1998.


Issues to Be Considered in Evaluating Programs for Children and Youth
by Mary C. Sengstock, Ph.D., Melanie Hwalek, Ph.D.


Introduction
More and more, youth-serving organizations are facing the need to measure the outcomes of their programs. While the need to measure outcomes of young people's programs is clear, the way to measure outcomes is less certain. There are numerous issues that need to be considered in evaluating children and youth and their programs; the guidelines must be different from those observed in working with adults. Both qualitative and quantitative approaches to evaluation are possible; quantitative measures are most useful in providing evidence of outcomes, while qualitative research is most valuable in indicating why or how these outcomes were achieved. In this paper the issues relate primarily to quantitative measures, although some may apply to qualitative studies as well.

Evaluating programs for children and youth is much more complex than evaluating programs for adults. The rapid developmental changes that occur in the first 15 years of life are unparalleled in any other age group. Even within a specific age subgroup, programs for children and youth often include participants from a wide variety of racial, ethnic, or cultural backgrounds, as well as children for whom English is a second language. People attempting to measure outcomes for children for the first time may not realize that measures developed for adults cannot be used for children, that use of inappropriate measures can result in inaccurate evaluations, or that inappropriate use of evaluation tools can actually do damage to the program and the young people involved.

We hope that this article can be of assistance to both evaluators and program directors in developing effective evaluations of programs for children and youth. Our focus will be on the methodological and ethical issues which need to be considered in evaluating children and youth, rather than on specific types of evaluation procedures. Throughout the paper, examples are given of issues and/or concepts. While Girl Scouting is the program we use in these examples, the issues raised are generic and can be applied to almost any comprehensive program for children and youth.

Eight major issues will be considered in the sections that follow:

  • Multi-Dimensionality of Program Outcomes
  • Time and Attention Span of Children and Youth
  • Developmental Changes in Children and Youth
  • Need to Measure Outcomes in a Comparison Group
  • Age-Appropriateness of Measurement Instruments
  • Instruments Sensitive to the Cultural Diversity of the Children and Youth Involved
  • Program versus Individual Evaluation
  • Ethical Issues in Testing Children and Youth


Multi-dimensionality of Program Outcomes
A major problem in evaluating young people's programs is the diversity of expected outcomes. In Girl Scouting, for example, the goals include developing each girl's individual potential; enabling her to relate with understanding and respect to different types of people; developing her value system and ability to make decisions; and imparting a sense of community responsibility. A successful program evaluation will encompass these diverse dimensions.

Furthermore, these concepts are not unitary--each involves several subdimensions. For example, "developing each girl's potential" involves measuring her abilities in various areas of competence, feelings of self-efficacy, and sense of empowerment. A good program evaluation would examine each program goal and ensure that all major dimensions are included in the evaluation plan.

Time and Attention Span of Children and Youth
Because of the multi-dimensionality of programs for children and youth, most evaluation instruments will need to include several different measures. This is a special concern with young people, who have limited time and attention spans. Questionnaires cannot be used with younger children at all; with older youth, if the questionnaire is too long, they may lose interest and provide inaccurate data. While it is tempting to select an instrument and eliminate irrelevant items, most measurement instruments have been developed and tested as a complex whole; removing some items can damage their ability to make valid (meaningful) and reliable (dependable) measurements.
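The cost of shortening an instrument can be illustrated with the Spearman-Brown formula from classical test theory. The sketch below is our illustration rather than a procedure taken from the studies cited; it assumes the removed items are statistically parallel to those retained, and the 20-item scale and its reliability of 0.80 are hypothetical values.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Assumes the items added or removed are parallel to the rest of the test.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical case: a 20-item scale with reliability 0.80 is cut to 10 items,
# so the length factor is 10 / 20 = 0.5.
original_reliability = 0.80
shortened_reliability = spearman_brown(original_reliability, 0.5)
print(round(shortened_reliability, 2))
```

Under these assumptions the predicted reliability of the shortened scale falls below that of the original. The formula says nothing about validity, which would have to be re-examined separately for any shortened instrument.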

Developmental Changes in Children and Youth
Measurement is complicated by the intense developmental stages of all children and youth, making it difficult to determine whether an observed change is the result of a program or of a child's own natural development process. For example, if Brownie Girl Scouts (grades one through three) are tested at the beginning and end of the program year, and their tests indicate increased skills in leadership, does this indicate that the troop's leadership program was successful? Perhaps, but the girls' natural development may account for all or part of the changes, regardless of any specific troop program.


Need To Measure Outcomes in a Comparison Group
One way to help distinguish between program impact and natural developmental changes, or other alternative explanations, is to test a comparison group of similar children or youth who have not participated in program activities. If program children exhibit more change than the comparison group, this provides evidence that the program had an impact. Technically, this is not what is generally called a "control group," which involves random assignment of children to program or non-program groups. Some evaluators believe control groups are not appropriate [11].
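The comparison described above can be sketched as a difference in average gains: the change from pre-test to post-test in the program group, minus the change in the comparison group. This is a minimal illustration, not the analysis used in any study cited here; the function names and all scores are hypothetical.

```python
from statistics import mean


def mean_gain(pre, post):
    """Average post-minus-pre change for one group of paired scores."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and the same length")
    return mean(after - before for before, after in zip(pre, post))


def gain_beyond_comparison(program_pre, program_post, comparison_pre, comparison_post):
    """Gain in the program group beyond the gain seen in the comparison group."""
    return mean_gain(program_pre, program_post) - mean_gain(comparison_pre, comparison_post)


# Hypothetical leadership-skill scores (invented numbers, one entry per child).
program_pre = [10, 12, 11, 13]
program_post = [15, 16, 14, 17]
comparison_pre = [10, 11, 12, 13]
comparison_post = [12, 13, 13, 14]

extra_gain = gain_beyond_comparison(
    program_pre, program_post, comparison_pre, comparison_post
)
print(extra_gain)
```

A positive value suggests the program children changed more than their peers. Because the groups are not randomly assigned, such a difference is evidence of impact rather than proof, and a real evaluation would also need a test of statistical significance.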

An example of a comparison group for Girl Scouting would be girls who wished to be Girl Scouts, but could not because leaders were unavailable. An alternative method would be to use children and youth in the schools that the Girl Scouts attend as a comparison group. Girl Scout troop leaders could ask teachers in the appropriate grades to conduct the same pre- and post-tests with the whole class at the same times of the year as the Girl Scouts in the troop, a method that was used in Girl Scouts of the U.S.A.'s national outcomes study [7]. While there is still a selection bias, since the girls in these classes have not asked to be members, observed differences between the Girl Scouts and the classes may suggest the impact of the program.

Using classes for comparison can be a complex process. Permission from the school administration, the cooperation of the individual teacher, and parental permission are all necessary before minor children can be tested. Test administrators, both in the classroom and the program, must be trained to ensure the test is administered similarly in both settings. The testing of school children also limits the length of questionnaires to one that can be completed in one class period.

Age-appropriateness of Measurement Instruments
Because of rapid childhood development, the same instruments cannot be used for all age levels. When a program, such as Girl Scouting, includes young people of diverse ages, different evaluation measures must be employed. Preschool and kindergarten children lack reading skills and must be interviewed individually, tested through the use of pictures, have the questions explained to them as a group, or be rated by trained observers. Children aged 7-11 can read and take tests largely on their own, with some assistance from a teacher or leader, but they will be unable to understand questions designed for older youth and adults.

Adolescents (aged 12-16 years) can often be tested in similar ways to adults: they can understand the questions and take tests on their own. There is some debate as to whether 17- and 18-year-olds should receive a more "adult" form of the test [11].

While the use of age-specific test forms is essential, it also presents comparability problems. If the wording of the form used for 7- to 11-year-olds is different from that of the form given to 12- to 16-year-olds, you cannot assume that the tests are measuring the same concepts, or that the scores are comparable across instruments. For example, if a Brownie Girl Scout (grades one through three) remains in the program through Cadette Girl Scouts (junior high school age), and is tested at both times, it is tempting to compare her scores and interpret any differences as "changes." However, since the forms used in the two instances would have been different, the two scores are not comparable.

Diversity-sensitive Instruments
When programs include children and youth from diverse backgrounds, evaluators must ensure that these diverse populations are being tested appropriately. For example, children and youth for whom English is a second language may not understand questionnaire wording in the same way as native speakers of English. Score variations may be due to a different understanding of test questions, rather than a real difference in outcomes. Children and youth with disabilities or from different racial, cultural, or socio-economic groups may also understand the questionnaire in different ways. These all present validity problems in the testing process.

To alleviate these problems, questionnaires should be pre-tested with children from these diverse backgrounds. At this stage, the goal is not the collection of data but the examination of the instrument. Young people and adults from the diverse groups can be asked to point out any words that may be misunderstood or are culturally biased.

The identity of test administrators is also important. Some experts contend that test administrators should be "matched" to the persons tested, so that respondents will be more open with their answers. Hence, black children should be tested by a black interviewer; girls by other girls or women; and so on [6]. Other experts contend that successful interviews can occur across cultural barriers, if interviewers are appropriately trained [10].



Program Versus Individual Evaluation

Instruments designed to measure program effectiveness should never be used to make decisions about individual children and youth. It is tempting to use program evaluation results to select children and youth for special programs, and indeed, there is evidence that this occurs in school settings [12: 76-77]; [4]; [5]; [2]; [9]; [8]; [3].

Using evaluation results for individual assessment is definitely not appropriate, because the statistical measures used to obtain "cut-off" scores for individual assessments are different from those used for summary assessments of program outcomes. For example, in program evaluation, if the average score on a specific measure (such as a leadership skills measure) is compared for program and non-program children and youth, a higher average score among program participants is interpreted as evidence that the program has successfully taught leadership skills. However, it is not appropriate to use this statistic to assign young people whose scores are below the average to an alternate program. Such decisions require different statistical measures [1: 101-104].
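One way to see why group statistics and individual decisions differ is to compare the uncertainty of a group average with the uncertainty of a single child's score. The sketch below uses two standard classical test theory quantities, the standard error of the mean and the standard error of measurement. The choice of these two quantities is our assumption about what is relevant here, and the scores and the reliability of 0.75 are hypothetical.

```python
import math
from statistics import stdev


def standard_error_of_mean(scores):
    """Uncertainty of a group average: SD / sqrt(n)."""
    return stdev(scores) / math.sqrt(len(scores))


def standard_error_of_measurement(scores, reliability):
    """Uncertainty of one person's score: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return stdev(scores) * math.sqrt(1.0 - reliability)


# Hypothetical leadership-skill scores for nine children (invented numbers).
scores = [8, 10, 12, 14, 16, 18, 20, 22, 24]
assumed_reliability = 0.75  # hypothetical reliability of the instrument

group_uncertainty = standard_error_of_mean(scores)
individual_uncertainty = standard_error_of_measurement(scores, assumed_reliability)
print(round(group_uncertainty, 2), round(individual_uncertainty, 2))
```

The group figure shrinks as more children are tested, while the individual figure depends only on the spread of scores and the reliability of the instrument. An instrument precise enough to compare group averages may therefore still be too imprecise to classify any one child.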

Ethical Issues in Testing Children and Youth

Confidentiality is a major ethical issue in the testing of children and youth, both to ensure the honesty of the information provided, and to protect respondents from being hurt by their research participation. In research with children and youth, however, an ethical dilemma arises. Suppose, for example, that a child's interview reveals physical or sexual abuse or the abuse of drugs in the family. Professional ethics and most state laws require that such information be reported to an official agency. The issue is often resolved by use of consent forms, asking parents to give permission for their children's testing. Such a form should include a statement that the evaluator must conform to any state law or professional ethics code that requires the reporting of information in order to protect the child.

Conclusion

Measuring the outcomes of programs for children and youth is relatively new, uncharted territory. Guidelines for measuring children and youth are not apparent and appropriate instruments are largely unavailable. Using Girl Scouting as an example, this paper has presented eight methodological and ethical concerns which must be observed in evaluating programs for children and youth. The sidebar on page XX summarizes these issues in a checklist form for others to use in their own settings.
 
Sidebar 1:
Checklist of Methodological and Ethical Concerns in Evaluating Programs for Children and Youth
 
 
 

Working together, evaluators and program directors can develop more effective techniques for measuring outcomes of young people's programs: better instruments, clearer guidelines, and more constructive designs for measuring program effectiveness. Teamwork can also result in economies of scale--similar programs can develop and use the same instruments. Properly employed, program evaluation can lead programs for children and youth to even greater effectiveness in developing the next generation of productive adults.

 
 
 
 

References


1. Allen, M.J., & W.M. Yen, 1979. Introduction to Measurement Theory. Monterey, CA: Brooks/Cole Publishing Company.

2. Durkin, D., 1987. "Testing in the Kindergarten." Reading Teacher 40:766-780.

3. Geisinger, K.F., 1992. "Fairness and Selected Psychometric Issues in the Psychological Testing of Hispanics." Chap. 3 (pp. 17-42). In K. F. Geisinger, Ed. Psychological Testing of Hispanics. Washington, DC: American Psychological Association.

4. Goodwin, W.L., & L.D. Goodwin, 1982a. "Measuring Young Children." In B. Spodek, Ed. Handbook of Research in Early Childhood Education. (pp. 523-563). New York: Free Press.

5. Goodwin, W.L., & L.D. Goodwin, 1982b. "Young Children and Measurement: Standardized and Nonstandardized Instruments in Early Childhood Education." In B. Spodek, Ed. Handbook of Research on the Education of Young Children. (pp. 441-463). New York: Free Press.

6. Hagenaars, J.A., & T.G. Heinen, 1982. "Effects of Role-Independent Interviewer Characteristics on Responses." In W. Durkstra, & J. VanderZanden, Response Behavior in the Survey-Interview. Chap. 4. (pp. 91-130). London & New York: Academic Press.

7. Hwalek, M., & M.E. Minnick, 1997. Girls, Families and Communities Grow Through Girl Scouting. New York: Girl Scouts of the USA.

8. LaCrosse, E.R., Jr., 1970. "Psychologist and Teacher: Cooperation or Conflict?" Young Children 25: 223-229.

9. Meisels, S.J., 1987. Uses and Abuses of Developmental Screening in Early Childhood: A Guide. Washington, DC: National Association for the Education of Young Children.

10. Pedersen, P., & A. Ivey, 1993. Culture-centered Counseling and Interviewing Skills: A Practical Guide. Westport, CT: Praeger.

11. Stone, W.K., & K.L. Lemanek, 1990. "Development Issues in Children's Self-Reports." Chap. 2 (pp. 18-56). In A. M. LaGreca, Through the Eyes of the Child: Obtaining Self-Reports from Children and Adolescents. Boston: Allyn and Bacon.

12. Wortham, S.C., 1995. Measurement and Evaluation in Early Childhood Education. Englewood Cliffs, NJ: Prentice-Hall.

Authors


Mary C. Sengstock is Professor of Sociology at Wayne State University in Detroit, Michigan. She holds a Ph.D. in Sociology from Washington University in St. Louis, Missouri. Her areas of expertise include applied sociology, ethnic groups in the U.S., family violence, and gerontology. She has published numerous articles on the use of sociology in applied settings. Dr. Sengstock is also a Certified Clinical Sociologist and a licensed social worker in the State of Michigan.

Melanie Hwalek is President of SPEC Associates, a program evaluation and research firm located in Detroit, Michigan. She holds a Ph.D. in Social Psychology from Wayne State University. Her areas of expertise include psychometrics, social research design, and program evaluation. Dr. Hwalek is one of the seven national outcome measurement consultants of United Way of America. She was principal researcher and author of Girl Scouts of the U.S.A.'s national outcome study.

 
 

NEW DESIGNS FOR YOUTH DEVELOPMENT © 1999