TOEFL TEST OF WRITTEN ENGLISH GUIDE
Overview of the TWE Test
The Test of Written English (TWE) is the essay component of the Test of English as a Foreign Language (TOEFL), the multiple-choice test used by more than 2,400 institutions to evaluate the English proficiency of applicants whose native language is not English. As a direct, productive skills test, the TWE® test is intended to complement TOEFL Section 2 (Structure and Written Expression). The TWE test is holistically scored, using a criterion-referenced scale to provide information about an examinee’s ability to generate and organize ideas on paper, to support those ideas with evidence or examples, and to use the conventions of standard written English.
Introduced in July 1986, the TWE test is currently (1996) offered as a required component of the TOEFL test at five administrations a year: in February, May, August, October, and December. There is no additional fee for the TWE test.
The TOEFL Test
First administered in 1963-64, the TOEFL test is primarily intended to evaluate the English proficiency of nonnative speakers who wish to study in colleges or universities in English-speaking countries. Section 1 (Listening Comprehension) measures the ability to recognize and understand English as it is spoken in North America. Section 2 (Structure and Written Expression) measures the ability to recognize selected structural and grammatical points in English. Section 3 (Reading Comprehension) measures the ability to read and understand short passages similar in topic and style to those that students are likely to encounter in North American universities and colleges.
During the 1994-95 testing year, more than 845,000 persons in more than 180 countries and regions registered to take the TOEFL test.
TWE Developmental Research
Early TOEFL research studies (Pike, 1976; Pitcher & Ra, 1967) showed that performance on the TOEFL Structure and Written Expression section correlated positively with scores on direct measures of writing ability. However, some TOEFL score users expressed concern about the validity of Section 2 as a measure of a nonnative speaker’s ability to write for academic purposes in English. The perception among many graduate faculty was that there might be little actual relationship between the recognition of correct written expression, as measured by Section 2, and the production of an organized essay or report (Angelis, 1982).
In surveys conducted in a number of studies (Angelis, 1982; Hale and Hinofotis, 1981; Kane, 1983) college and university administrators and faculty, as well as English as a second language (ESL) teachers, requested the development of an essay test to assess directly the academic writing skills of foreign students.
As an initial step in exploring the development of an essay component for the TOEFL test, Bridgeman and Carlson (1983) surveyed faculty in undergraduate and graduate departments with large numbers of foreign students at 34 major universities. The purpose of their study was to identify the types of academic writing tasks and skills required of college and university students.
Following the identification of appropriate writing tasks and skills, a validation study investigating the relationship of TOEFL scores to writing performance was conducted (Carlson, Bridgeman, Camp, and Waanders, 1985). It was found that, while scores on varied writing samples and TOEFL scores were moderately related, the writing samples and the TOEFL test reliably measured some aspect of English language proficiency not assessed by the other. The researchers also found that holistic scores, discourse-level scores, and sentence-level scores of the writing samples were all closely related. Finally, the researchers reported that correlations of scores were as high across writing topic types as within the topic types, suggesting that the different topic types used in the study comparably assessed overall competency in academic composition.
These research studies provided the foundation for the development of the Test of Written English. Early TWE topics were based on the types of writing tasks identified in the Bridgeman and Carlson (1983) study. Based on the findings of the validation study, a single holistic score is reported for the TWE test. This score is derived from a criterion-referenced scoring guide that encompasses relevant aspects of communicative competence.
Copyright © 1996 by Educational Testing Service. All rights reserved.

TWE ITEM DEVELOPMENT
The TWE Committee
Tests developed by Educational Testing Service must meet requirements for fair and accurate testing, as outlined in the ETS Standards for Quality and Fairness (Educational Testing Service, 1987). These standards advise a testing program to:
• Obtain substantive contributions to the test development process from qualified persons who are not on the ETS staff and who represent valid perspectives, professional specialties, population subgroups, and institutions.
• Have subject matter and test development specialists who are familiar with the specifications and purpose of the test and with its intended population review the items for accuracy, content appropriateness, suitability of language, difficulty, and the adequacy with which the domain is sampled. (pp. 10-11)
In accordance with these ETS standards, in July 1985 the TOEFL program established the TWE Core Reader Group, now known as the TWE Committee. The committee is a consultant group of college and university faculty and administrators who are experienced with the intended test population, current writing assessment theory and practice, pedagogy, and large-scale essay testing management. The committee develops the TWE essay questions, evaluates their pretest performance using the TWE scoring criteria, and approves the items for administration. Members also participate in TWE essay readings throughout the year.
TWE Committee members are rotated on a regular basis to ensure the continued introduction of new ideas and perspectives related to the assessment of English writing. Appendix A lists current and former committee members.
Test Specifications
Test specifications outline what a test purports to measure and how it measures the identified skills. The purpose of TWE is to give examinees whose native language is not English an opportunity to demonstrate their ability to express ideas in acceptable written English in response to an assigned topic. Topics are designed to be fair, accessible, and appropriate to all members of the international TOEFL population. Each essay is judged according to lexical and syntactic standards of English and the effectiveness with which the examinee organizes, develops, and expresses ideas in writing. A criterion-referenced scoring guide ensures that a level of consistency in scoring is maintained from one administration to another.
Development of the TWE Scoring Guide
The TWE Scoring Guide (see Appendix B) was developed to provide concise descriptions of the general characteristics of essays at each of six points on the criterion-referenced scale. The scoring guide also serves to maintain consistent scoring standards and high interrater reliability within and across administrations. As an initial step in developing these guidelines, a specialist in applied linguistics examined 200 essays from the Carlson et al. (1985) study, analyzing the rhetorical, syntactic, and communicative characteristics at each of the six points, and wrote brief descriptions of the strengths and weaknesses of the group of essays at each level. This analysis, the TWE Committee’s analysis of pretest essays, and elements of scoring guides used by other large-scale essay reading programs at ETS and elsewhere were used to develop the TWE Scoring Guide.
The guide was validated on the aforementioned research essays and on pretest essays before being used to score the first TWE essays in July 1986. To maintain consistency in the interpretation and application of the guide, before each TWE essay reading the essay reading managers review a sample of essays that are anchored to the original essays from the first TWE administration. This review helps to ensure that a given score will consistently represent the same proficiency level across test administrations.
In September 1989 the TWE Scoring Guide was revised by a committee of TWE essay reading managers who were asked to refine it while maintaining the comparability of scores assigned at previous TWE essay readings. The revisions were based on feedback from TWE essay readers, essay reading managers, and the TWE Committee.
The primary purpose of the revision was to make the guide a more easily internalized tool for scoring TWE essays during a reading. After completing the revisions, the committee of essay reading managers rescored essays from the first TWE administration to confirm that no shift in scoring had occurred. The revised scoring guide was reviewed, used to score pretest essays, and approved by the TWE Committee in February 1990. It was introduced at the March 1990 TWE reading.

TWE Essay Questions
The TWE test requires examinees to produce an essay in response to a brief question or topic. The writing tasks presented in TWE topics have been identified by research as typical of those required for college and university course work. The topics and tasks are designed to give examinees the opportunity to develop and organize ideas and to express those ideas in lexically and syntactically appropriate English.
Because TWE aims to measure composition skills rather than reading comprehension skills, topics are brief, simply worded, and not based on reading passages. Samples of TWE essay questions used in past administrations are included in Appendix D.
TWE questions are developed in two stages. The TWE Committee writes, reviews, revises, and approves essay topics for pretesting. In developing topics for pretesting, the committee considers the following criteria:
• the topic (prompt) should be accessible to TOEFL examinees from a variety of linguistic, cultural, and educational backgrounds
• the task to be performed by examinees should be explicitly stated
• the wording of the prompt should be clear and unambiguous
• the prompt should allow examinees to plan, organize, and write their essays in 30 minutes
Once approved for pretesting, each TWE question is further reviewed by ETS test developers and sensitivity reviewers to ensure that it is not biased, inflammatory, or misleading, and that it does not unfairly advantage or disadvantage any subgroup within the TOEFL population.
As more is learned about the processes and domains of academic writing, TWE test developers and researchers will explore the use of different kinds of writing topics and tasks in the TWE test.
TWE Pretesting Procedures
Each potential TWE item or prompt is pretested with international students (both undergraduate and graduate) studying in the United States and Canada who represent a variety of native languages and English proficiency levels. Pretesting is conducted primarily in English language institutes and university composition courses for nonnative speakers of English.
Each pretest item is sent to a number of institutions in order to obtain a diverse sample of examinees and essays. The pretest sites are chosen on the basis of geographic location, type of institution, foreign student population, and English language proficiency levels of the students at the site. The goal is to obtain a population similar to the TOEFL/TWE test population.
During a pretest administration, writers have 30 minutes to plan and write an essay under standardized testing procedures similar to those used in operational TWE administrations. The essays received for each item are then prepared for the TWE Committee to evaluate. When evaluating pretest essays, the committee is given detailed information on the examinees (native language, undergraduate/graduate status, language proficiency test scores, if known) as well as feedback received on each essay question from pretest supervisors and examinees.
After a representative sample of pretest essays has been obtained, the sample is reviewed by the TWE Committee to evaluate the effectiveness of each prompt. An effective prompt is one that is easily understood by examinees at a range of language proficiencies and that elicits essays that can be validly and consistently scored according to the TWE scoring guide. The committee is also concerned that the prompt engage the writers, and that the responses elicited by the prompt be varied and interesting enough to engage readers. If the committee approves a prompt after reading the sample of pretest essays, it may be used in an operational TOEFL/TWE test administration.

TWE ESSAY READINGS
Reader Qualifications
Readers for the TWE test are primarily English and ESL writing specialists affiliated with accredited colleges, universities, and secondary schools in the United States and Canada. In order to be invited to serve as a reader, an individual must have read successfully for at least one other ETS program or qualify at a TWE reader training session.
TWE reader training sessions are conducted as needed. During these sessions, potential readers receive intensive training in holistic scoring procedures using the TWE Scoring Guide and TWE essays. At the conclusion of the training, participants independently rate 50 TWE essays that were scored at an operational reading. To qualify as a TWE rater, participants must demonstrate their ability to evaluate TWE essays reliably and accurately using the TWE Scoring Guide.
Scoring Procedures
All TWE essay readings are conducted in a central location under standardized procedures to ensure the accuracy and reliability of the essay scores.
TWE essay reading managers are English or ESL faculty who represent the most capable and experienced readers. In preparation for a TWE scoring session, the essay reading managers prepare packets of sample essays illustrating the six points on the scoring guide. Readers score and discuss these sets of sample essays with the essay reading managers prior to and throughout the reading to maintain scoring accuracy.
Small groups of readers work under the direct supervision of reading managers, who monitor the performance of each scorer throughout the reading. Each batch of essays is scrambled between the first and second readings to ensure that readers are not unduly influenced by the sequence of essays.
Each essay is scored by two readers working independently. The score assigned to an essay is derived by averaging the two independent ratings or, in the case of a discrepancy of more than one point, by the adjudication of the score by a reading manager. For example, if the first reader assigns a score of 5 to an essay and the second reader also assigns it a score of 5, 5 is the score reported for that essay. If the first reader assigns a score of 5 and the second reader assigns a score of 4, the two scores are averaged and a score of 4.5 is reported. However, if the first reader assigns a score of 5 to an essay and the second reader assigns it a 3, the scores are considered discrepant. In this case, a reading manager scores the essay to adjudicate the score.
Using the scenario above of first and second reader scores of 3 and 5, if the reading manager assigns a score of 4, the three scores are averaged and a score of 4 is reported. However, if the reading manager assigns a score of 5, the discrepant score of 3 is discarded and a score of 5 is reported. To date, more than 2,500,000 TWE essays have been scored, resulting in some 5,000,000 readings. Discrepancy rates for the TWE readings have been extremely low, usually ranging from 1 to 2 percent per reading.
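The averaging and adjudication rule described above can be sketched as a small function. This is a hypothetical illustration, not ETS code: the name `reported_score` is invented, and because the guide gives only the 3/5/4 and 3/5/5 adjudication examples, the general rule coded here is an assumption chosen to reproduce them. When the manager’s score lies midway between the two readers, all three ratings are averaged; otherwise the reader score farther from the manager’s is discarded and the remaining two are averaged.

```python
from typing import Optional


def reported_score(first: int, second: int, manager: Optional[int] = None) -> float:
    """Combine two independent TWE ratings (1-6) into a reported score.

    Hypothetical sketch: the adjudication branch generalizes the two worked
    examples in the text and is an assumption, not documented ETS procedure.
    """
    # Ratings within one point of each other are simply averaged (5/5 -> 5, 5/4 -> 4.5).
    if abs(first - second) <= 1:
        return (first + second) / 2
    # A gap of two or more points is discrepant and needs a reading manager's score.
    if manager is None:
        raise ValueError("discrepant ratings require a reading manager's score")
    dist_first = abs(manager - first)
    dist_second = abs(manager - second)
    # Manager's score lies midway between the readers: average all three (3/5/4 -> 4).
    if dist_first == dist_second:
        return (first + second + manager) / 3
    # Otherwise drop the reader score farther from the manager's (3/5/5 -> 5).
    kept = first if dist_first < dist_second else second
    return (kept + manager) / 2
```

Under this assumed rule every result is a whole or half point: two-score averages give halves, and a midway manager score makes the three-score average equal that midpoint.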
TWE SCORES
Six levels of writing proficiency are reported for the TWE test. TWE scores range from 6 to 1 (see Appendix B). A score between two points on the scale (5.5, 4.5, 3.5, 2.5, 1.5) can also be reported (see “Scoring Procedures” above). The following codes and explanations may also appear on TWE score reports:
NR    Examinee did not write an essay.
OFF   Examinee did not write on the assigned topic.
*     TWE not offered on this test date.
**    TWE score not available.
Because language proficiency can change considerably in a relatively short period, the TOEFL office will not report TWE scores that are more than two years old. Therefore, individually identifiable TWE scores are retained in a database for only two years from the date of the test. After two years, information that could be used to identify an individual is removed from the database. Information such as score data and essays that may be used for research or statistical purposes may be retained indefinitely; however, this information does not include any individual examinee identification.

TWE scores and all information that could identify an examinee are strictly confidential. An examinee’s official TWE score report will be sent only to those institutions or agencies designated by the examinee on the answer sheet on the day of the test, or on a Score Report Request Form submitted by the examinee at a later date, or by other written authorization from the examinee.
Examinees receive their test results on a form titled Examinee’s Score Record. These are not official TOEFL score reports and should not be accepted by institutions. If an examinee submits a TWE score to an institution or agency and there is a discrepancy between that score and the official TWE score recorded at ETS, ETS will report the official score to the institution or agency. Examinees are advised of this policy in the Bulletin of Information for TOEFL, TWE, and TSE.
A TWE rescoring service is available to examinees who would like to have their essays rescored. Further information on this rescoring process can also be found in the Bulletin of Information for TOEFL, TWE, and TSE.
GUIDELINES FOR USING TWE TEST SCORES
An institution that uses TWE scores should consider certain factors in evaluating an individual’s performance on the test and in determining appropriate TWE score requirements. The following guidelines are presented to assist institutions in arriving at reasonable decisions.
1. Use the TWE score as an indication of English writing proficiency only and in conjunction with other indicators of language proficiency, such as TOEFL section and total scores. Do not use the TWE score to predict academic performance.
2. Base the evaluation of an applicant’s readiness to begin academic work on all available relevant information and recognize that the TWE score is only one indicator of academic readiness. The TWE test provides information about an applicant’s ability to compose academic English. Like TOEFL, TWE is not designed to provide information about scholastic aptitude, motivation, language learning aptitude, field-specific knowledge, or cultural adaptability.
3. Consider the kinds and levels of English writing proficiency required at different levels of study in different academic disciplines. Also consider the resources available at the institution for improving the English writing proficiency of students for whom English is not the native language.
4. Consider that examinee scores are based on a single 30-minute essay that represents a first-draft writing sample.
5. Use the TWE Scoring Guide and writing samples illustrating the guide as a basis for score interpretation (see Appendixes B and E). Score users should bear in mind that a TWE score level represents a range of proficiency and is not a fixed point.
6. Avoid decisions based on small score differences. Small score differences (i.e., differences less than approximately two times the standard error of measurement) should not be used to make distinctions among examinees. Based upon the average standard error of measurement for the past 10 TWE administrations, distinctions among individual examinees should not be made unless their TWE scores are at least one point apart.
7. Conduct a local validity study to ensure that the TWE scores required by the institution are appropriate.
As part of its general responsibility for the tests it produces, the TOEFL program is concerned about the interpretation and use of TWE test scores by recipient institutions. The TOEFL office encourages individual institutions to request its assistance with any questions related to the proper use of TWE scores.
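The one-point rule in guideline 6 follows from simple arithmetic on the half-point score scale. The sketch below is illustrative only: the function name `min_distinguishable_gap` is hypothetical, and it assumes the threshold is “two standard errors, rounded up to the next reportable half-point step.” Using the 10-administration average SEMs reported in this guide (about .29 for individual scores and .41 for score differences), either choice of SEM leads to the same one-point minimum.

```python
import math


def min_distinguishable_gap(sem: float, step: float = 0.5) -> float:
    """Smallest score gap, in reportable steps, that is at least two SEMs wide.

    Assumes TWE scores are reported in half-point steps (6, 5.5, ..., 1).
    """
    # Number of half-point steps needed to cover 2 * SEM, rounded up.
    steps = math.ceil(2 * sem / step)
    return steps * step


# Average SEMs over the 10 administrations, August 1993 through May 1995.
for label, sem in [("individual scores", 0.29), ("score differences", 0.41)]:
    print(f"{label}: 2 x SEM = {2 * sem:.2f}, minimum gap = {min_distinguishable_gap(sem)}")
```

Because 2 × .41 = .82 falls between the reportable gaps of 0.5 and 1.0, the smallest defensible gap under this assumption is one full point.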

STATISTICAL CHARACTERISTICS OF THE TWE TEST
Reliability
The reliability of a test is the extent to which it yields consistent results. A test is considered reliable if it yields similar scores across different forms of the test, different administrations, and, in the case of subjectively scored measures, different raters.
There are several ways to estimate the reliability of a test, each focusing on a different source of measurement error. The reliability of the TWE test has been evaluated by examining interrater reliability, that is, the extent to which readers agree on the ratings assigned to each essay. To date, it has not been feasible to assess alternate-form and test-retest reliability, which focus on variations in test scores that result from changes in the individual or changes in test content from one testing situation to another. To do so, it would be necessary to give a relatively large random sample of examinees two different forms of the test (alternate-form reliability) or the same test on two different occasions (test-retest reliability). However, the test development procedures that are employed to ensure TWE content validity (discussed later in this section) would be expected to contribute to alternate-form reliability.
Two measures of interrater reliability are reported for the TWE test. The first measure reported is the Pearson product-moment correlation between first and second readers, which reflects the overall agreement (across all examinees and all raters) of the pairs of readers who scored each essay. The second measure reported is coefficient alpha, which provides an estimate of the internal consistency of the final scores based upon two readers per essay. Because each reported TWE score is the average of two separate ratings, the reported TWE scores are more reliable than the individual ratings. Therefore, coefficient alpha is generally higher than the simple correlation between readers, except in those cases where the correlation is equal to 0 or 1. (If there were perfect agreement on each essay across all raters, coefficient alpha would equal 1.0; if there were no relationship between the scores given by different raters, coefficient alpha would be 0.0.)
Table 1 contains summary statistics and interrater reliability statistics for the 10 TWE administrations from August 1993 through May 1995. The interrater correlations and coefficients alpha indicate that reader reliability is acceptably high, with correlations between first and second readers ranging from .77 to .81, and the values for coefficient alpha ranging from .87 to .89.
Table 1 also shows the reader discrepancy rate for each of the 10 TWE administrations. This value is simply the proportion of essays for which the scores of the two readers differed by two or more points. These discrepancy rates are quite low, ranging from 0.2 percent to 1.1 percent. (Because all essays with ratings that differed by two or more points were given a third reading, the discrepancy rates also reflect the proportions of essays that received a third reading.)
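The relationship between coefficient alpha and the interrater correlation can be sketched in a few lines. With two ratings per essay whose variances are equal, coefficient alpha reduces algebraically to the Spearman-Brown value 2r/(1 + r), which exceeds r for every r strictly between 0 and 1 and equals r at 0 and 1. This is a standard classical test theory identity rather than a formula stated in this guide; the function names below are illustrative. Applied to the reader correlations in Table 1, the shortcut matches the reported alphas to within about .001 (for example, .780 gives .876 and .807 gives .893), which suggests but does not confirm that reader variances were nearly equal.

```python
from statistics import pvariance
from typing import Sequence


def coefficient_alpha(ratings_a: Sequence[float], ratings_b: Sequence[float]) -> float:
    """Cronbach's coefficient alpha for k = 2 ratings per essay."""
    k = 2
    totals = [a + b for a, b in zip(ratings_a, ratings_b)]
    # Sum of the individual rating variances versus the variance of the summed score.
    item_var = pvariance(ratings_a) + pvariance(ratings_b)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def spearman_brown(r: float) -> float:
    """Reliability of the average of two equally variable ratings correlated at r."""
    return 2 * r / (1 + r)


# Interrater correlations from Table 1 (August 1993 and September 1994).
for r in (0.780, 0.807):
    print(f"r = {r:.3f} -> alpha = {spearman_brown(r):.3f}")
```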
Table 1
Reader Reliabilities
(Based on scores assigned to 606,883 essays in the 10 TWE administrations from August 1993 through May 1995)

Admin. Date | N      | TWE Mean | TWE S.D. | Discrepancy Rate¹ | Correlation, 1st & 2nd Readers | Alpha | SEM², Indiv. Scores | SEM², Score Diffs.
Aug. 1993   | 56,240 | 3.66     | 0.84     | .011              | .780                           | .876  | .30                 | .42
Sept. 1993  | 27,951 | 3.69     | 0.78     | .004              | .788                           | .881  | .27                 | .38
Oct. 1993   | 87,616 | 3.68     | 0.85     | .010              | .782                           | .877  | .30                 | .42
Feb. 1994   | 48,694 | 3.65     | 0.89     | .010              | .799                           | .888  | .30                 | .42
May 1994    | 74,972 | 3.73     | 0.83     | .010              | .767                           | .868  | .30                 | .43
Aug. 1994   | 56,553 | 3.66     | 0.80     | .007              | .770                           | .870  | .29                 | .41
Sept. 1994  | 28,282 | 3.71     | 0.78     | .002              | .807                           | .893  | .26                 | .36
Oct. 1994   | 89,656 | 3.72     | 0.84     | .009              | .783                           | .878  | .29                 | .41
Feb. 1995   | 54,783 | 3.65     | 0.84     | .010              | .777                           | .874  | .30                 | .42
May 1995    | 82,136 | 3.65     | 0.84     | .009              | .777                           | .875  | .30                 | .42

¹ Proportion of papers in which the two readers differed by two or more points. (When readers differed by two or more points, the essay was adjudicated by a third reader.)
² Standard errors of measurement listed here are based upon the extent of interrater agreement and do not take into account other sources of error, such as differences between test forms. Therefore, these values probably underestimate the actual error of measurement.
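The SEM columns of Table 1 are consistent with the classical test theory formula SEM = S.D. × √(1 − alpha), with the SEM of a difference between two examinees’ scores equal to √2 ≈ 1.41 times the individual SEM when the two scores have independent errors. The sketch below assumes that formula and normally distributed errors; both are assumptions, since the guide states only that the SEM is based on the standard deviation and reliability. It recomputes the May 1995 row and the chance that two examinees with identical true scores obtain scores a given distance apart. The function names are illustrative.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement for individual scores."""
    return sd * math.sqrt(1 - reliability)


def sem_difference(sem_individual: float) -> float:
    """SEM of the difference between two examinees' scores (independent errors)."""
    return math.sqrt(2) * sem_individual


def prob_outside(margin: float, se: float) -> float:
    """P(|error| > margin) for a normal error with standard deviation se."""
    # erfc(z / sqrt(2)) equals the two-tailed normal probability 2 * (1 - Phi(z)).
    return math.erfc(margin / se / math.sqrt(2))


# May 1995 row of Table 1: S.D. = 0.84, alpha = .875.
s = sem(0.84, 0.875)
d = sem_difference(s)
print(f"SEM = {s:.2f}, SEM of differences = {d:.2f}")
# Equal true scores: chance that obtained scores differ by 0.5 or more, or by more than 1.0.
print(f"P(gap >= 0.5) = {prob_outside(0.5, d):.3f}")
print(f"P(gap > 1.0) = {prob_outside(1.0, d):.3f}")
```

Under these assumptions the computed SEMs round to the tabled .30 and .42, and the two probabilities come out near .23 and .017.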

Standard Error of Measurement
Any test score is only an estimate of an examinee’s knowledge or ability, and an examinee’s test score might have been somewhat different if the examinee had taken a different version of the test, or if the test had been scored by a different group of readers. If it were possible to have someone take all the editions of the test that could ever be made, and have those tests scored by every reader who could ever score the test, the average score over all those test forms and readers presumably would be a completely accurate measure of the examinee’s knowledge or ability. This hypothetical score is often referred to as the “true score.” Any difference between this true score and the score that is actually obtained on a given test is considered to be measurement error.
Because an examinee’s hypothetical true score on a test is obviously unknown, it is impossible to know exactly how large the measurement error is for any individual examinee. However, it is possible statistically to estimate the average measurement error for a large group of examinees, based upon the test’s standard deviation and reliability. This statistic is called the Standard Error of Measurement (SEM).
The last two columns in Table 1 show the standard errors of measurement for individual scores and for score differences on the TWE test. The standard errors of measurement that are reported here are estimates of the average differences between obtained scores and the theoretical true scores that would have been obtained if each examinee’s performance on a single test form had been scored by all possible readers. For the 10 test administrations shown in the table, the average standard error of measurement was approximately .29 for individual scores and .41 for score differences.
The standard error of measurement can be helpful in the interpretation of test scores. Approximately 95 percent of all examinees are expected to obtain scores within 1.96 standard errors of measurement from their true scores and approximately 90 percent are expected to obtain scores within 1.64 standard errors of measurement. For example, in the May 1995 administration (with SEM = .30), less than 10 percent of examinees with true scores of 3.0 would be expected to obtain TWE scores lower than 2.5 or higher than 3.5; of those examinees with true scores of 4.0, less than 10 percent would be expected to obtain TWE scores lower than 3.5 or higher than 4.5.
When the scores of two examinees are compared, the difference between the scores will be affected by errors of measurement in each of the scores. Thus, the standard errors of measurement for score differences are larger than the corresponding standard errors of measurement for individual scores (about 1.4 times as large). In approximately 95 percent
between the examinees’ true scores; in approximately 80 percent of all cases, the difference between obtained scores is expected to be within 1.28 standard errors above or below the true difference. This information allows the test user to evaluate the probability that individuals with different obtained TWE scores actually differ in their true scores. For example, among all pairs of examinees with the same true scores (i.e., with true-score differences of zero) in the May 1995 administration, more than 20 percent would be expected to obtain TWE scores that differ from one another by one-half point or more; however, fewer than 5 percent (in fact, only about 1.7 percent) would be expected to obtain TWE scores more than one point apart.
Validity
Beyond being reliable, a test should be valid; that is, it should actually measure what it is intended to measure. It is generally recognized that validity refers to the usefulness of inferences made from a test score. The process of validation is necessarily an ongoing one, especially in the area of written composition, where theorists and researchers are still in the process of defining the construct.
To support the inferences made from test scores, validation should include several types of evidence. The nature of that evidence should depend upon the uses to be made of the test. The TWE test is used to make inferences about an examinee’s ability to compose academically appropriate written English. Two types of validity evidence are available for the TWE test: (1) construct-related evidence and (2) content-related evidence. Construct-related evidence refers to the extent to which the test actually measures the particular construct of interest, in this case, English-language writing ability. Content-related evidence refers to the extent to which the test provides an adequate and representative sample of the particular content domain that the test is designed to measure.
Construct-related Evidence. One source of construct-related evidence for the validity of the TWE test is the relationship between TWE scores and TOEFL scaled scores. Research suggests that skills such as those intended to be measured by both the TOEFL and TWE tests are part of a more general construct of English language proficiency (Oller, 1979). Therefore, in general, examinees who demonstrate high ability on TOEFL would not be expected to perform poorly on TWE, and examinees who perform poorly on TOEFL would not be expected to perform well on TWE. This expectation is supported by the data collected over several TWE administrations. Table 2 displays the frequency distributions of TWE scores for five different TOEFL score
of all cases, the difference between obtained scores is expected
ranges over 10 administrations.
to be within 1.96 standard errors above or below the difference
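
The standard-error arithmetic described in the reliability discussion earlier in this section can be checked with a short Python sketch. It assumes the usual classical-test-theory formula, SEM = SD × √(1 − reliability), because the guide names the inputs (standard deviation and reliability) but not the formula. It also assumes that the errors in two examinees' scores are independent and of similar size, which is what makes the SEM for a difference about √2 ≈ 1.4 times the SEM for an individual score. The function names are illustrative, and the standard deviation and reliability in the last line are hypothetical values, not figures from Table 1.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def sem_difference(sem_a: float, sem_b: float) -> float:
    """SEM of the difference between two scores, assuming independent errors."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)


def score_band(true_score: float, sem_value: float, z: float) -> tuple[float, float]:
    """Interval of z SEMs on either side of a true score.

    z = 1.96 covers about 95 percent of obtained scores; z = 1.64 about 90 percent.
    """
    half_width = z * sem_value
    return true_score - half_width, true_score + half_width


if __name__ == "__main__":
    # May 1995 example from the text: SEM = .30, true score 3.0, 90 percent band
    low, high = score_band(3.0, 0.30, 1.64)
    print(f"90% band around 3.0: {low:.2f} to {high:.2f}")

    # Average individual-score SEM of about .29 -> SEM of a score difference
    print(f"SEM of a difference: {sem_difference(0.29, 0.29):.2f}")

    # Hypothetical SD and reliability, for illustration only (not Table 1 values)
    print(f"SEM for SD = 1.0, reliability = .91: {sem(1.0, 0.91):.2f}")
```

With the May 1995 SEM of .30, the 90 percent band around a true score of 3.0 runs from about 2.51 to 3.49. Because the interval from 2.5 to 3.5 is slightly wider than this band, fewer than 10 percent of such examinees would be expected to fall outside it, which matches the example in the text. Likewise, √2 × .29 ≈ .41, the average SEM reported for score differences.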
Copyright © 1996 by Educational Testing Service. All rights reserved.

Table 2
Frequency Distribution of TWE Scores for TOEFL Total Scaled Scores
(Based on 607,350 examinees who took the TWE test from August 1993 through May 1995)

TOEFL Scores          Below 477       477 to 523       527 to 573       577 to 623        Above 623
TWE Score            N  Percent       N  Percent       N  Percent       N  Percent       N  Percent
6.0                  5     0.0+      55     0.04     402     0.23   1,703     1.54   4,338    10.36
5.5                 27     0.02     205     0.13   1,224     0.71   3,612     3.27   5,190    12.40
5.0                564     0.43   2,949     1.94  10,962     6.36  19,415    17.57  13,276    31.71
4.5              1,634     1.25   6,695     4.39  16,877     9.80  18,783    17.00   7,275    17.38
4.0             20,429    15.68  50,451    33.10  75,860    44.03  47,286    42.79   9,594    22.92
3.5             18,910    14.51  29,066    19.07  28,956    16.81  10,951     9.91   1,383     3.30
3.0             49,948    38.34  47,702    31.30  31,838    18.48   7,804     7.06     721     1.72
2.5             17,161    13.17   9,203     6.04   4,096     2.38     685     0.62      57     0.14
2.0             15,771    12.11   5,182     3.40   1,785     1.04     228     0.21      27     0.06
1.5              2,979     2.29     518     0.34     165     0.10      23     0.02       2     0.0+
1.0              2,857     2.19     372     0.24     118     0.07      30     0.03       1     0.0+
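
Because each column of Table 2 is a distribution within one TOEFL score range (its percentages sum to about 100 down the column), the rarity of sharply discordant TOEFL and TWE scores can be tallied directly from the counts. The Python sketch below is illustrative: the cutoffs used for "low" and "high" TWE scores (2.5 or lower, 5.0 or higher) are assumptions chosen for the example, not definitions from the guide, and the names are hypothetical. As a consistency check, the counts sum to 607,350 examinees, and the totals for each TWE score match the counts in Table 4.

```python
# Counts from Table 2, ordered from TWE 6.0 down to TWE 1.0 in half-point steps.
TWE_SCORES = [6.0, 5.5, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0]
COUNTS = {
    "Below 477": [5, 27, 564, 1634, 20429, 18910, 49948, 17161, 15771, 2979, 2857],
    "477 to 523": [55, 205, 2949, 6695, 50451, 29066, 47702, 9203, 5182, 518, 372],
    "527 to 573": [402, 1224, 10962, 16877, 75860, 28956, 31838, 4096, 1785, 165, 118],
    "577 to 623": [1703, 3612, 19415, 18783, 47286, 10951, 7804, 685, 228, 23, 30],
    "Above 623": [4338, 5190, 13276, 7275, 9594, 1383, 721, 57, 27, 2, 1],
}


def percent_in_range(band: str, lo: float, hi: float) -> float:
    """Percent of examinees in one TOEFL band whose TWE score lies in [lo, hi]."""
    counts = COUNTS[band]
    selected = sum(n for score, n in zip(TWE_SCORES, counts) if lo <= score <= hi)
    return 100.0 * selected / sum(counts)


if __name__ == "__main__":
    total = sum(sum(column) for column in COUNTS.values())
    print(f"Total examinees: {total:,}")
    # Illustrative cutoffs for discordant scores (assumptions, not from the guide)
    print(f"TOEFL above 623, TWE 2.5 or lower: {percent_in_range('Above 623', 1.0, 2.5):.2f}%")
    print(f"TOEFL below 477, TWE 5.0 or higher: {percent_in_range('Below 477', 5.0, 6.0):.2f}%")
```

On these counts, about 0.21 percent of examinees in the highest TOEFL range received a TWE score of 2.5 or lower, and about 0.46 percent of those in the lowest range received 5.0 or higher.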
As the data in Table 2 indicate, across the 10 TWE administrations from August 1993 through May 1995, it was rare for examinees to obtain either very high scores on the TOEFL test and low scores on the TWE test or very low scores on TOEFL and high scores on TWE. It should be pointed out, however, that the data in Table 2 do not suggest that TOEFL scores should be used as predictors of TWE scores.

Although there are theoretical grounds for expecting a positive relationship between TOEFL and TWE scores, there would be no point in administering the TWE test to examinees if it did not measure an aspect of English language proficiency distinct from what is already measured by TOEFL. Thus, the correlations between TWE scores and TOEFL scaled scores should be high enough to suggest that TWE is measuring the appropriate construct, but low enough to support the conclusion that the test also measures abilities that are distinct from those measured by TOEFL. The extent to which TWE scores are independent of TOEFL scores is an indication of the extent to which the TWE test measures a distinct skill or skills.

Table 3 presents the correlations of TWE scores with TOEFL scaled scores for examinees within each of the three geographic regions in which TWE was administered at the 10 administrations. The correlations between the TOEFL total scores and TWE scores range from .57 to .68, suggesting that the productive writing abilities assessed by TWE are somewhat distinct from the proficiency skills measured by the multiple-choice items of the TOEFL test.
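
Two quantities help in reading correlations such as those in Table 3. Squaring a correlation gives the proportion of score variance the two measures share, so correlations of .57 to .68 correspond to roughly 32 to 46 percent shared variance; even at the top of the range, most of the variance in TWE scores is not shared with TOEFL total scores. Table 3's correlations have also been corrected for unreliability of TOEFL scores. The guide does not give the formula, so this Python sketch assumes Spearman's classic correction for attenuation in one variable, r divided by the square root of that variable's reliability. The observed correlation and reliability in the example are hypothetical values.

```python
import math


def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given their correlation r."""
    return r * r


def correct_for_unreliability(r_observed: float, reliability_x: float) -> float:
    """Spearman's correction for attenuation in one variable: r / sqrt(r_xx)."""
    return r_observed / math.sqrt(reliability_x)


if __name__ == "__main__":
    # Lowest and highest TOEFL-total/TWE correlations reported in the text
    for r in (0.57, 0.68):
        print(f"r = {r:.2f}: shared variance = {shared_variance(r):.0%}")

    # Hypothetical observed correlation and TOEFL reliability, for illustration only
    print(f"corrected r = {correct_for_unreliability(0.60, 0.95):.3f}")
```

The correction can only raise a correlation (or leave it unchanged when reliability is 1.0), which is why corrected values describe the relationship with an error-free TOEFL score rather than with the scores examinees actually receive.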

Table 3
Correlations between TOEFL and TWE Scores¹
(Based on 606,883 examinees who took the TWE test from August 1993 through May 1995)

              Geographic           TOEFL Total   Section 1   Section 2   Section 3
Admin. Date   Region²            N           r           r           r           r
Aug. 1993³    Region 1      27,807         .64         .66         .58         .57
              Region 2      12,072         .68         .66         .65         .62
              Region 3      16,361         .62         .60         .60         .57
Sept. 1993³   Region 1       6,662         .65         .66         .63         .53
              Region 2      10,961         .64         .62         .62         .59
              Region 3      10,328         .59         .55         .58         .53
Oct. 1993³    Region 1      41,638         .66         .65         .62         .62
              Region 2      16,288         .67         .65         .66         .60
              Region 3      29,690         .64         .63         .63         .58
Feb. 1994     Region 1      16,555         .65         .65         .59         .60
              Region 2      11,305         .60         .54         .60         .56
              Region 3      20,834         .61         .59         .58         .56
May 1994      Region 1      35,290         .60         .62         .55         .54
              Region 2      14,239         .59         .53         .59         .51
              Region 3      25,443         .64         .61         .62         .57
Aug. 1994     Region 1      36,137         .63         .64         .59         .54
              Region 2       4,010         .64         .56         .66         .60
              Region 3      16,406         .62         .58         .60         .54
Sept. 1994    Region 1      14,436         .62         .64         .57         .55
              Region 2       3,623         .66         .62         .66         .61
              Region 3      10,223         .57         .55         .55         .51
Oct. 1994     Region 1      48,628         .68         .68         .63         .62
              Region 2      10,289         .58         .52         .58         .54
              Region 3      30,739         .62         .58         .59         .58
Feb. 1995     Region 1      22,102         .65         .64         .60         .59
              Region 2      11,562         .61         .52         .64         .56
              Region 3      21,119         .59         .55         .57         .54
May 1995      Region 1      43,450         .65         .65         .62         .59
              Region 2      13,825         .64         .57         .66         .56
              Region 3      24,861         .63         .58         .62         .56

¹ Correlations have been corrected for unreliability of TOEFL scores.
² Geographic Region 1 includes Asia, the Pacific (including Australia), and Israel; Geographic Region 2 includes Africa, the Middle East, and Europe; Geographic Region 3 includes North America, South America, and Central America.
³ For these administrations, some examinees from test centers in Asia are included in Region 2 and/or Region 3.
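
The section-by-section pattern in Table 3 can be tallied directly. This Python sketch copies the Section 1 and Section 2 columns for each region, in administration order from August 1993 through May 1995, and counts the administrations in which the Section 2 correlation is higher than, equal to, or lower than the Section 1 correlation. It compares only these two sections (Section 3 is left out for brevity), and the names are illustrative.

```python
# Section 1 and Section 2 correlations with TWE from Table 3,
# in administration order (Aug. 1993 through May 1995).
SECTION_1 = {
    "Region 1": [.66, .66, .65, .65, .62, .64, .64, .68, .64, .65],
    "Region 2": [.66, .62, .65, .54, .53, .56, .62, .52, .52, .57],
    "Region 3": [.60, .55, .63, .59, .61, .58, .55, .58, .55, .58],
}
SECTION_2 = {
    "Region 1": [.58, .63, .62, .59, .55, .59, .57, .63, .60, .62],
    "Region 2": [.65, .62, .66, .60, .59, .66, .66, .58, .64, .66],
    "Region 3": [.60, .58, .63, .58, .62, .60, .55, .59, .57, .62],
}


def tally(region: str) -> dict[str, int]:
    """Count administrations where the Section 2 r is higher, tied, or lower than Section 1."""
    higher = tied = lower = 0
    for s1, s2 in zip(SECTION_1[region], SECTION_2[region]):
        if s2 > s1:
            higher += 1
        elif s2 == s1:
            tied += 1
        else:
            lower += 1
    return {"S2 higher": higher, "tie": tied, "S1 higher": lower}


if __name__ == "__main__":
    for region in SECTION_1:
        print(region, tally(region))
```

On the table values, Section 2 correlates more highly with TWE in none of the 10 Region 1 administrations (Section 1 is higher in all 10), in 8 of 10 for Region 2 (one tie, one reversal), and in 6 of 10 for Region 3 (three ties, one reversal).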
Table 3 also shows the correlations of TWE scores with each of the three TOEFL section scores. Construct validity would be supported by higher correlations of TWE scores with TOEFL Section 2 (Structure and Written Expression) than with Section 1 (Listening Comprehension) or Section 3 (Reading Comprehension) scores. In fact, this pattern is generally found in TWE administrations for Regions 2 and 3. In Region 1, however, TWE scores correlated more highly with TOEFL Section 1 scores than with Section 2 scores in all 10 administrations. These correlations are consistent with those found by Way (1990), who noted that correlations between TWE scores and TOEFL Section 2 scores were generally lower for examinees from selected Asian language groups than for other examinees.

Content-related Evidence. As a test of the ability to compose in standard written English, TWE uses writing tasks similar to those required of college and university students in North America. As noted earlier, the TWE Committee develops items/prompts to meet detailed specifications that encompass widely recognized components of written language facility. Thus, each TWE item is constructed by subject-matter experts to assess the various factors that are generally considered crucial components of written academic English. Each item is pretested, and results of each pretested item are evaluated by the TWE Committee to ensure that the item is performing as anticipated. Items that do not perform adequately in a pretest are not used for the TWE test.

Finally, the actual scoring of TWE essays is done by qualified readers who have experience teaching English writing to native and nonnative speakers of English. The TWE readers are guided in their ratings by the TWE Scoring Guide and the standardized training and scoring procedures used at each TWE essay reading.

Performance of TWE Reference Groups

Table 4 presents the overall frequency distribution of TWE scores based on the 10 administrations from August 1993 through May 1995.

Table 5 lists the mean TWE scores for examinees tested at the 10 administrations, classified by native language. Table 6 lists the mean TWE scores for examinees classified by native country. These tables may be useful in comparing the test performance of a particular student with the average performance of other examinees who are from the same country or who speak the same native language.

It is important to point out that the data do not permit any generalizations about differences in the English writing proficiency of the various national and language groups. The tables are based simply on the performance of those examinees who have taken the TWE test. Because different selective factors may operate in different parts of the world to determine who takes the test, the samples on which the tables are based are not necessarily representative of the student populations from which the samples came. In some countries, for example, virtually any high school, university, or graduate student who aspires to study in North America may take the test. In other countries, government regulations permit only graduate students in particular areas of specialization, depending on national interests, to do so.

Table 4
Frequency Distribution of TWE Scores for All Examinees
(Based on 607,350 examinees who took the TWE test from August 1993 through May 1995)

TWE Score        N   Percent   Percentile Rank
6.0          6,503      1.07             99.47
5.5         10,258      1.69             98.09
5.0         47,166      7.77             93.36
4.5         51,264      8.44             85.25
4.0        203,620     33.53             64.28
3.5         89,266     14.70             40.16
3.0        138,013     22.72             21.45
2.5         31,202      5.14              7.52
2.0         22,993      3.79              3.06
1.5          3,687      0.61              0.87
1.0          3,378      0.56              0.28
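
The percentile ranks in Table 4 appear consistent with the common midpoint definition: the percent of examinees scoring below a given score plus half the percent scoring at it (for a score of 1.0, for example, 0.56 / 2 = 0.28). The guide does not state its formula, so this Python sketch treats that definition as an assumption. Computed from the counts, it reproduces the published column to within about a hundredth of a point; the small residual differences presumably reflect rounding in the original computation. The function name is illustrative.

```python
# Counts from Table 4, ordered from TWE 1.0 up to TWE 6.0.
SCORES = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
COUNTS = [3378, 3687, 22993, 31202, 138013, 89266, 203620, 51264, 47166, 10258, 6503]


def midpoint_percentile_ranks(counts: list[int]) -> list[float]:
    """Percent scoring below each score plus half the percent scoring at it."""
    total = sum(counts)
    ranks = []
    below = 0  # running count of examinees at lower scores
    for n in counts:
        ranks.append(100.0 * (below + n / 2) / total)
        below += n
    return ranks


if __name__ == "__main__":
    for score, rank in zip(SCORES, midpoint_percentile_ranks(COUNTS)):
        print(f"TWE {score:.1f}: percentile rank {rank:.2f}")
```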
