With the increasing need for English speaking ability in the Chinese context, spoken English testing is gradually gaining in importance. However, the vast majority of English speaking tests in China are under-developed and their tasks insufficiently justified. If China is to advance its spoken English language testing and, potentially, the capacity of its English learner population to speak English fluently, systematic validation exercises should be carried out on existing English speaking tests to give insights into test design, strengthen links with the associated curricula and improve the potential for positive washback.

The present study represents an initial step in the validation process of the TEM4-Oral retelling task. By investigating discourse features of test-taker performance, it aims to collect validity evidence for the task in two respects: (1) the consistency of task design and rating across test versions; (2) the correspondence between the descriptions in the marking criteria and the language features of the candidates. Regarding the consistency of task design, the analysis of test-taker discourse suggests differences in the difficulty levels of the source texts across four test administrations. As for rating, in one of the test administrations, higher- and lower-ranked retellings were not distinguished as well as in the other years. In addition, discourse features of the retellings have been found to reflect the general emphasis of the marking criteria and to differ between scoring levels. In a validity argument for the TEM4-Oral retelling task, these findings raise issues regarding the generalisation inference and provide backing for the explanation inference.

Besides contributing evidence to a validity argument, the exploratory nature of the study has also shed light on test design. Across administrations, the characteristics of the input text and task specifications are reflected in the various discourse features of test-taker performance. Aside from the content of the source stories, prosodic features such as intonation and pausing when the stories were read to the candidates have also been found to influence test-takers’ production of retellings. In addition, the task instructions, along with the fast pace at which the stories were read, constrained the extent to which candidates manipulated the story content, resulting in very little processing or integration. These features of test-taker performance serve as a reminder of the complex dynamics involved in designing a test task. For test developers, understanding aspects of the test and staying constantly aware of their potential impacts on performance will assist in constructing effective and reliable tests (Fulcher & Davidson, 2007).

The effort to validate the TEM4-Oral retelling task reveals challenges in carrying out argument-based validation on less popular tests in the presence of under-specified test components. At present, the majority of systematic validation studies are carried out on large-scale, high-stakes, standardised English proficiency tests such as TOEFL and the International English Language Testing System (IELTS), which attract massive numbers of test-takers (for IELTS, around 2,000,000 tests are taken each year, compared to 20,000 for TEM4-Oral; British Council, 2013). Given the intellectual and financial resources required for test validation, it is not surprising that the better recognised a test is, the more comprehensive the validation it is likely to receive. This creates a dilemma in the current Chinese context: there is an urgent need for English speaking tests to undergo validation, yet most of those tests have not attracted enough attention. Moreover, in spoken English tests in China, the intended construct is often not specified, which is problematic for carrying out argument-based validation.

A corpus-based approach to test-taker discourse, as this study exemplifies, offers solutions to the above issues. A range of software tools has enabled corpus analysis to encompass a wide variety of discourse features, such as different lengths of language, meaning and structure (Cushing, 2017). In cases where the construct of the task is not clearly defined, an exploration of different discourse features across scoring levels gives insights into what is assessed, which could in turn be used for construct definition and the development of marking criteria. Additionally, in tasks such as the TEM4-Oral retelling task, where candidates’ performances are closely related to the input text, analysing test-taker discourse across task versions can also reveal how consistent the task materials are. On practical grounds, with the widespread use of computers and the emergence of freeware, corpus-based discourse analysis appears ideal as a starting point for validation studies.
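As a purely illustrative aside, the idea of exploring discourse features across scoring levels can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure used in this study: the function names (`tokenize`, `feature_profile`), the two scoring labels and the toy transcripts are all hypothetical, and the two features shown (mean length in tokens and type-token ratio) are simple stand-ins for the richer features a corpus tool would report.

```python
import re
from collections import defaultdict
from statistics import mean


def tokenize(text):
    """Lowercase a transcript and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def feature_profile(transcripts):
    """Summarise simple discourse features for each scoring level.

    `transcripts` is a list of (score_level, text) pairs.
    Returns {score_level: {"mean_tokens": ..., "mean_ttr": ...}}.
    """
    by_level = defaultdict(list)
    for level, text in transcripts:
        tokens = tokenize(text)
        if not tokens:
            continue  # skip empty transcripts to avoid division by zero
        ttr = len(set(tokens)) / len(tokens)  # type-token ratio
        by_level[level].append((len(tokens), ttr))
    return {
        level: {
            "mean_tokens": mean(n for n, _ in rows),
            "mean_ttr": mean(t for _, t in rows),
        }
        for level, rows in by_level.items()
    }


# Invented toy data (hypothetical), not drawn from the TEM4-Oral corpus.
sample = [
    ("higher", "The old man walked slowly to the market and bought some bread"),
    ("higher", "She decided to return the wallet because it belonged to her neighbour"),
    ("lower", "The man go go market and the man buy bread"),
    ("lower", "She give the the wallet back"),
]

profile = feature_profile(sample)
for level in sorted(profile):
    print(level, profile[level])
```

Type-token ratio is sensitive to text length, so in practice it would need to be computed on samples of comparable size or replaced with a length-corrected measure before any comparison between scoring levels is interpreted.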

Nevertheless, test validation is a complex process, and test-taker discourse in relation to the marking criteria is but one aspect of the overall validity argument. Future investigations of the TEM4-Oral retelling task could look into more discourse features commonly indicative of language proficiency, such as grammatical accuracy and syntactic complexity. They could also make more use of the recordings of the retellings in the corpus, since intonation and pronunciation are of great importance in a speaking test. Also, candidate interviews could be conducted so as to gain an understanding of the nature of the task from the test-takers’ perspective. Other sources of validity evidence include raters’ training procedures and verbal protocols, and information related to the decision-making process based on the scoring of the test. Investigations into each of these aspects are one step towards ensuring the validity of the inferences based on scores from the TEM4-Oral retelling task.

For spoken English language testing in China, each step taken in test validation is much needed. As Bachman (2010) outlined in the foreword of the book English Language Assessment and the Chinese Learner (Cheng & Curtis, 2010), the long testing tradition in China has fostered among Chinese people the value of respecting examinations and accepting test results and the subsequent decision-making without question. Test validation fills a gap in language testing research in China, reminds test developers to refine their tests, and guards against unfairness for thousands of test-takers. It is hoped that validation efforts such as the one presented in this study will provide an impetus for reconsidering speaking test design in relation to its intended effect on the English language learning of Chinese university students.

References

Adamson, B. (2004). China’s English: A History of English in Chinese Education. Hong Kong: Hong Kong University Press.

Anastasi, A. (1986). Evolving concepts of test validation. Annual Review of Psychology, 37(1), 1-15.

Anthony, L. (2004). AntConc: A learner and classroom friendly, multi-platform corpus analysis toolkit. Paper presented at the IWLeL 2004: An Interactive Workshop on Language e-Learning, Waseda University, Tokyo.

Anthony, L. (2016). AntConc (Version 3.4.4) [Computer Software]. Tokyo, Japan: Waseda University.

Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.

Bachman, L. F. (2010). Foreword. In L. Cheng & A. Curtis (Eds.), English Language Assessment and the Chinese Learner (pp. x-xii). New York, NY: Routledge.

Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford: Oxford University Press.

Bachman, L. F., & Purpura, J. E. (2008). Language assessments: Gate-keepers or door-openers? In B. Spolsky & F. M. Hult (Eds.), The Handbook of Educational Linguistics (pp. 456-468). Oxford: Blackwell.

Bernstein, J., Van Moere, A., & Cheng, J. (2010). Validating automated speaking tests. Language Testing, 27(3), 355-377.

Blank, M., & Frank, S. M. (1971). Story recall in kindergarten children: Effect of method of presentation on psycholinguistic performance. Child Development, 42(1), 299-312.

Bley-Vroman, R., & Chaudron, C. (1994). Elicited imitation as a measure of second-language competence. In A. D. Cohen, E. Tarone, & S. M. Gass (Eds.), Research Methodology in Second-Language Acquisition (pp. 245-261). Hillsdale, NJ: L. Erlbaum Associates.

Boers, F., Eyckmans, J., Kappel, J., Stengers, H., & Demecheleer, M. (2006). Formulaic sequences and perceived oral proficiency: Putting a Lexical Approach to the test. Language Teaching Research, 10(3), 245-261.

Boersma, P., & Weenink, D. (2017). Praat: Doing phonetics by computer [Computer program]. Version 6.0.30. Retrieved 23 Jul 2017 from http://www.praat.org/

Bolton, K., & Graddol, D. (2012). English in China today. English Today, 28(3), 3-8.

Bondi, M. (2010). Perspectives on keywords and keyness: An introduction. In M. Bondi & M. Scott (Eds.), Keyness in Texts (pp. 1-18). Amsterdam/Philadelphia: John Benjamins.

Bondi, M., & Scott, M. (2010). Keyness in Texts (Vol. 41). Amsterdam/Philadelphia: John Benjamins.

Bower, G. H. (1976). Experiments on story understanding and recall. The Quarterly Journal of Experimental Psychology, 28(4), 511-534.

Brennan, R. (2001). Generalizability Theory. New York, NY: Springer-Verlag.

Brennan, R. L. (2006). Educational Measurement (4th ed.). Westport, CT: Greenwood Publishing.

Brezina, V., McEnery, T., & Wattam, S. (2015). Collocations in context: A new perspective on collocation networks. International Journal of Corpus Linguistics, 20(2), 139-173.

British Council. (2013, May 28). Record two million IELTS tests taken in the last 12 months. Retrieved 30 September 2017 from https://www.britishcouncil.org/organisation/press/record-two-million-ielts-tests

Brown, H., & Cambourne, B. (1987). Read and Retell: A Strategy for the Whole-Language/Natural Learning Classroom. North Ryde: Methuen Australia.

Carroll, B. J. (1980). Testing Communicative Performance: An Interim Study (1st ed.). New York, NY: Pergamon Press.

Chafe, W. L. (1980). The deployment of consciousness in the production of a narrative. In W. L. Chafe (Ed.), The Pear Stories: Cognitive, Cultural and Linguistic Aspects of Narrative Production. Norwood, NJ: Ablex.

Chafe, W. L. (1985). Linguistic differences produced by differences between speaking and writing. In D. R. Olson, N. Torrance, & A. Hildyard (Eds.), Literacy, Language and Learning: The Nature and Consequences of Reading and Writing. Cambridge: Cambridge University Press.

Chai, M. (2010). On Gender Misuses of Third-Person Singular Pronouns in Chinese EFL Students’ Oral English. (M.A. Thesis), Anhui University.

Chang, Y.-F. (2006). On the use of the immediate recall task as a measure of second language reading comprehension. Language Testing, 23(4), 520-543.

Chapelle, C. A. (2012). Conceptions of validity. In G. Fulcher & F. Davidson (Eds.), The Routledge Handbook of Language Testing. Oxon: Routledge.

Chapelle, C. A., Enright, M. K., & Jamieson, J. (2010). Does an argument-based approach to validity make a difference? Educational Measurement: Issues and Practice, 29(1), 3-13.

Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2008). Test score interpretation and use. In C. A. Chapelle, M. K. Enright, & J. M. Jamieson (Eds.), Building a Validity Argument for the Test of English as a Foreign Language™. Oxon: Routledge.

Chaudron, C. (2003). Data collection in SLA research. In C. J. Doughty & M. H. Long (Eds.), The Handbook of Second Language Acquisition (pp. 762-828). Malden, MA: Blackwell.

Chen, C., Wang, C., & Hu, Y. (2012). Yingyu zhuanye shuoshiyanjiusheng kouyu shuiping xianzhuang de diaocha yu sikao [A survey on the present situation of the English speaking ability of English major postgraduate students]. Guangxi Jiaoyu, 2012(2), 103-105.

Cheng, L. (2008). The key to success: English language testing in China. Language Testing, 25(1), 15-37.

Cheng, L., & Curtis, A. (2010). English Language Assessment and the Chinese Learner. New York, NY: Routledge.

China Youth Daily. (2013, September 17). Zhongguo daxue zuiai benke zhuanye paihang: Yingyu zhuanye kaishe zuiduo [Most popular undergraduate majors in China: English major offered by most universities]. Retrieved 7 September 2017 from http://edu.sina.com.cn/gaokao/2013-09-17/1056395595.shtml

Craik, F. I., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104(3), 268-294.

Crookes, G. (1990). The utterance, and other basic units for second language discourse analysis. Applied Linguistics, 11(2), 183-199.

Cui, X. (2003). Can he may de yuyi yufa duibi ji jiaoxue qishi [The semantic and syntactic comparisons of can, may and their implication in foreign-language teaching]. Journal of Taiyuan University of Technology (Social Sciences Edition), 21(4), 94-96.

Cumming, A., Kantor, R., Baba, K., Eouanzoui, K., Erdosy, U., & James, M. (2006). Analysis of discourse features and verification of scoring levels for independent and integrated prototype written tasks for the new TOEFL®. TOEFL Monograph Series MS-30; ETS Research Report No. RM-05-13. Princeton, NJ: Educational Testing Service.

Cumming, A., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M. (2005). Differences in written discourse in independent and integrated prototype tasks for next generation TOEFL. Assessing Writing, 10(1), 5-43.

Cushing, S. T. (2017). Corpus linguistics in language testing research. Language Testing, 34(4), 441-449.

Danielewicz, J. M. (1984). The interaction between text and context: A study of how adults and children use spoken and written language in four contexts. In A. D. Pellegrini & T. D. Yawkey (Eds.), Advances in Discourse Processes (Vol. 13, pp. 243-260). Norwood, NJ: Ablex.

Davies, M. (2008). The Corpus of Contemporary American English (COCA): 520 million words, 1990-present. Retrieved from http://corpus.byu.edu/coca/

Duan, X. (2011). A Study of TEM4-Oral Validity: A Test-Taking Process Approach. (Master Thesis), Soochow University.

Ellis, N. (2001). Memory for language. In P. Robinson (Ed.), Cognition and Second Language Instruction (pp. 33-68). Cambridge: Cambridge University Press.

Erlam, R. (2006). Elicited imitation as a measure of L2 implicit knowledge: An empirical validation study. Applied Linguistics, 27(3), 464-491.

Fincher-Kiefer, R., Post, T. A., Greene, T. R., & Voss, J. F. (1988). On the role of prior knowledge and task demands in the processing of text. Journal of Memory and Language, 27(4), 416-428.

Foster, P., Tonkyn, A., & Wigglesworth, G. (2000). Measuring spoken language: A unit for all reasons. Applied Linguistics, 21(3), 354-375.

Frost, K., Elder, C., & Wigglesworth, G. (2012). Investigating the validity of an integrated listening-speaking task: A discourse-based analysis of test takers’ oral performances. Language Testing, 29(3), 345-369.

Fulcher, G. (2010). Practical Language Testing. London: Hodder Education.

Fulcher, G., & Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource Book. New York, NY: Routledge.

Gablasova, D., Brezina, V., & McEnery, T. (2017). Collocations in corpus-based language learning research: Identifying, comparing, and interpreting the evidence. Language Learning, 67(S1), 155-179.

Gambrell, L. B., Pfeiffer, W. R., & Wilson, R. M. (1985). The effects of retelling upon reading comprehension and recall of text information. The Journal of Educational Research, 78(4), 216-220.

Gebril, A., & Plakans, L. (2013). Toward a transparent construct of reading-to-write tasks: The interface between discourse features and proficiency. Language Assessment Quarterly, 10(1), 9-27.

Gerbig, A. (2010). Key words and key phrases in a corpus of travel writing: From early modern English literature to contemporary “blooks”. In M. Bondi & M. Scott (Eds.), Keyness in Texts (pp. 147-168). Amsterdam/Philadelphia: John Benjamins.

Green, A. (2014). Adapting or developing source material for listening and reading tests. In A. J. Kunnan (Ed.), The Companion to Language Assessment (Vol. 7, pp. 830-846). Chichester: John Wiley & Sons, Inc.

Guo, W. (2014). Zhongguo yingyu zhuanye daxuesheng qingtaidongci can de shiyong qingkuang [The use of modal verb can by English major students in China]. Journal of Shanxi Coal-mining Administrators College, 27(2), 108-109.

Harris, D. P. (1969). Testing English as a Second Language. New York, NY: McGraw-Hill.

Heaton, J. B. (1975). Writing Language Tests. London: Longman.

Hirai, A., & Koizumi, R. (2009). Development of a practical speaking test with a positive impact on learning using a story retelling technique. Language Assessment Quarterly, 6(2), 151-167.

Housen, A., & Kuiken, F. (2009). Complexity, accuracy, and fluency in second language acquisition. Applied Linguistics, 30(4), 461-473.

Hu, G. (2002). Recent important developments in secondary English-language teaching in the People's Republic of China. Language, Culture and Curriculum, 15(1), 30-49.

Huan, G. (1986). China’s Open Door Policy, 1978-1984. Journal of International Affairs, 39(2), 1-18.

Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.

Hunt, K. W. (1965). Grammatical structures written at three grade levels. NCTE Research Report No. 3. Champaign, IL: National Council of Teachers of English.

Hunt, K. W. (1966). Recent measures in syntactic development. Elementary English, 43(7), 732-739.

Hunt, K. W. (1970). Syntactic maturity in schoolchildren and adults. Monographs of the Society for Research in Child Development, 35(1), 1-67.

Ingram, E. (1968). Attainment and diagnostic testing. In A. Davies (Ed.), Language Testing Symposium: A Psycholinguistic Approach. Oxford: Oxford University Press.

International House London. (2014, May 30). Number of English language learners keeps on growing. Retrieved 28 Sep 2017 from http://www.ihlondon.com/news/2014/number-of-english-language-learners-keeps-on-growing/

Ji, Y., Li, X., & Li, L. (2012). Zhongguo daxuesheng zhijie-jianjie yinyu zhuanhuan zhong jufa he yuyi de ERP yanjiu [An ERP study of the syntax and semantics in Chinese college students’ conversion from direct speech to indirect speech]. Foreign Language Research, (136), 46-53.

Jin, Y., & Fan, J. (2011). Test for English Majors (TEM) in China. Language Testing, 28(4), 589-596.

Joe, A. (1995). Text-based tasks and incidental vocabulary learning. Second Language Research, 11(2), 149-158.

Johns, A. M., & Mayes, P. (1990). An analysis of summary protocols of university ESL students. Applied Linguistics, 11(3), 253-271.

Johnston, P. H. (1983). Reading Comprehension Assessment: A Cognitive Basis. Newark, DE: International Reading Association.

Kai, A. (2008). The effects of retelling on narrative comprehension: Focusing on learners’ L2 proficiency and the importance of text information. ARELE: Annual Review of English Language Education in Japan, 19, 21-30.

Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527-535.

Kane, M. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319-342.

Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational Measurement (4th ed., pp. 17-64). Westport, CT: Greenwood Publishing.

Kane, M. (2010). Validity and fairness. Language Testing, 27(2), 177-182.

Kane, M. (2012). Articulating a validity argument. In G. Fulcher & F. Davidson (Eds.), The Routledge Handbook of Language Testing (pp. 34-47). New York, NY: Routledge.

Kane, M. (2013a). The argument-based approach to validation. School Psychology Review, 42(4), 448-457.

Kane, M. (2013b). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1-73.

Kane, M., Crooks, T., & Cohen, A. (1999). Validating measures of performance. Educational Measurement: Issues and Practice, 18(2), 5-17.

Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85(5), 363-394.

Kirkpatrick, R., & Zang, Y. (2011). The negative influences of exam-oriented education on Chinese high school students: Backwash from classroom to child. Language Testing in Asia, 1(3), 36.

Knoch, U., & Chapelle, C. A. (2017). Validation of rating processes within an argument-based framework. Language Testing, 1-23. Advance online publication. doi:10.1177/0265532217710049.

Knoch, U., & Elder, C. (2013). A framework for validating post-entry language assessments (PELAs). Papers in Language Testing and Assessment, 2(2), 48-66.

Knoch, U., Macqueen, S., & O’Hagan, S. (2014). An investigation of the effect of task type on the discourse produced by students at various score levels in the TOEFL iBT® writing test. TOEFL iBT Report No. 23; ETS Research Report No. RR-14-43. Princeton, NJ: Educational Testing Service.

Kroll, B. (1977). Combining ideas in written and spoken English: A look at subordination and coordination. In E. O. Keenan & T. L. Bennett (Eds.), Discourse Across Time and Space (Vol. 5). Los Angeles, CA: University of Southern California.

Lado, R. (1961). Language Testing: The Construction and Use of Foreign Language Tests: A Teacher's Book. London: Longmans, Green and Co Ltd.

Leki, I., & Carson, J. (1997). “Completely different worlds”: EAP and the writing experiences of ESL students in university courses. TESOL Quarterly, 31(1), 39-69.

Li, C. N., & Thompson, S. A. (1989). Mandarin Chinese: A Functional Reference Grammar. Berkeley, CA: University of California Press.

Li, M. (2007). Foreign Language Education in Primary Schools in the People's Republic of China. Current Issues in Language Planning, 8(2), 148-160.

Li, X. (2010). Yingyu zhuanye siji koushi gushi fushu bufen de xianzhuang yu sikao: Yi laizi mou shifanxueyuan yuliao wei jichu de wenxi [Present condition and reflections of TEM4-Oral: Based on the data from a normal university]. Journal of Yichun College, 32(10), 123-125.

Liang, F. (2011). Yingyu zhuanye shuoshiyanjiusheng kouyu kechengshezhi de xuqiu diaocha [A need analysis of English speaking classes for English major postgraduate students]. Journal of Xianning University, 31(9), 101-102.

Liao, Y., & Wolff, M. (2011). Mute English – The Latin of China. In N. Qiang & M. Wolff (Eds.), The Lowdown on China’s Higher Education (pp. 70-93). Newcastle upon Tyne: Cambridge Scholars Publishing.