
Marcin Oleksy, Jan Kocoń, Wroclaw University of Technology

Maciej Maciej, Institute of Literary Research, Polish Academy of Sciences

Maciej Piasecki, Wroclaw University of Technology

A weblog, often called an “electronic journal”, is a simple website containing posts displayed in reverse chronological order, which can usually be commented on by other users (cf. McNeill 2003). It is not a simple recreation of a journal in the digital environment but rather a complex remediation of traditional genres (e.g. journal, diary, letter) under the influence of electronic discourse, which entails such phenomena as secondary orality and interaction with the user (cf. Davis and Brewer 1997).

Although blogging is a relatively new phenomenon and scholarly interest in this form of writing can be traced back only to the first years of the 21st century, there is already a substantial literature dedicated to the genre analysis of blogs. We may distinguish two main threads of scholarship in this field: one focusing on the relationships between blogs and other forms of Computer-Mediated Communication (e.g. Herring et al. 2005; Devitt 2009), and the other tracing the “ancestry” of the genre in such forms as journals (e.g. Serfaty 2004; Miller and Shepherd 2009).

The aim of this study is a linguistic analysis of weblog genres that would support earlier interpretative studies of the collected material (e.g. Maryl and Niewiadomski 2014). In this study we analyse a corpus of 260 popular Polish blogs. Constructing the blog corpus is not straightforward, as blogs can differ greatly in how they use HTML structures to present textual content. In order to avoid laborious manual editing, a system called BlogReader was constructed. It is dedicated to corpus acquisition from structured web sources. It was built as an extension of a corpus gathering system based on several open components, e.g. jusText and Onion (Pomikálek, 2011). BlogReader works in a semi-automatic way: users first specify the elements of the HTML structure that contain the text fragments relevant to them. The identified fragments are then automatically acquired and saved in the selected corpus format.
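The semi-automatic step described above can be illustrated with a minimal sketch. It is not BlogReader's implementation: the class `FragmentExtractor`, the function `extract_posts`, and the default selector (`div` elements with class `post-body`) are hypothetical names chosen for illustration, and the sketch assumes well-nested markup. The user supplies a tag and a class, and the text of every matching element is collected.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FragmentExtractor(HTMLParser):
    """Collect the text of every element matching a user-chosen tag and class.

    Hypothetical helper for illustration only; not part of BlogReader.
    """

    def __init__(self, tag, css_class):
        super().__init__(convert_charrefs=True)
        self.tag = tag
        self.css_class = css_class
        self.depth = 0        # nesting depth inside a matching element
        self.buffer = []      # text pieces of the fragment being read
        self.fragments = []   # finished text fragments

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            # Already inside a relevant fragment: track nested elements.
            self.depth += 1
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # The matching element just closed: normalise whitespace and store.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.fragments.append(text)
            self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_posts(html, tag="div", css_class="post-body"):
    """Return the text of all elements selected by the user's tag and class."""
    parser = FragmentExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.fragments
```

Because blogs differ in their HTML structure, the selector would be set once per blog, for example `extract_posts(page, tag="article", css_class="entry")`, and the resulting fragments could then be written out in whatever corpus format is required.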

The analysis concerns three aspects:

• Linguistic comparison of blogs and other forms of discourse (e.g. literature, news)

• Linguistic analysis of weblog genres, i.e. linguistic differences between particular groups of blogs and authors

• Linguistic variation between blog posts within individual blogs

Blog corpus analysis will be based on an open stylometric system dedicated to Polish, which is being built as part of the CLARIN-PL project (www.clarin-pl.eu). The system will enable the description of text documents with features referring to any level of linguistic structure, from word forms up to semantic-pragmatic structures. The system will combine several existing components: language tools for pre-processing; Fextor (Broda et al., 2013), a system for defining features in a flexible way; Stylo (http://crantastic.org/packages/stylo/versions/34587), a stylometric package for English; and SuperMatrix (Broda & Piasecki, 2013), a system for building and processing very large co-incidence matrices (e.g. documents vs features) with an interface to clustering and Machine Learning packages. With the help of the system we want to identify different groups of blogs as well as the features (e.g. lemmas or lexico-syntactic patterns) that characterize different text groups.
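To make the idea of a documents-vs-features matrix concrete, the following sketch builds one from relative word frequencies and compares documents with Burrows's Delta, a classic stylometric distance. This is an illustrative assumption, not the described system: the abstract does not name a distance measure, the function names are hypothetical, the tokenizer is a naive whitespace split rather than a Polish pre-processing pipeline, and documents are assumed to be non-empty.

```python
from collections import Counter
from statistics import mean, pstdev

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Naive tokenizer (assumption): lowercase, split on whitespace, strip punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def feature_matrix(docs, n_features=3):
    """Rows are documents; columns are relative frequencies of the
    n_features most frequent words in the whole collection."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    total = Counter()
    for doc_counts in counts.values():
        total.update(doc_counts)
    features = [word for word, _ in total.most_common(n_features)]
    matrix = {}
    for name, doc_counts in counts.items():
        size = sum(doc_counts.values()) or 1
        matrix[name] = [doc_counts[word] / size for word in features]
    return features, matrix


def zscore_columns(matrix):
    """Standardise every feature column across documents."""
    names = list(matrix)
    columns = zip(*(matrix[name] for name in names))
    scaled = {name: [] for name in names}
    for column in columns:
        mu, sigma = mean(column), pstdev(column)
        for name, value in zip(names, column):
            scaled[name].append((value - mu) / sigma if sigma else 0.0)
    return scaled


def delta(row_a, row_b):
    """Burrows's Delta: mean absolute difference of z-scored frequencies."""
    return sum(abs(a - b) for a, b in zip(row_a, row_b)) / len(row_a)
```

A usage example with toy data: given `docs = {"blog_a": "...", "blog_b": "...", "news": "..."}`, calling `features, m = feature_matrix(docs)` and then `z = zscore_columns(m)` yields rows that can be compared pairwise with `delta(z["blog_a"], z["news"])`. Smaller values indicate more similar word-frequency profiles, and the same rows could be passed to a clustering routine.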

This paper is part of the ongoing research project “Blog as a new form of electronic writing” (2012-2014), funded by the National Science Center in Poland and conducted in the Digital Humanities Centre at the Institute of Literary Research of the Polish Academy of Sciences. This work is also partially financed as part of the investment in the CLARIN-PL research infrastructure funded by the Polish Ministry of Science and Higher Education.

References

[1] Broda, Bartosz, Paweł Kędzia, Michał Marcińczuk, Adam Radziszewski, Radosław Ramocki, Adam Wardyński (2013) Fextor: A Feature Extraction Framework for Natural Language Processing: A Case Study in Word Sense Disambiguation, Relation Recognition and Anaphora Resolution. In A. Przepiórkowski et al., Computational Linguistics Applications, 41-62. Springer Berlin Heidelberg.

[2] Broda, Bartosz & Maciej Piasecki (2013) Parallel, Massive Processing in SuperMatrix – a General Tool for Distributional Semantic Analysis of Corpora. International Journal of Data Mining, Modelling and Management, 5, 1-19.

[3] Davis, Boyd H. and Jeutonne P. Brewer (1997) Electronic discourse: linguistic individuals in virtual space, Albany: State University of New York Press.

[4] Devitt, Amy J. (2009) “Re-fusing form in genre study” Genres in the Internet. Issues in the theory of genre, Eds. Janet Giltrow and Dieter Stein, Amsterdam: John Benjamins.

[5] Herring, Susan, Lois Ann Scheidt, Elijah Wright and Sabrina Bonus (2005) “Weblogs as a bridging genre” Information Technology & People, 18(2).

[6] McNeill, Laurie (2003) “Teaching an Old Genre New Tricks: The Diary on the Internet” Biography, 26:1, Winter 2003.

[7] Jamieson, Kathleen Hall and Karlyn Kohrs Campbell (1982) “Rhetorical Hybrids: Fusions of Generic Elements” Quarterly Journal of Speech, no. 69.

[8] Maryl, Maciej and Krzysztof Niewiadomski, “Blog = Książka? Empiryczne badanie potocznej kategoryzacji blogów” Przegląd Humanistyczny 2013:4.

[9] Miller, Carolyn R. and Dawn Shepherd (2009) “Questions for genre theory from the blogosphere” Genres in the Internet. Issues in the theory of genre, Eds. Janet Giltrow and Dieter Stein, Amsterdam: John Benjamins.

[10] Miller, Carolyn R. and Dawn Shepherd (2004) “Blogging as Social Action: A Genre Analysis of the Weblog” Into the blogosphere: Rhetoric, community, and the culture of weblogs, Eds. Laura Gurak, Smiljana Antonijevic, Laurie Johnson, Clancy Ratliff and Jessica Reyman. Minneapolis: University of Minnesota Libraries.

[11] Morrison, Aimée (2008) “Blogs and Blogging: Text and Practice” A Companion to Digital Literary Studies, Eds. Ray Siemens and Susan Schreibman, Oxford: Blackwell.

[12] Pomikálek, J. (2011) Removing Boilerplate and Duplicate Content from Web Corpora. Ph.D. thesis, Masaryk University, Brno.

[13] Serfaty, Viviane (2004) “Online Diaries: Towards a Structural Approach” Journal of American Studies, 38(3).

[14] Sudweeks, Fay and Simeon J. Simoff (1999) “Complementary Explorative Data Analysis. The Reconciliation of Quantitative and Qualitative Principles” Doing Internet Research: Critical Issues and Methods for Examining the Net, Ed. Steve Jones, Thousand Oaks: Sage Publications.
