To analyze syntax in the three languages, we first had to decide exactly which sections of the text to analyze in each one. We decided to tag syntactic elements in the first paragraph, the final paragraph and three randomly chosen paragraphs of each chapter (the randomly chosen paragraph numbers were 17, 52 and 87).
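As a rough illustration of this sampling rule, the selection can be sketched in Python. The function name select_paragraphs and the placeholder chapter are hypothetical, and skipping positions that fall beyond the end of a short chapter is an assumption, since the handling of short chapters is not described here.

```python
def select_paragraphs(paragraphs, fixed_positions=(17, 52, 87)):
    """Return (position, text) pairs for the first, final and fixed-position paragraphs.

    Positions are 1-based. Positions past the end of the chapter are skipped;
    this is an assumption, not a documented part of the method.
    """
    total = len(paragraphs)
    wanted = {1, total, *fixed_positions}
    return [(pos, paragraphs[pos - 1]) for pos in sorted(wanted) if 1 <= pos <= total]


# Placeholder chapter of 100 paragraphs (invented data, not text from the books).
chapter = [f"paragraph {i}" for i in range(1, 101)]
selected = select_paragraphs(chapter)
positions = [pos for pos, _ in selected]
```

Using a set for the wanted positions means a chapter whose final paragraph happens to be number 87, for example, is not sampled twice.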
Our method was to go through each sentence in each paragraph and tag what we called the microcomponents of syntax. If you imagine a very basic syntax tree, it starts as a CP (complementizer phrase), which contains an S (sentence), which in turn contains an NP (noun phrase) and a VP (verb phrase). NPs and VPs then contain N (noun) and V (verb) respectively. We considered the components at the bottom of the tree to be the microcomponents: elements contained within phrases are microcomponents, while phrases and sentences are what we considered the macrocomponents of syntactic trees. From the microcomponents it was then easy to build the syntactic trees, placing each microcomponent within its phrase and at its location on the tree.
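The basic tree described above can be pictured with a small Python sketch. The Node class, the microcomponents helper and the two-word example sentence are all hypothetical and serve only to show the distinction between microcomponents (the nodes at the bottom of the tree) and macrocomponents (the phrases and sentences that contain them).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                     # e.g. "CP", "S", "NP", "VP", "N", "V"
    children: list = field(default_factory=list)   # empty for bottom-of-tree nodes
    word: str = ""                                 # surface word, set only on bottom nodes

    def is_microcomponent(self):
        # A node with no children sits at the bottom of the tree.
        return not self.children


def microcomponents(node):
    """Collect the bottom-of-tree nodes in left-to-right order."""
    if node.is_microcomponent():
        return [node]
    found = []
    for child in node.children:
        found.extend(microcomponents(child))
    return found


# CP contains S, which contains NP and VP, which contain N and V.
tree = Node("CP", [Node("S", [
    Node("NP", [Node("N", word="dogs")]),   # invented example word
    Node("VP", [Node("V", word="bark")]),   # invented example word
])])
leaves = [(n.label, n.word) for n in microcomponents(tree)]
```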
The microcomponents were tagged in an XML document in <oXygen/>. As we all had a knowledge of syntax, we decided we would not need to tag the English text or create English trees; instead, we tagged the aforementioned sections of each chapter in each foreign language. As the Harry Potter books and their content are protected by copyright, we are unable to post our syntactic markup, but our tagging legend is as follows:
Adjectives are tagged <adj>, prepositions/postpositions/circumpositions are tagged <prep>, adverbs are tagged <adv>, conjunctions are tagged <conj>, determiners are tagged <dtmr> with a type attribute of "indefinite", "definite" or "quantifier", interjections are tagged <intj>, nouns are tagged <n>, particles are tagged <prtl>, pronouns are tagged <Pr> and verbs are tagged <v> with a type attribute of "infinitive" or "reflexive". Aspect (imperfective or perfective, which occurs mostly in Russian) and tense were not specified on verbs, as they would not change a verb's location on a syntax tree.
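To give a sense of what markup following this legend might look like, here is a minimal Python sketch that parses one tagged sentence with the standard library. The French sentence is invented for illustration and is not taken from the books, the wrapping <s> element is a hypothetical container that the legend does not define, and leaving the type attribute off a finite verb is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sentence ("Le petit chat dort."), tagged with the legend's element names.
# The <s> wrapper is hypothetical; the legend does not name a sentence element.
sample = (
    '<s>'
    '<dtmr type="definite">Le</dtmr> '
    '<adj>petit</adj> '
    '<n>chat</n> '
    '<v>dort</v>.'   # finite verb: no type attribute (assumption)
    '</s>'
)

root = ET.fromstring(sample)

# Count how often each microcomponent tag occurs in the sentence.
tag_counts = Counter(el.tag for el in root)

# List each microcomponent as (tag, type attribute or None, word).
tokens = [(el.tag, el.get("type"), el.text) for el in root]
```

Counts of this kind are one simple way tagged paragraphs could be compared across languages before any trees are drawn.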
After tagging these sections, we created syntax trees from the tagged portions using TreeForm Tree Drawing Software. With a visual representation of the syntax, we were then able to compare the languages and draw further conclusions about the syntactic similarities and differences across French, Swedish and Russian.