Machine-readable grammatical resources for Indonesian
ESRC Project RES-000-22-3063
Principal investigator: Mary Dalrymple
Co-investigator: Suriel Mofu
This project, which ran from 2008 to 2009, produced grammatical resources
for Indonesian to guide the development of computer-implemented
grammars and to establish a standard by which grammar coverage can be
measured. The resources consist of a set of 52 machine-readable (plain
text) files containing acceptable and unacceptable sentences of
Indonesian, their translations, and comments on their grammatical
structure. The 52 separate files are available below; you can also
download a single file containing all topics:
indonesian-testsuites.txt
The resource differs from standard grammars and textbooks of
Indonesian, which assume that the human reader or learner can fill in
a full paradigm on the basis of an abstract description or a few
representative examples. Unlike corpora assembled from naturally
occurring texts, the files contain unacceptable as well as acceptable
examples; including unacceptable examples is crucial in ensuring that
grammars produce only well-formed analyses, and do not accept
ungrammatical input. The data are also available from the UK Data Archive at the following URL:
http://discover.ukdataservice.ac.uk/catalogue/?sn=850309&type=Data%20catalogue
Our project connects with the project "Understanding Indonesian:
developing a machine-usable grammar, dictionary and corpus", based at
the Australian National University and funded by the Australian
Research Council, with which PI Dalrymple is associated as a partner
investigator. The Australian project is producing a broad-coverage
grammar, lexicon, and balanced corpus of Indonesian as a part of the
Parallel Grammar Project (PARGRAM), an international consortium of
academic and commercial research institutions that develops
computational grammars and lexicons within the shared linguistic
framework of Lexical Functional Grammar (LFG). The
testsuites are essential to their work in guiding the development of
the grammar, ensuring coverage of less common as well as of basic
constructions, testing the full paradigm of constructions and their
interactions, and testing the "tightness" of the grammar in excluding
impossible analyses as well as producing well-formed analyses for the
constructions under examination. Feedback from the "Understanding
Indonesian" project has guided development of the testsuites and ensured
full coverage and comprehensiveness.
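As a rough illustration of how acceptable and unacceptable examples can be used together to measure a grammar, the following Python sketch computes two figures: coverage (the share of acceptable sentences the grammar parses) and overgeneration (the share of unacceptable sentences it wrongly accepts). Everything in it is an assumption made for illustration: the `Item` record, the `evaluate` function, the placeholder sentences, and the toy `toy_parses` check are hypothetical and do not reflect the actual file format of the testsuites or the parser used by either project.

```python
from typing import Callable, Iterable, NamedTuple


class Item(NamedTuple):
    """One hypothetical test item: a sentence plus its acceptability mark."""
    sentence: str
    acceptable: bool  # True if the item is marked as an acceptable sentence


def evaluate(items: Iterable[Item], parses: Callable[[str], bool]) -> dict:
    """Return coverage and overgeneration rates for a parser predicate."""
    items = list(items)
    good = [i for i in items if i.acceptable]
    bad = [i for i in items if not i.acceptable]
    parsed_good = sum(1 for i in good if parses(i.sentence))
    parsed_bad = sum(1 for i in bad if parses(i.sentence))
    return {
        # Share of acceptable sentences that receive an analysis.
        "coverage": parsed_good / len(good) if good else 0.0,
        # Share of unacceptable sentences that are wrongly analysed.
        "overgeneration": parsed_bad / len(bad) if bad else 0.0,
    }


# Placeholder items standing in for test-suite entries (not real data).
ITEMS = [
    Item("good-1", True),
    Item("good-2", True),
    Item("bad-1", False),
    Item("bad-2", False),
]

# Toy stand-in for a grammar: it accepts exactly these strings.
ACCEPTED = {"good-1", "bad-1"}


def toy_parses(sentence: str) -> bool:
    return sentence in ACCEPTED


RESULT = evaluate(ITEMS, toy_parses)
print(RESULT)
```

In this sketch a "tight" grammar would show high coverage together with an overgeneration rate close to zero; a corpus of naturally occurring text alone could only supply the first of the two figures.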
1. Basic noun phrases
2. Reflexives
Relative clauses
3. Basic relative clause patterns
4. Defining, topic-comment, prepositional, and locative relative clauses
5. Numbers and number phrases
Prepositions and prepositional phrases
6. Locative and nonlocative prepositions
7. Prepositions pada, di, oleh
Basic verbal clauses
8. Intransitive, transitive, ditransitive verbs
9. Tense/aspect: sudah, telah, sedang, masih, lagi, tengah, akan
10. Aspect: bakal, baru, pernah
11. Modals
12. Voice
13. Basic copular clauses
14. The verb "ada": existential, possessive, and emphatic uses
Basic non-verbal clauses
15. Noun clauses, copulas adalah and ialah, adjective clauses, quantity clauses, prepositional clauses
16. Adjective clauses
Nominal clauses
17. Simple nominal clauses
18. Nominalised relative clauses
19. Predicate nominalisation
Clausal word order
20. Basic word order
21. Word order in copular clauses
22. Topic-comment clauses
23. Identifying clauses
Double object constructions
24. Double object constructions with -kan and with no suffix
25. Double object constructions with -i and -kan, part 1
26. Double object constructions with -i and -kan, part 2
27. Double object constructions and passive voice, file 1
28. Double object constructions and passive voice, file 2
29. Double object constructions and passive voice, file 3
Complement clauses
30. Complementiser "bahwa"
31. Complementiser "untuk"
32. Complementiser "agar" and "supaya"
33. Negation
Questions
34. Questions with apa(kah), siapa(kah), and interrogative suffix -kah
35. Yes-no questions, tag questions, short answers
36. Specific questions: Apa, siapa, berapa, kenapa and mengapa
37. Specific questions with mana, di mana, ke mana, dari mana, bagaimana, bilamana, kapan; indirect questions
38. Imperatives
39. Ellipsis
Coordination and subordination
40. Coordinating conjunctions
41. Subordinating conjunctions: clauses of time and condition
42. Subordinating conjunctions: clauses of reason, purpose, extent
43. Subordinating conjunctions: clauses of concession, resemblance, contrast; clauses with no subordinator
Sentential adjuncts
44. Adverbs of manner, adjectives used as adverbs, reduplicated adjectives, adverbs with dengan and secara
45. Adverbial words, adverbs derived from adjectives, numbers as adverbs
46. Temporal adjuncts: clock time
47. Temporal adjuncts: days of the week and their parts
48. Temporal adjuncts: months, years, times of day, reduplication of parts of day
49. Temporal adjuncts: prepositional phrases indicating specific time, phrases indicating relative time
50. Adverbial sentence linkers indicating a connection between two sentences
51. Adjuncts of location
52. Focusing adjuncts