University of Oxford
Linguistics, Philology & Phonetics

Machine-readable grammatical resources for Indonesian

ESRC Project RES-000-22-3063
Principal investigator: Mary Dalrymple
Co-investigator: Suriel Mofu

This project, which ran from 2008 to 2009, produced grammatical resources for Indonesian to guide the development of computer-implemented grammars and to establish a standard by which grammar coverage can be measured. The resources consist of a set of 52 machine-readable (plain text) files containing acceptable and unacceptable sentences of Indonesian, their translations, and comments on their grammatical structure. The 52 separate files are available below; you can also download a single file containing all topics:

indonesian-testsuites.txt

The resource differs from standard grammars and textbooks of Indonesian, which assume that the human reader or learner can fill in a full paradigm on the basis of an abstract description or a few representative examples. Unlike corpora assembled from naturally occurring texts, the files contain unacceptable as well as acceptable examples. Including unacceptable examples is crucial in ensuring that grammars produce only well-formed analyses and do not accept ungrammatical input. The data are available below and from the UK Data Archive at the following URL:

http://discover.ukdataservice.ac.uk/catalogue/?sn=850309&type=Data%20catalogue

Our project connects with the project "Understanding Indonesian: developing a machine-usable grammar, dictionary and corpus", based at the Australian National University and funded by the Australian Research Council, with which PI Dalrymple is associated as a partner investigator. The Australian project is producing a broad-coverage grammar, lexicon, and balanced corpus of Indonesian as a part of the Parallel Grammar Project (PARGRAM), an international consortium of academic and commercial research institutions that develops computational grammars and lexicons within the shared linguistic framework of Lexical Functional Grammar (LFG). The testsuites are essential to that project's work: they guide the development of the grammar, ensure coverage of less common as well as basic constructions, test the full paradigm of constructions and their interactions, and test the "tightness" of the grammar, that is, whether it excludes impossible analyses while producing well-formed analyses for the constructions under examination. Feedback from the "Understanding Indonesian" project has guided development of the testsuites and helped ensure that their coverage is comprehensive.
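As a rough illustration of how a testsuite of acceptable and unacceptable sentences can measure both coverage and tightness, the sketch below scores a grammar against labelled items. Everything in it is an assumption made for illustration: the Item record, the evaluate function, the accepts callback, and the placeholder sentences are hypothetical and do not reflect the actual layout of the testsuite files or any particular parser's interface.

```python
from typing import Callable, Dict, Iterable, NamedTuple


class Item(NamedTuple):
    """One hypothetical testsuite entry (not the real file format)."""
    sentence: str
    grammatical: bool  # True = acceptable, False = unacceptable


def evaluate(items: Iterable[Item],
             accepts: Callable[[str], bool]) -> Dict[str, float]:
    """Score a grammar, represented by `accepts`, against labelled items.

    coverage       = share of acceptable sentences the grammar analyses
    overgeneration = share of unacceptable sentences it wrongly analyses
    """
    good_total = good_parsed = bad_total = bad_parsed = 0
    for item in items:
        parsed = bool(accepts(item.sentence))
        if item.grammatical:
            good_total += 1
            good_parsed += parsed
        else:
            bad_total += 1
            bad_parsed += parsed
    return {
        "coverage": good_parsed / good_total if good_total else 0.0,
        "overgeneration": bad_parsed / bad_total if bad_total else 0.0,
    }


# Placeholder strings stand in for real test sentences.
items = [
    Item("acceptable-1", True),
    Item("acceptable-2", True),
    Item("unacceptable-1", False),
    Item("unacceptable-2", False),
]

# A toy "grammar" that analyses one good and one bad sentence.
toy_accepted = {"acceptable-1", "unacceptable-1"}
result = evaluate(items, lambda s: s in toy_accepted)
print(result)
```

A grammar with high coverage and low overgeneration on such a suite is both broad and tight; corpora of naturally occurring text can only supply the first of these two figures, since they contain no unacceptable examples.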

1. Basic noun phrases

2. Reflexives

Relative clauses

3. Basic relative clause patterns

4. Defining, topic-comment, prepositional, and locative relative clauses

5. Numbers and number phrases

Prepositions and prepositional phrases

6. Locative and nonlocative prepositions

7. Prepositions pada, di, oleh

Basic verbal clauses

8. Intransitive, transitive, ditransitive verbs

9. Tense/aspect: sudah, telah, sedang, masih, lagi, tengah, akan

10. Aspect: bakal, baru, pernah

11. Modals

12. Voice

13. Basic copular clauses

14. The verb "ada": existential, possessive, and emphatic uses

Basic non-verbal clauses

15. Noun clauses, copulas adalah and ialah, adjective clauses, quantity clauses, prepositional clauses

16. Adjective clauses

Nominal clauses

17. Simple nominal clauses

18. Nominalised relative clauses

19. Predicate nominalisation

Clausal word order

20. Basic word order

21. Word order in copular clauses

22. Topic-comment clauses

23. Identifying clauses

Double object constructions

24. Double object constructions with -kan and with no suffix

25. Double object constructions with -i and -kan, part 1

26. Double object constructions with -i and -kan, part 2

27. Double object constructions and passive voice, file 1

28. Double object constructions and passive voice, file 2

29. Double object constructions and passive voice, file 3

Complement clauses

30. Complementiser "bahwa"

31. Complementiser "untuk"

32. Complementiser "agar" and "supaya"

33. Negation

Questions

34. Questions with apa(kah), siapa(kah), and interrogative suffix -kah

35. Yes-no questions, tag questions, short answers

36. Specific questions: apa, siapa, berapa, kenapa and mengapa

37. Specific questions with mana, di mana, ke mana, dari mana, bagaimana, bilamana, kapan; indirect questions

38. Imperatives

39. Ellipsis

Coordination and subordination

40. Coordinating conjunctions

41. Subordinating conjunctions: clauses of time and condition

42. Subordinating conjunctions: clauses of reason, purpose, extent

43. Subordinating conjunctions: clauses of concession, resemblance, contrast; clauses with no subordinator

Sentential adjuncts

44. Adverbs of manner, adjectives used as adverbs, reduplicated adjectives, adverbs with dengan and secara

45. Adverbial words, adverbs derived from adjectives, numbers as adverbs

46. Temporal adjuncts: clock time

47. Temporal adjuncts: days of the week and their parts

48. Temporal adjuncts: months, years, times of day, reduplication of parts of day

49. Temporal adjuncts: prepositional phrases indicating specific time, phrases indicating relative time

50. Adverbial sentence linkers indicating a connection between two sentences

51. Adjuncts of location

52. Focusing adjuncts
