Algorithm for Assisting Grammarians when Extracting Phonological Conditioning Rules for Nguni languages

Authors

  • Zola Mahlaza
  • Langa Khumalo

DOI:

https://doi.org/10.55492/dhasa.v5i1.5013

Keywords:

Language Technologies, Low-resource Languages, Data extraction, Phonological Conditioning, Natural Language Generation

Abstract

Text generation models, the core technology underpinning chatbots such as ChatGPT, must model sub-word processes such as phonological conditioning when they are built to support morphologically complex African languages. Since we rely on explicit phonological conditioning rules, manually identified by grammarians, to determine how well such models perform for these languages, there is a need for computational solutions that help grammarians increase their coverage of known rules. At present, there are no algorithms that extract the rules for such processes and thereby enable the creation of better text generation models. We present a new algorithm for extracting phonological conditioning rules for Nguni languages. All the rules extracted by the algorithm are valid when the input word and associated morphemes are judged to be valid. The algorithm has the potential to improve the productivity of grammarians and enable the creation of modern text generation technologies that support and promote under-resourced languages.
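
To make the task concrete, the following is a minimal illustrative sketch and not the algorithm presented in the paper. It assumes the input is a surface word plus its underlying morphemes, and it recovers a single candidate rewrite rule by stripping the longest shared prefix and suffix between the concatenated morphemes and the surface form. The function name `extract_rule` and the returned pair format are hypothetical, chosen only for this illustration; the isiZulu examples (na + umuntu → nomuntu, na + inja → nenja) are standard cases of vowel coalescence.

```python
from typing import Optional, Sequence, Tuple


def extract_rule(morphemes: Sequence[str], surface: str) -> Optional[Tuple[str, str]]:
    """Return a candidate (underlying, surface) rewrite pair, or None.

    Hypothetical helper for illustration only: it assumes at most one
    contiguous change between the concatenated morphemes and the surface word.
    """
    underlying = "".join(morphemes)
    if underlying == surface:
        return None  # plain concatenation, so no conditioning applied

    limit = min(len(underlying), len(surface))

    # Length of the longest shared prefix.
    prefix = 0
    while prefix < limit and underlying[prefix] == surface[prefix]:
        prefix += 1

    # Length of the longest shared suffix that does not overlap the prefix.
    suffix = 0
    while suffix < limit - prefix and underlying[-1 - suffix] == surface[-1 - suffix]:
        suffix += 1

    # Whatever remains in the middle is the segment that changed.
    changed_from = underlying[prefix:len(underlying) - suffix]
    changed_to = surface[prefix:len(surface) - suffix]
    return changed_from, changed_to


if __name__ == "__main__":
    print(extract_rule(["na", "umuntu"], "nomuntu"))  # a + u surfaces as o
    print(extract_rule(["na", "inja"], "nenja"))      # a + i surfaces as e
```

A sketch like this only proposes candidates. As the abstract notes, a rule is only as reliable as the word and morpheme analysis supplied as input, so a grammarian would still need to confirm each pair.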

Published

2024-02-19

How to Cite

Algorithm for Assisting Grammarians when Extracting Phonological Conditioning Rules for Nguni languages. (2024). Journal of the Digital Humanities Association of Southern Africa, 5(1). https://doi.org/10.55492/dhasa.v5i1.5013