Morisia: A Neural Machine Translation System to Translate between Kreol Morisien and English

By Sameerchand Pudaruth(1), Aneerav Sukhoo(2), Somveer Kishnah(1), Sheeba Armoogum(1), Vandanah Gooria(3), Nirmal Kumar Betchoo(4), Fadil Chady(1), Ashminee Ramoogra(1), Hiteishee Hanoomanjee(1) and Zafar Khodabocus(1) ([1] University of Mauritius; [2] Amity Institute of Higher Education, Mauritius; [3] Open University of Mauritius; [4] Université des Mascareignes, Mauritius)

Abstract

The 2011 population census reveals that, out of 1.2 million inhabitants, at least 84 per cent of the population of the Republic of Mauritius speak Kreol Morisien. Kreol Morisien was formalised in a dictionary in 2011, and this advancement allowed the language to be introduced as a full-fledged subject in schools in 2012. In line with these developments, we have been engaged in setting up an online system dedicated to the automatic translation from Kreol Morisien into English and from English into Kreol Morisien. World-renowned online translation services such as Google Translate and Bing Translator do not currently cater for Kreol Morisien, as it is very challenging to build neural models for under-resourced languages. A deep learning approach based on the Transformer model was used to perform the machine translation. A dataset of 24,810 sentence pairs was used to build the translation models, which were subsequently tested on 1,000 new and unseen sentences. The translations were evaluated using the standard BLEU score, which measures the overlap between the automatic translation and a human translation. A score of 30.30 was obtained for the translation from Kreol Morisien into English and a score of 26.34 for the translation from English into Kreol Morisien. This translation system is available as an online service at translatekreol.mu and as an app, named Morisia, on the Google Play Store. This interdisciplinary research has produced the first automatic online translation system for Kreol Morisien. This user-friendly system will be very useful to any citizen of the Republic of Mauritius, as well as to foreign students, tourists and anyone else wishing to learn the Kreol Morisien language.

Keywords: deep learning, Transformer model, Kreol Morisien, Mauritian Creole, attention, Machine Translation

©inTRAlinea & Sameerchand Pudaruth(1), Aneerav Sukhoo(2), Somveer Kishnah(1), Sheeba Armoogum(1), Vandanah Gooria(3), Nirmal Kumar Betchoo(4), Fadil Chady(1), Ashminee Ramoogra(1), Hiteishee Hanoomanjee(1) and Zafar Khodabocus(1) (2021).
"Morisia: A Neural Machine Translation System to Translate between Kreol Morisien and English", inTRAlinea Vol. 23.
This article can be freely reproduced under Creative Commons License.
Stable URL: http://www.intralinea.org/archive/article/2531

1. Introduction

Kreol Morisien or Mauritian Creole is spoken by at least 84 per cent of the Mauritian population (Statistics Mauritius, 2011). Kreol Morisien has gained much acceptance and popularity as a formal language in the last decade. While English and French predominate as formal written languages, Kreol Morisien is widely used for oral communication. Early contributions from some Mauritian authors to establish a Kreol Morisien literature through the publication of books, articles, plays and songs have paved the way for the standardisation of its written form. The Government of Mauritius decided to introduce the language in primary schools in 2012. In 2017, for the first time in Mauritian history, 4,000 students sat for an examination in their mother tongue in the Primary School Achievement Certificate (PSAC). In January 2018, Kreol Morisien was offered as an examinable subject to Grade 7 students in secondary schools.

Official statistics confirm that Grade 6 students perform better in Kreol Morisien at the PSAC (Primary School Achievement Certificate) than in any of the Oriental languages (MES 2020). Of the 2,975 students examined in Kreol Morisien for the PSAC in 2019, 2,346 passed. This represents a pass rate of 78.86 per cent, the highest when compared with Oriental languages such as Hindi, Tamil, Urdu, Marathi, Telugu, Mandarin and Arabic. Grade 9 students were due to sit for the Kreol Morisien national examinations for the first time in November 2020. However, this examination has been postponed to March-April 2021 due to the COVID-19 pandemic. Slowly but surely, the Kreol Morisien language is gaining its rightful place in Mauritian society. The broadcast of news in Kreol Morisien (Zournal an Kreol) and a TV channel (Senn Kreol) dedicated to programmes in Kreol Morisien, both by the MBC (Mauritius Broadcasting Corporation), have been important milestones in elevating the status of the language. Kreol Morisien will also be introduced in the National Assembly once the relevant staff and elected members are trained and appropriate software for processing Kreol Morisien is available. With such momentum, it is hoped that an O-Level paper in Kreol Morisien will be available from Cambridge Assessment International Education by 2022.

There are significant reasons why Kreol Morisien could be useful for Mauritians as well as foreigners. According to Statistics Mauritius (2019), more than 1.3 million tourists visited the island in 2019, and the government's vision is to bring 100,000 foreign students to Mauritius in the years to come. Equipped with basic reading and writing skills in Kreol Morisien, visitors, tourists and international students will feel more comfortable in this foreign environment. Conversely, a sizable percentage of Mauritians are not fully conversant in English and hence cannot clearly grasp English texts on signposts, roads, posters, billboards, buildings, online services or in English-language newspapers. The language barrier is thus a handicap both for visitors with limited proficiency in Kreol Morisien and for Mauritians with limited English proficiency. Popular translation services such as Google Translate, Microsoft Bing Translator and DeepL Translator do not cater for Kreol Morisien. The aim of this paper is to develop an automated system that translates from English into Kreol Morisien and vice-versa using deep neural networks. To achieve this aim, a web-based translator supported by a mobile app has been developed.

The paper is structured as follows. Section 2 provides a brief overview of the historical and current developments of the Kreol Morisien language. Section 3 describes the different machine translation approaches. Section 4 explains the methodology adopted in this work. The implementation and evaluation of the system are described in Section 5. The conclusions of this research and the lessons learned are presented in Section 6.

2. Kreol Morisien

It is important to understand the context of Kreol Morisien as a language of communication in Mauritius. Kreol developed locally among the slaves from the French language spoken by the colonists. During this difficult period of history, communication was in French, and slaves learnt to decipher the language on their own terms. The Kreol dialect gained importance in the local context when it became the mode of communication among the different communities from India, China, Africa and Europe who settled in Mauritius.

From a global perspective, many spoken or local languages are often not recognised as official national languages although they are widely used in society. In the Mauritian context, this spoken language was formerly known as ‘Kreol patois’, which relegated it to a secondary and less formal status. There is a perception that Kreol Morisien is an inferior and informal language, although it is the mother tongue of the overwhelming majority of the population. Kreol Morisien is the main language spoken at home by 84 per cent of Mauritians (a rise of 14 per cent since the 2000 census), while only 3.6 per cent speak French and 5.3 per cent speak English (Statistics Mauritius 2011).

Recognition of Kreol Morisien in Mauritius has been a long and challenging battle for defenders of the language. Dev Virahsawmy (2020), a writer, poet and politician, favoured the use of Kreol Morisien as a national language. Virahsawmy (2020) wrote several texts and poems in Kreol Morisien, and he also translated Shakespeare's drama ‘Macbeth’ from English into Kreol Morisien. Lalit (2020) also made commendable efforts to have formal communications made in Kreol Morisien. In 1984, Ledikasyon pu Travayer (1984) published the first Mauritian Creole to English translation book. In 1987, another dictionary of Mauritian Creole was authored by Philipp Baker and Vinesh Hookoomsingh (1987).

Grafi-larmoni was developed to ensure the standardisation of Kreol Morisien. It was an attempt to develop a single, common way of writing Kreol Morisien. Vinesh Hookoomsingh (2004) described Grafi-larmoni as a harmonised orthography allowing language and orthography to evolve in a flexible and dynamic way. A new dictionary of standard Kreol Morisien was authored by Arnaud Carpooran in 2009, with new editions published over the years to incorporate new words and new meanings of existing words (Carpooran 2019).

The standard grammar of Kreol Morisien was published in 2011 (Police-Michel, Carpooran and Florigny 2011). The structure of sentences in Kreol Morisien is quite similar to that of English; however, there are notable differences as well. For example, in Kreol Morisien, the adjective most often appears after the noun: ‘The red car’ is translated to ‘Loto rouz-la’. Rouz (‘red’) is placed after the noun (loto), and ‘the’ moves to the end of the phrase (la). Nouns have no plural forms in Kreol Morisien, unlike in English, where the character ‘s’ is often added at the end of a word to indicate its plural form. For example, ‘There are many animals here’ is translated to ‘Ena boukou zanimo isi’: the word ‘boukou’ (‘many’) indicates that there are many animals, while ‘zanimo’ remains unchanged. When translating from English into Kreol Morisien, it is often necessary to drop the verb ‘to be’. For example, ‘She is good at drawing’ is translated to ‘Li bon dan desine’: ‘She’ is translated to ‘Li’ and ‘good at drawing’ to ‘bon dan desine’, while ‘is’ is dropped.

The strategy behind developing machine translation for Kreol Morisien is a commendable effort to foster the development and recognition of a language that binds the Mauritian community emotionally and socially. Kreol Morisien also has a patriotic dimension, as it creates a sense of national identity. Machine Translation (MT) has also gained popularity in the field of education. Although many students use MT as an aid to language learning, very little is known about its use as a pedagogical tool in formal education (Odacioglu and Kokturk 2015). MT helps to decrease lexico-grammatical errors and improve student performance (Lee 2020). MT positively affects students' writing strategies and helps them think of writing as a process (Lee 2020). Most students in Mauritius use their mother tongue, French and English at school. This work should therefore help students to develop their linguistic and communication skills.

3. Machine Translation

According to Adam Lopez (2008), machine translation is the translation of text or speech from a source language into a target language. Machine translation techniques have evolved rapidly, paving the way for high-quality translation (Maucec and Donaj 2019). Various techniques have been developed, such as rule-based, statistical and deep learning approaches. Free online translation tools such as Google Translate, Bing Translator and DeepL Translator have become major assets for those who need text translated from one language into many others. Language is expected to become less and less of a barrier to communication, with so many mobile applications (mobile apps) available on Google Play. Mobile apps can even translate speech to speech, showing how far translation systems have evolved. Progress is continuously being made in speech-to-speech translation and online website translation. Nevertheless, many challenges, such as lexical and syntactic ambiguities, still remain (Moussallem, Wauera and Ngomo 2018). Dealing with word ordering is also challenging for all types of machine translation systems. Pronoun resolution is especially difficult when translating from Kreol Morisien into English, as Kreol Morisien can be considered a genderless language: the pronoun ‘li’ can mean ‘he’, ‘she’ or ‘it’.

3.1 Rule-based Approach

The simplest type of rule-based machine translation system works by replacing each word in the source language with an equivalent word in the target language. This requires a huge bilingual dictionary containing the mapping for each word; a word can also be mapped to several words in the target language. A set of rules must be applied before the replacement is carried out. Simple re-ordering of words is possible in rule-based systems, such as placing adjectives after nouns when translating from English into Kreol Morisien. Although simple in approach, rule-based systems suffer from a number of problems. Long sentences are very difficult to translate, as re-organising the words becomes almost impossible. Moreover, words are often translated without regard to the context in which they are used. However, rule-based machine translation systems have the strength of incorporating explicit linguistic knowledge, and they can be useful in situations where only very few words or very short sentences have to be translated (Kirkedal 2012). This method is also useful when no significant parallel corpus is available, which rules out statistical and neural machine translation. Sameerchand Pudaruth, Lallesh Sookun and Arvind Kumar Ruchpaul (2013) developed the first rule-based translation system for Kreol Morisien.
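To make the word-replacement and re-ordering rules concrete, the following is a minimal sketch of a rule-based translator for simple English noun phrases, reproducing the ‘The red car’ / ‘Loto rouz-la’ example from Section 2. The tiny lexicon and the function name translate_noun_phrase are hypothetical and for illustration only; this is not the system of Pudaruth, Sookun and Ruchpaul (2013).

```python
# Hypothetical toy lexicon: English word -> (Kreol Morisien word, part of speech).
# These entries are illustrative only; they are not taken from any real system.
LEXICON = {
    "car": ("loto", "NOUN"),
    "house": ("lakaz", "NOUN"),
    "red": ("rouz", "ADJ"),
    "blue": ("ble", "ADJ"),
}


def translate_noun_phrase(phrase: str) -> str:
    """Translate an English noun phrase of the form '[the] ADJ* NOUN'.

    Rule 1: replace each word using the bilingual lexicon (unknown words pass through).
    Rule 2: move adjectives after the noun, as Kreol Morisien usually does.
    Rule 3: render a leading 'the' as the suffix '-la' at the end of the phrase.
    """
    words = phrase.lower().split()
    definite = bool(words) and words[0] == "the"
    if definite:
        words = words[1:]

    nouns, adjectives = [], []
    for word in words:
        target, pos = LEXICON.get(word, (word, "UNK"))
        if pos == "ADJ":
            adjectives.append(target)
        else:
            nouns.append(target)

    result = " ".join(nouns + adjectives)
    if definite:
        result += "-la"
    return result.capitalize()


print(translate_noun_phrase("The red car"))  # Loto rouz-la
print(translate_noun_phrase("blue house"))   # Lakaz ble
```

The sketch also shows the brittleness described above: each rule covers one narrow pattern, and any sentence outside it (a relative clause, an adjective that precedes the noun, a longer sentence) needs yet another hand-written rule.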

3.2 Interlingua Approach

Since there are so many languages in the world, it would not be practical to build a direct translator for every pair of languages. Many languages are also under-resourced, and it would be very difficult to create datasets for them. The interlingual approach allows one specific language to be used as the pivot or central language (Supnithi, Sornlertlamvanich and Thatsanee 2002). Since English is the most widely spoken and understood language in the world, it is often used as the pivot language. For example, there is no automatic translator from Kreol Morisien into Hindi. However, it is possible to first translate Kreol Morisien into English and then translate the resulting English text into Hindi. This is the basis of the interlingual approach, in which translation is done in two phases (Lampert 2004). It is also possible to convert the source text into a language-independent representation and then use it to translate into other languages, but such systems have not become popular (Alansary 2014).
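The two-phase pivot idea amounts to composing two translators. The sketch below assumes each translator is simply a function from text to text; the one-entry phrase tables are hypothetical stand-ins for real MT systems (the Kreol Morisien sentence comes from the Appendix), and the names phrase_lookup and pivot are illustrative.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def phrase_lookup(table: Dict[str, str]) -> Translator:
    """Build a stand-in translator from a tiny phrase table (unknown input passes through)."""
    return lambda text: table.get(text, text)


def pivot(source_to_pivot: Translator, pivot_to_target: Translator) -> Translator:
    """Chain two translators: source -> pivot language -> target language."""
    return lambda text: pivot_to_target(source_to_pivot(text))


# Hypothetical one-entry phrase tables standing in for real translation systems.
kreol_to_english = phrase_lookup({"Komie sa?": "How much is this?"})
english_to_hindi = phrase_lookup({"How much is this?": "यह कितने का है?"})

kreol_to_hindi = pivot(kreol_to_english, english_to_hindi)
print(kreol_to_hindi("Komie sa?"))
```

One consequence of this design is that any error made in the first phase is passed on to, and possibly amplified by, the second phase.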

3.3 Statistical Machine Translation (SMT)

In contrast to rule-based translation systems, statistical translation systems do not require grammatical and syntactic knowledge of the languages involved. Instead, a large amount of parallel text is required so that the mappings can be extracted automatically (Schwenk, Fouet and Senellart 2008). Naïve replacement of one word by another in isolation does not produce valid translations: such word-replacement systems usually rely on a dictionary of fixed mappings obtained through simple frequency statistics. Statistical machine translation, on the other hand, translates a text from a source language into a target language on the basis of probabilities. The essence of this method is the alignment and mapping of n-grams in the parallel texts. An n-gram is a contiguous sequence of n words from a text segment: bigrams are sequences of two words, while trigrams are sequences of three words. Trigrams have been shown to produce more accurate translations than unigrams or bigrams (Schwenk, Fouet and Senellart 2008). An example of word alignment between an English sentence and its Kreol Morisien translation is shown in Figure 1.
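As an illustration of the n-gram definition above, the following sketch extracts all n-grams from a sentence, assuming simple whitespace tokenisation. The example sentence is the Kreol Morisien one from Section 2; the function name ngrams is illustrative.

```python
from typing import List, Tuple


def ngrams(sentence: str, n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous sequence of n words (whitespace tokenisation assumed)."""
    tokens = sentence.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "Ena boukou zanimo isi"
print(ngrams(sentence, 2))  # [('Ena', 'boukou'), ('boukou', 'zanimo'), ('zanimo', 'isi')]
print(ngrams(sentence, 3))  # [('Ena', 'boukou', 'zanimo'), ('boukou', 'zanimo', 'isi')]
```

A sentence of k words therefore yields k - n + 1 n-grams, and none at all when it is shorter than n.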

Fig. 1. Word alignment between English and Kreol Morisien

The above alignment is quite simple, as the order of the words is unchanged in the target language, which reduces the complexity of the translation process. Daniel Marcu and William Wong (2002) proposed that lexical correspondences can be formed both at the word level and at the phrase level. They estimated the probability that a phrase in the source language is the translation equivalent of a phrase in the target language. They also calculated the probability that a given phrase occurs at a given position in a sentence. Philipp Koehn, Franz Josef Och and Daniel Marcu (2003) further showed that phrase-based translation gives better results than systems based on word alignments only. Their experiments were conducted on several pairs of European languages. Moses is an open-source statistical machine translation toolkit that has enabled many researchers and practitioners to build statistical machine translation systems producing high-quality translations (Koehn et al. 2007). An initial attempt at SMT between English and Mauritian Creole was made by Aneerav Sukhoo, Pushpak Bhattacharyya and Mahen Soobron (2014).
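One common way of estimating phrase translation probabilities in phrase-based SMT is relative frequency: p(target | source) is the number of times a phrase pair was extracted divided by the number of times the source phrase was extracted. The sketch below assumes the phrase pairs have already been extracted from word-aligned sentences (the extraction step and the reordering model are omitted), and the pairs themselves are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def phrase_translation_table(pairs: List[PhrasePair]) -> Dict[PhrasePair, float]:
    """Estimate p(target | source) by the relative frequency of extracted phrase pairs."""
    pair_counts = Counter(pairs)
    source_counts = Counter(source for source, _ in pairs)
    return {
        (source, target): count / source_counts[source]
        for (source, target), count in pair_counts.items()
    }


# Hypothetical phrase pairs, as if extracted from word-aligned English-Kreol sentences.
extracted = [
    ("the red car", "loto rouz-la"),
    ("the red car", "loto rouz-la"),
    ("the red car", "loto rouz"),
    ("many animals", "boukou zanimo"),
]
table = phrase_translation_table(extracted)
print(table[("the red car", "loto rouz-la")])  # 0.6666666666666666
```

At decoding time, such probabilities are combined with a language model and a reordering model to score candidate translations, which is the kind of pipeline toolkits such as Moses implement.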

3.4 Neural Machine Translation

The latest technique, which shows even better results, makes use of neural networks. Improvements in hardware, such as larger RAM and hard disk capacities and faster processors, have been behind this breakthrough. In addition, the use of Graphics Processing Units (GPUs) has sped up the machine learning process: building translation models requires large volumes of parallel sentences, and training on Central Processing Units (CPUs) was found to be slow. With GPUs, neural networks and deep learning have become a promising area for machine translation. Deep learning architectures, which stack many layers of artificial neurons as in multilayer perceptrons, have become popular for the translation of texts. The first layer is called the input layer, the last layer is known as the output layer, and the layers in between are hidden layers. In general, the deeper the neural network, the more sophisticated the patterns it can learn (Alom et al. 2019): the network learns increasingly complex features at each additional layer and finally delivers the translated text in the target language. Such networks require huge amounts of data; for neural machine translation, a very large number of parallel sentences is required. Deep learning architectures have replaced SMT-based systems for machine translation, as the results obtained from them are much better and more robust (Forcada 2017).
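The layered structure described above (input layer, hidden layers, output layer) can be sketched as a forward pass through a small multilayer perceptron in NumPy. This is an untrained network with random weights and hypothetical layer sizes; it only illustrates how data flows through stacked layers and is not a translation model.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def softmax(x: np.ndarray) -> np.ndarray:
    shifted = x - x.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def init_layers(sizes):
    """Random weights and zero biases for consecutive layer sizes [input, hidden..., output]."""
    return [
        (rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward(x: np.ndarray, layers) -> np.ndarray:
    """Pass x through the hidden layers (ReLU), then the output layer (softmax)."""
    *hidden, (w_out, b_out) = layers
    for w, b in hidden:
        x = relu(x @ w + b)
    return softmax(x @ w_out + b_out)


# Hypothetical sizes: 16-dimensional input, three hidden layers, scores over a 50-word vocabulary.
layers = init_layers([16, 32, 32, 32, 50])
probs = forward(rng.normal(size=(2, 16)), layers)
print(probs.shape)         # (2, 50)
print(probs.sum(axis=-1))  # each row is a probability distribution over the vocabulary
```

Adding entries to the list of layer sizes deepens the network; training (adjusting the weights on parallel data) is what turns such a stack into a useful model.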

Our core translation system is fully based on the Tensor2Tensor (T2T) library and the Transformer model (Vaswani et al. 2017; Vaswani et al. 2018). The T2T library contains a number of datasets for different language pairs such as English-German, English-French and English-Vietnamese. There are also pre-built models for six language pairs. All the translations in this work are performed using the Transformer model, which uses stacked self-attention layers (Vaswani et al. 2017). Attention is currently one of the most important ideas in machine translation. It is mainly used in sequence-to-sequence models, which consist of an encoder and a decoder. The encoder converts the input sentence into a sequence of vectors, and the decoder uses these vectors to predict the output sentence. In earlier sequence-to-sequence models, the encoder and decoder were typically built from LSTM (Long Short-Term Memory) units, a type of recurrent neural network (RNN); the Transformer replaces recurrence entirely with self-attention. The attention mechanism allows encoders and decoders to handle longer sentences, as the decoder can focus on the most relevant encoder vectors at each step instead of relying on a single fixed-length summary of the input. A sample translation system based on the Tensor2Tensor library and the Transformer model is available on Google Colab via Github (2020).
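The core operation of the Transformer is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V (Vaswani et al. 2017). The NumPy sketch below shows a single attention head without masking, learned projections or multiple heads, all of which the full model (and T2T's implementation) adds; the matrix sizes are hypothetical.

```python
import numpy as np


def scaled_dot_product_attention(queries, keys, values):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    queries: (n_queries, d_k), keys: (n_keys, d_k), values: (n_keys, d_v).
    Returns the attended values (n_queries, d_v) and the weights (n_queries, n_keys).
    """
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ values, weights


rng = np.random.default_rng(seed=0)
# Hypothetical example: 3 decoder positions attending over 5 encoder positions.
q = rng.normal(size=(3, 8))
k = rng.normal(size=(5, 8))
v = rng.normal(size=(5, 8))
output, weights = scaled_dot_product_attention(q, k, v)
print(output.shape, weights.shape)  # (3, 8) (3, 5)
```

Each row of weights shows how strongly one output position draws on each input position, which is how the model focuses on the relevant words of the source sentence.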

4. Methodology

Kreol Morisien is a relatively new language compared to languages such as English, French, German and Spanish. The formalisation of the Kreol Morisien language started only one decade ago. This culminated in the publication of the Lortograf Kreol Morisien (Orthography of Kreol Morisien) and the Gramer Kreol Morisien (Grammar of Kreol Morisien) in 2011 by the Ministry of Education and Human Resources and the Akademi Kreol Morisien. Literature in standard Kreol Morisien is still very scarce, as the language was only recently formalised and the number of people who have formally studied it is only in the thousands.

Two full-time staff were therefore recruited to create the dataset for this project, and several members of the research team trained them for this task. The dataset consists of parallel sentences in English and Kreol Morisien. All the original sentences were in English, as it is difficult to find good sentences in standard Kreol Morisien. Over a period of one year, the two staff members manually translated 25,810 sentences from English into Kreol Morisien and reviewed each other's work. The sentences were also reviewed by other members of the research team and by several educators who teach Kreol Morisien in primary and secondary schools.

Of these 25,810 sentence pairs, the first 23,810 were used for training (building the English to Kreol Morisien translation model), and the next 1,000 were used for validating that model. The source and target sides of these 23,810 sentence pairs were then swapped to train the Kreol Morisien to English model, and the same 1,000 sentence pairs were again used for validating it. The last 1,000 sentence pairs were then used to test the trained models. This test set was created in the same manner as described earlier, but it was never used in the training phase; it was kept separate so that a second, unbiased level of testing could be performed. The BLEU (BiLingual Evaluation Understudy) score was used as a metric to evaluate the quality of the translated texts (Papineni et al. 2002). The BLEU score ranges from 0 to 100; the higher the score, the better the translation is likely to be. The two models (English to Kreol Morisien and Kreol Morisien to English) were then served via a web server and an Android app.

Two workshops were held with educators of the Kreol language to obtain their feedback and for pilot testing. The first was conducted on the island of Mauritius at the beginning of the project, in November 2018, to gather requirements from primary and secondary school teachers. It was attended by more than 100 Kreol Morisien educators. One of its main aims was to draw up a list of textual Kreol Morisien resources that could be used in this work. Since very few works currently exist in this language, creating a dataset of parallel sentences was a huge problem. The educators directed us to relevant resources based on standard Kreol Morisien. Many educators also expressed their willingness to support us in this work, either by helping to create the dataset or by providing constant feedback on our work, especially regarding translation quality. The second workshop was held in February 2019 on the island of Rodrigues, again to gather further requirements from primary school teachers and other relevant stakeholders. Its aims were similar to those of the first. However, in this second workshop, we found that the Kreol used on Rodrigues is slightly different from the one used on the island of Mauritius. Both Rodrigues and Mauritius are islands that form part of the Republic of Mauritius. In October 2019, two months before the end of the project, the completed website and app were shared with all the educators for pilot testing. The views and comments received were taken into consideration to further refine our work. An awareness programme about the website and the app was also conducted in Rodrigues in November 2019.
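The sequential split of the 25,810 sentence pairs into training, validation and test sets, and the swap of source and target sides for the reverse model, can be sketched as follows. The corpus here is a list of dummy placeholder pairs, and split_corpus and swap are hypothetical helper names; the authors' actual data pipeline is not described beyond the split sizes.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (English sentence, Kreol Morisien sentence)


def split_corpus(pairs: List[Pair], n_train: int = 23_810, n_dev: int = 1_000, n_test: int = 1_000):
    """Sequential split: first n_train pairs for training, next n_dev for validation, last n_test for testing."""
    if len(pairs) != n_train + n_dev + n_test:
        raise ValueError("corpus size does not match the requested split")
    train = pairs[:n_train]
    dev = pairs[n_train:n_train + n_dev]
    test = pairs[n_train + n_dev:]
    return train, dev, test


def swap(pairs: List[Pair]) -> List[Pair]:
    """Reverse the translation direction: (English, Kreol) -> (Kreol, English)."""
    return [(target, source) for source, target in pairs]


# Placeholder corpus of 25,810 dummy pairs standing in for the real dataset.
corpus = [(f"en {i}", f"kreol {i}") for i in range(25_810)]
en_train, en_dev, en_test = split_corpus(corpus)
kreol_train, kreol_dev, kreol_test = swap(en_train), swap(en_dev), swap(en_test)
print(len(en_train), len(en_dev), len(en_test))  # 23810 1000 1000
```

Because both directions reuse exactly the same three subsets, neither model ever sees the test sentences during training.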

| Statistics | English | Kreol Morisien |
|---|---|---|
| Number of sentences | 24,810 | 24,810 |
| Total number of words | 183,163 | 176,114 |
| Number of unique words | 13,644 | 13,456 |
| Length of the shortest sentence (words) | 1 | 1 |
| Length of the longest sentence (words) | 26 | 29 |
| Average number of words per sentence | 7.4 | 7.1 |

Table 1. Comparison of the English and Kreol Morisien datasets used in training and validation

Table 1 compares the English and Kreol Morisien datasets used in training and validation. The average number of words in an English sentence is slightly higher than in a Kreol Morisien sentence. This suggests that Kreol Morisien is slightly more compact than English, i.e., slightly fewer words are needed in Kreol Morisien than in English to express the same content. The second edition of the Diksioner Morisien contains 17,000 unique words (Carpooran 2011), and 2,400 new words have been added in the third edition (Carpooran 2019). Since there are only 13,456 unique Kreol Morisien words in the dataset, our system does not yet cover all Kreol Morisien words. Similarly, the English language contains more than 100,000 words, but only 13,644 of them appear in our dataset. Dataset creation is an ongoing process, and we intend to double our dataset in future work.
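The statistics in Table 1 can be computed with a few lines of code; for instance, the averages follow from the totals (183,163 / 24,810 ≈ 7.4 and 176,114 / 24,810 ≈ 7.1). The sketch below assumes plain whitespace tokenisation with punctuation left attached and case-sensitive counting of unique words; the tokenisation actually used for Table 1 is not stated, so exact counts may differ. The sample sentences come from the Appendix.

```python
from typing import List


def corpus_statistics(sentences: List[str]) -> dict:
    """Table 1-style statistics, assuming simple whitespace tokenisation."""
    lengths = [len(s.split()) for s in sentences]
    vocabulary = {word for s in sentences for word in s.split()}
    return {
        "sentences": len(sentences),
        "total_words": sum(lengths),
        "unique_words": len(vocabulary),
        "shortest": min(lengths),
        "longest": max(lengths),
        "average_length": round(sum(lengths) / len(sentences), 1),
    }


sample = ["Komie sa?", "Tom pa le pran ankor travay.", "Anou aret zwe tenis."]
print(corpus_statistics(sample))
```

Lower-casing words or splitting off punctuation before counting would reduce the number of unique words, so the choice of tokenisation should be reported alongside such figures.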

All our experiments were performed on a desktop computer with an Intel Core i7-6700 @ 3.40 GHz processor, 16 GB of RAM (Random Access Memory), a 120 GB SSD (Solid State Drive) and a 1 TB hard drive, running Microsoft Windows 10 Pro (64-bit). The software was implemented in Python on the Anaconda platform. The machine translation models were trained using the Tensor2Tensor library and the Transformer model (Vaswani et al. 2017; Vaswani et al. 2018). This library is built on top of TensorFlow, which was developed by Google.

5. Implementation and Evaluation of Results

As part of this translation work, a website has been implemented to translate text from Kreol Morisien into English and vice-versa, as shown in Figure 2. The portal is accessible via the translatekreol.mu domain. The default choice (highlighted in green) is from Kreol Morisien (source language) to English (target language). Four options are available under the source language: Translate, Clear all texts, Check Spelling and Send suggestion.

Fig. 2. Main interface of the online translation system

The Translate button translates text from Kreol Morisien into English if the source language is set to Kreol. While the text is being translated, the message ‘Tradiksion pe fer, enn ti moman ankor’ (‘Translating, just a moment longer’) appears, asking the user to wait for the results. Both single words and sentences can be translated. It takes about 10 seconds on average to process a query. The processing time is quite high because we are using a shared server; on a dedicated web server, it would be lower. When the translation is completed, the result appears in the textbox on the right. From there, the Copy Translation button can be used to copy the translated text to another location, for example to Google Translate if the user wishes to translate the English text into another language. The Clear all texts button clears the text in both textboxes. Using it is optional, as the text can also be edited directly in either textbox.

Fig. 3. Autocorrect feature

As shown in Figure 3, the system also provides an autocorrect feature for Kreol Morisien text. As soon as a user starts entering text in Kreol Morisien, a spell-check runs in the background to check whether each word is valid. If all the words are valid, no message appears. However, as soon as the system detects words that are not found in the dictionary, it suggests a correction, as shown in Figure 3. In this example, the user has entered the text ‘Mo lotoo pa pe rooule’. The words ‘lotoo’ and ‘rooulee’ are not valid Kreol Morisien words. Thus, the message ‘Ou pe rod dir’ appears at the bottom of the screen, together with a proposed corrected version of the input text. ‘Ou pe rod dir’ literally means ‘Are you trying to say’. Clicking on the suggested text (in blue) automatically replaces the input text with it.
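The paper does not describe how the autocorrect suggestion is computed. One common approach, sketched here purely as an assumption, replaces each word that is not in the dictionary with the dictionary word at the smallest Levenshtein (edit) distance. The mini word list and the function names are hypothetical; a real checker would load the full Kreol Morisien dictionary.

```python
# Hypothetical mini word list; a real checker would use the full Kreol Morisien dictionary.
LEXICON = {"mo", "loto", "pa", "pe", "roule", "lakaz", "zanimo"}


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def closest_word(word: str, max_distance: int = 2):
    """Nearest dictionary word by edit distance, or None if nothing is close enough."""
    best = min(LEXICON, key=lambda candidate: (levenshtein(word, candidate), candidate))
    return best if levenshtein(word, best) <= max_distance else None


def autocorrect(sentence: str) -> str:
    """Replace unknown words with their nearest dictionary word; keep known words as typed."""
    corrected = []
    for token in sentence.split():
        if token.lower() in LEXICON:
            corrected.append(token)
        else:
            suggestion = closest_word(token.lower())
            corrected.append(suggestion if suggestion is not None else token)
    return " ".join(corrected)


print(autocorrect("Mo lotoo pa pe rooule"))  # Mo loto pa pe roule
```

The max_distance threshold keeps the sketch from "correcting" genuinely new words (such as names) into unrelated dictionary entries.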

Fig. 4. Spelling checker

Clicking the Check spelling button highlights misspelt words in yellow, as shown in Figure 4. To obtain valid suggestions for these words, the user must right-click on them. For example, for the incorrect word ‘rooulee’, the system provides seven suggestions. If the correct word is in this list, it can be selected with a click, and the incorrect word in the sentence is then replaced by the correct one. Although the spelling checker is very reliable, it is possible that none of the proposed words is the correct one. If users are not satisfied with the translated text, they can use the Send suggestion feature to edit the text and send it to the research team. A confirmation message is shown on the screen when the suggestion has been successfully submitted. This feedback will help us understand the weak points of the system for subsequent improvements.
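A ranked list of spelling suggestions, as in the right-click menu described above, can be sketched with Python's standard difflib module, which ranks candidates by string similarity. This is an assumed technique, not necessarily the one behind the system; the word list is hypothetical, and the limit of seven suggestions merely mirrors the example in the text.

```python
import difflib
from typing import List

# Hypothetical mini word list; a real checker would load the full dictionary.
LEXICON = ["mo", "loto", "pa", "pe", "roule", "rouz", "lakaz", "zanimo"]


def misspelled_words(sentence: str) -> List[str]:
    """Return the tokens that are not in the word list (these would be highlighted)."""
    return [token for token in sentence.split() if token.lower() not in LEXICON]


def suggestions(word: str, max_suggestions: int = 7) -> List[str]:
    """Dictionary words ranked by string similarity to the misspelled word, best first."""
    return difflib.get_close_matches(word.lower(), LEXICON, n=max_suggestions, cutoff=0.6)


for word in misspelled_words("Mo lotoo pa pe rooulee"):
    print(word, "->", suggestions(word))
```

The cutoff parameter (a similarity ratio between 0 and 1) controls how dissimilar a candidate may be and still be suggested; lowering it yields longer but noisier lists.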

Fig. 5. Kreol Morisien to English translation

Fig. 6. English to Kreol Morisien translation

An Android mobile app has also been implemented in this research work. Like the website, the app translates from Kreol Morisien into English and vice-versa, using the same translation model as the online platform. However, the app has been intentionally kept very simple, both for ease of use and because of the limited screen space available on smartphones. Only the Translate button is available in the app, as shown in Figure 5 and Figure 6. The default choice is translation from Mauritian Creole into English; to translate from English into Mauritian Creole, the user simply toggles the switch to the right.

Fig. 7. BLEU score for English to Kreol Morisien translation during training

Fig. 8. BLEU score for Kreol Morisien to English translation during training

The quality of the translation was evaluated using the BLEU metric. As mentioned earlier, a test set of 1,000 unseen sentences was used to evaluate the two models. A BLEU score of 26.34 was obtained for the English to Kreol Morisien translation model, while a score of 30.30 was obtained for the Kreol Morisien to English model. Both models were trained for 100,000 steps, and the BLEU score was recorded every 10,000 steps. The highest BLEU score recorded during training was 22.71 for English to Kreol Morisien, as shown in Figure 7, and 26.88 for Kreol Morisien to English, as shown in Figure 8. The test-set scores are thus 3.63 points higher than the best validation score for the English to Kreol Morisien model and 3.42 points higher for the Kreol Morisien to English model. This difference arises because the internal BLEU scores used for validating the models are not calculated in exactly the same way as the standard score (Github 2020): during the training phase, a simpler version of the BLEU score is used so that it can be calculated quickly, whereas the standard BLEU formula is applied in the evaluation phase. Sample translations from both models are available in the Appendix.
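For reference, the standard corpus-level BLEU of Papineni et al. (2002) combines clipped n-gram precisions for n = 1 to 4 through a geometric mean and multiplies the result by a brevity penalty. The following is a minimal sketch assuming whitespace tokenisation, one reference per sentence and no smoothing. It is not claimed to reproduce T2T's internal approximate BLEU or the exact script the authors used; published results are usually computed with established tools such as sacreBLEU.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU (0-100) with one reference per sentence and no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # n-grams in the hypotheses, per order n
    hyp_length = ref_length = 0
    for hypothesis, reference in zip(hypotheses, references):
        hyp, ref = hypothesis.split(), reference.split()
        hyp_length += len(hyp)
        ref_length += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match at all, so the geometric mean is zero
    # Geometric mean of the modified precisions, with uniform weights 1/max_n.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references.
    brevity_penalty = 1.0 if hyp_length > ref_length else math.exp(1 - ref_length / hyp_length)
    return 100 * brevity_penalty * math.exp(log_precision)


# Hypothetical system output and reference, for illustration only.
hyps = ["Tom doesn't want to take any more work ."]
refs = ["Tom doesn't want to take on any more work ."]
print(round(corpus_bleu(hyps, refs), 2))
```

Because a single missing 4-gram match drives an unsmoothed score to zero, BLEU is normally computed over a whole test set (here, 1,000 sentences) rather than averaged over individual sentences.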

6. Conclusions

With each passing year, Kreol Morisien is gaining more and more momentum. After its introduction in primary schools in 2012, it was introduced in secondary schools in 2018, and the Mauritian government is now planning to allow the use of Kreol Morisien in the National Assembly once the necessary infrastructure is in place. The number of formal users of Kreol Morisien is therefore growing steadily. Since written Kreol Morisien is a very recent phenomenon, most Mauritians do not know how to write it properly, and the need for an anytime-anywhere platform to learn the language is deeply felt. In this research, we have therefore implemented an online platform (translatekreol.mu) for translation from Kreol Morisien into English and vice-versa. The system can translate single words as well as sentences. An Android app, named Morisia, is also available on the Google Play Store. The quality of the translation, as measured by the BLEU score, is comparable in both directions (30.30 from Kreol Morisien into English and 26.34 from English into Kreol Morisien). To our knowledge, translatekreol.mu is the first online platform that translates sentences from Kreol Morisien into English and from English into Kreol Morisien, and the same can be said for the Morisia app. In future work, we intend to double the dataset from 25,810 parallel sentences to 50,000 to train the system.

7. Acknowledgements

This paper is based on work supported by the Tertiary Education Commission (TEC) under award number INT-2018-10. However, any opinions, findings and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of TEC. We are also indebted to the numerous educators of the Kreol Morisien language who have contributed to the dataset.

Appendix

Sample Translations

a. Kreol Morisien to English

Source Text in Kreol Morisien

Translated Text in English

So move lasante inn anpes li vwayaze.

His bad health has prevented him from the travel.

Komie sa?

How much is this?

Tom pou de retour avan de-zer trant.

Tom will be in return before two thirty.

Tom inn kokin plin larzan depi ar Mary.

Tom has fooled money from Mary.

Tom pa le pran ankor travay.

Tom doesn't want to take any more work.

Mo bien kontan sa zip la.

I like this skirt.

To panse mo bizin dir Tom?

Do you think I must say Tom?

Ziz la inn anil desizion final la. 

The judge has cancel the final decision.

Mo papa pa pou les mwa sorti avek Bill.

My father won't let me go out with Bill.

To bizin evit fer bann erer koumsa.

You must avoid making such a mistake.

 

b. English to Kreol Morisien

Source Text in English

Translated Text in Kreol Morisien

He studied hard in order to pass the test.

Li finn etidie dirman pou pas so test.

He was as gentle a man as ever lived.

Li ti kouma enn misie ki zame viv.

Tom ran into the house.

Tom finn sove dan lakaz.

She made the same mistake again.

Li finn fer mem erer.

I understand it's going to get hot again.

Mo konpran sa pou gagn so.

I listened to the music of birds.

Mo ti ekout lamizik so bann zwazo.

She'll be up around by this afternoon.

Nou bizin fer pre pou sa lapremidi-la.

It is a wise father that knows his own child.

Se enn bon papa ki so prop zanfan.

She had to stand in the train.

Li finn bizin deboute dan trin.

Let's stop playing tennis.

Anou aret zwe tenis.

References

Alom, Md Zahangir, Tarek M. Taha, Chris Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin, Mahmudul Hasan, Brian C. Van Essen, Abdul A. S. Awwal and Vijayan K. Asari (2019) “A State-of-the-Art Survey on Deep Learning Theory and Architectures”, Electronics, Vol. 8(3). doi:10.3390/electronics8030292

Alansary, Sameh (2014) “Interlingua-based Machine Translation Systems: UNL versus Other Interlinguas”, Egyptian Journal of Language Engineering, Vol. 1(1): 42-54. doi:10.21608/EJLE.2014.59863

Baker, Philip and Vinesh Hookoomsing (1987) Morisyen – English – French: Dictionary of Mauritian Creole, France: Editions L'Harmattan.

Carpooran, Arnaud (2011) Diksioner Morisien (2nd ed), Mauritius: Les Editions Le Printemps.

Carpooran, Arnaud (2019) Diksioner Morisien (3rd ed), Mauritius: Les Editions Le Printemps.

Forcada, Mikel L. (2017) “Making sense of neural machine translation”, Translation Spaces, Vol. 6(2): 291-309. doi:10.1075/ts.6.2.06for

Github (2020) “Welcome to the Tensor2Tensor Colab”, URL: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/Transformer_translate.ipynb (accessed 29 November 2019).

Hookoomsingh, Vinesh (2004) “A harmonized writing system for Mauritian Creole Language”, Ministry of Education and Scientific Research, URL: http://ministry-education.govmu.org/English/Documents/Publications/arch%20reports/hookoomsing.pdf

Kirkedal, A. Soeborg (2012) “Tree-based Hybrid Machine Translation”, Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra), pp. 77–86, Avignon, France.

Koehn, Philipp, Franz Josef Och and Daniel Marcu (2003) “Statistical Phrase-Based Translation”, Proceedings of HLT-NAACL, pp. 48-54, Edmonton, Canada.

Koehn, Philipp, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin and Evan Herbst (2007) “Moses: Open Source Toolkit for Statistical Machine Translation”, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 177-180, Prague, Czech Republic.

Lalit (2020) URL: https://www.lalitmauritius.org/ (accessed 25 July 2020).

Lampert, Andrew (2004) “Interlingua in Machine Translation”, URL: https://www.scribd.com/document/73292862/InterlinguaInMachineTranslation (accessed 25 July 2020).

Ledikasyon pu Travayer (1984) Diksyoner Kreol - Angle, Mauritius: Ledikasyon pu Travayer.

Lee, Sangmin-Michelle (2020) “The impact of using machine translation on EFL students’ writing”, Computer Assisted Language Learning, Vol. 33(3): 157-175. doi:10.1080/09588221.2018.1553186

Lopez, Adam (2008) “Statistical Machine Translation”, ACM Computing Surveys, Vol. 40(3): 1-49.

Marcu, Daniel and William Wong (2002) “A Phrase-based, Joint Probability Model for Statistical Machine Translation”, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 133-139, Philadelphia, USA.

Maucec, Mirjam Sepesy and Gregor Donaj (2019) “Machine Translation and the Evaluation of its Quality”, in Recent Trends in Computational Intelligence. doi:10.5772/intechopen.89063

MES (2020) “Mauritius Examination Syndicate – PSAC Assessment Grade 6 – 2019”, URL: http://mes.intnet.mu/English/Documents/statistics/psac_stats/2019/2019_psac_peformance_subjectwise_first_sitting.pdf (accessed 25 July 2020).

Moussallem, Diego, Matthias Wauer and Axel-Cyrille Ngonga Ngomo (2018) “Machine Translation using Semantic Web Technologies: A Survey”, Journal of Web Semantics. doi:10.2139/ssrn.3248493

Odacioglu, Mehmet Cem and Saban Kokturk (2015) “The Effects of Technology on Translation Students in Academic Translation Teaching”, Procedia - Social and Behavioral Sciences, Vol. 197: 1085-1094. doi:10.1016/j.sbspro.2015.07.349

Papineni, Kishore, Salim Roukos, Todd Ward and Wei-Jing Zhu (2002) “BLEU: a Method for Automatic Evaluation of Machine Translation”, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 311-318, Philadelphia, USA.

Police-Michel, Daniella, Arnaud Carpooran and Guilhem Florigny (2011) Gramer Kreol Morisien, Mauritius: Ministry of Education and Human Resources.

Pudaruth, Sameerchand, Lallesh Sookun and Arvind Kumar Ruchpaul (2013) “English to Creole and Creole to English Rule Based Machine Translation System”, International Journal of Advanced Computer Science and Applications, Vol. 4(8): 25-29. doi:10.14569/IJACSA.2013.040805

Schwenk, Holger, Jean-Baptiste Fouet and Jean Senellart (2008) “First Steps towards a general-purpose French/English Statistical Machine Translation System”, Proceedings of the Third Workshop on Statistical Machine Translation, pp. 119-122, Columbus, Ohio, USA.

Statistics Mauritius (2011) “2011 Population Census – Main Results”, Government of Mauritius, URL: http://statsmauritius.govmu.org/English/CensusandSurveys/Documents/ESI/pop2011.pdf (accessed 25 July 2020).

Statistics Mauritius (2019) “International Travel & Tourism - Year 2019”, Government of Mauritius, URL: http://statsmauritius.govmu.org/English/Publications/Pages/Tourism_Yr19.aspx (accessed 25 July 2020).

Sukhoo, Aneerav, Pushpak Bhattacharyya and Mahen Soobron (2014) “Translation between English and Mauritian Creole: A statistical machine translation approach”, Proceedings of the IST-Africa Conference, Mauritius. doi:10.1109/istafrica.2014.6880635

Supnithi, Thepchai, Virach Sornlertlamvanich and Thatsanee Charoenporn (2002) “A Cross System Machine Translation”, Proceedings of the 2002 COLING Workshop on Machine Translation in Asia, pp. 81-87, Taipei, Taiwan. doi:10.3115/1118794.1118806

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin (2017) “Attention is all you need”, Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6000-6010, Long Beach, CA, USA.

Vaswani, Ashish, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Lukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer and Jakob Uszkoreit (2018) “Tensor2Tensor for Neural Machine Translation”, arXiv:1803.07416 [cs.LG].

Virahsawmy, Dev (2020) URL: https://boukiebanane.com/ (accessed 26 July 2020).

About the author(s)

Sameerchand Pudaruth is a Senior Lecturer and Head of the ICT Department at the University of Mauritius. He holds a PhD in Artificial Intelligence from the University of Mauritius. He is a senior member of IEEE, a founding member of the IEEE Mauritius Subsection and the current Vice-Chair of the IEEE Mauritius Section. He is also a member of the Association for Computing Machinery (ACM). His research interests are Artificial Intelligence, Machine Learning, Data Science, Machine Translation, Computer Vision, Robotics, Blockchain and Information Technology Law. He has written more than 50 papers for national and international journals and conferences. He has also written a book entitled 'Python in One Week'.

Somveer Kishnah is a lecturer in the Department of Software and Information Systems (SIS), Faculty of Information, Communication and Digital Technologies at the University of Mauritius. He joined the University of Mauritius in September 2010 and holds a Bachelor’s degree in Information Systems and a Master’s degree in Computer Science and Engineering. His research currently revolves around the people factor in both the development and usage of software, and combines Artificial Intelligence and Emotional Intelligence with a view to promoting better user experiences. In the context of a future smart Mauritius, his work focuses on intelligent systems equipped with emotions that can help bridge the communication gap between the hearing-impaired and hearing populations.

Aneerav Sukhoo is the Deputy Director of the Central Information Systems Division of the Ministry of Information Technology, Communication and Innovation. Over the last 30 years, he has held responsibilities as Systems Analyst, Project Manager, Technical Manager, Deputy Director and Director of institutions spearheading the computerisation programme in Government. He holds a PhD in Computer Science from UNISA and conducted postdoctoral research at the Indian Institute of Technology, Bombay. He was Professor and Dean of IT at the Amity Institute of Higher Education on a full-time basis in 2019 and 2020. He has also lectured at various universities and supervised several doctoral students.

Sheeba Armoogum is a Senior Lecturer at the University of Mauritius and a former Head of the ICT Department. She holds a BSc in Physics, Mathematics and Electronics from Bangalore University, India, and an MSc in Computer Applications from Madurai Kamaraj University, India. She has more than 14 years of experience in teaching and learning at the tertiary level and more than 20 publications. Her fields of research are networking and security, cyber forensics, AI and Machine Learning. Sheeba has a strong industrial background: before joining UoM, she worked for an American company in Bangalore as team leader and project manager. She was involved in several international conferences, including IEEE AFRICON 2013, IEEE EmergiTech 2016 and IEEE NextComp 2019.

Vandanah Gooria is a programme manager and lecturer in Marketing, Management and Special Needs Management at the Open University of Mauritius. She has 13 years of experience in administration and over 7 years of professional and academic experience encompassing market research and surveys, and the development and authoring of course materials. She has written one book chapter and published many research papers. She has a particular interest in serving vulnerable groups and has been involved in social activities for more than 4 years. Her areas of interest are mainly special educational needs, marketing, management, open distance learning and Open Educational Resources (OER).

Nirmal Kumar Betchoo is a tenured faculty member and former Dean at the Université des Mascareignes. He holds a DBA (Switzerland) and an MBA (Scotland), and is a graduate of the professional examinations of the Chartered Institute of Marketing and the Institute of Administrative Management (UK). He is the author of 13 books published nationally and internationally and has published over 60 articles in international peer-reviewed journals. He is an editor for the Journal of Mass Communications (USA) and the European Scientific Journal (ESJ). As a scholar, he reviews papers for many international journals and conferences. Dr Betchoo also writes extensively for the local press, where he has published some 150 articles since 2012, including lead papers.

Fadil Chady holds a bachelor’s degree in Applied Computing from the University of Mauritius. He worked as a Research Assistant on the project entitled “Automatic Identification of Medicinal Plants in Mauritius via a Mobile Application using Computer Vision and Artificial Intelligence Techniques” at the University of Mauritius in 2018 and 2019. The project was funded by the Tertiary Education Commission (TEC). He has acquired skills in computer vision, machine learning, artificial intelligence, deep learning, web programming, Linux server administration, web services, cloud services management and natural language processing. He currently works as a Systems Engineer in the ICT industry.

Ashminee Devi Ramoogra studied Computer Science at the University of Mauritius. She worked as a Trainee Research Assistant on the project entitled “Creole to English and English to Creole Machine Translation using Natural Language Processing Techniques and Deep Learning Neural Networks” at the University of Mauritius from 2018 to 2020. The project was funded by the Tertiary Education Commission (TEC). She has excellent knowledge of web technologies, MySQL and programming languages such as C++ and Java. As part of the project, she was also required to create a dataset of English sentences and their equivalents in Kreol Morisien, through which she acquired an in-depth knowledge of Kreol Morisien.

Hiteishee Hanoomanjee studied Computer Science at the University of Mauritius. She worked as a Trainee Research Assistant on the project entitled “Creole to English and English to Creole Machine Translation using Natural Language Processing Techniques and Deep Learning Neural Networks” at the University of Mauritius from 2018 to 2020. The project was funded by the Tertiary Education Commission (TEC). She has excellent knowledge of web technologies, MySQL and programming languages such as C++ and Java. As part of the project, she was also required to create a dataset of English sentences and their equivalents in Kreol Morisien, through which she acquired an in-depth knowledge of Kreol Morisien.

Mohammad Zafar Khodabocus holds a bachelor’s degree in Software Engineering from the University of Mauritius. He worked as a Research Assistant on the project entitled “Creole to English and English to Creole Machine Translation using Natural Language Processing Techniques and Deep Learning Neural Networks” at the University of Mauritius from 2018 to 2020. The project was funded by the Tertiary Education Commission (TEC). He has acquired skills in the following fields: Internet of Things (IoT), Machine Learning (ML), Artificial Intelligence (AI), Deep Learning (DL), Machine Translation (MT), Internet Technologies and Game development. He currently works as a Software Engineer in the ICT industry.


©inTRAlinea & Sameerchand Pudaruth(1), Aneerav Sukhoo(2), Somveer Kishnah(1), Sheeba Armoogum(1), Vandanah Gooria(3), Nirmal Kumar Betchoo(4), Fadil Chady(1), Ashminee Ramoogra(1), Hiteishee Hanoomanjee(1) and Zafar Khodabocus(1) (2021).
"Morisia: A Neural Machine Translation System to Translate between Kreol Morisien and English", inTRAlinea Vol. 23.
This article can be freely reproduced under Creative Commons License.
Stable URL: http://www.intralinea.org/archive/article/2531
