Call for Applications: Lexicon Fellowship in South Africa

Deadline: 28-Jun-2026

Code for Africa is offering a three-month, part-time Lexicon Fellowship for NLP and data science experts based in South Africa. The fellowship supports the development of structured, machine-readable hate speech and incitement lexicons to improve information integrity, human rights monitoring, and AI-driven analysis of harmful online narratives.

Selected fellows will receive a competitive monthly stipend and technical support from Code for Africa’s iLab and TechLab teams. Applicants should have strong skills in Python, natural language processing, machine learning, low-resource South African languages, social media data collection, and ethical data practices.

What is the Code for Africa Lexicon Fellowship?

The Code for Africa Lexicon Fellowship is a part-time opportunity for NLP specialists, machine learning practitioners, and data science experts.

The fellowship focuses on building structured hate speech and incitement lexicons that can support human rights monitoring and information integrity work.

The programme is designed to help identify and analyse harmful language used in online spaces, especially language linked to hate speech, online violence, dehumanisation, incitement, and coordinated influence campaigns.

Main Purpose of the Fellowship

The main purpose of the fellowship is to strengthen AI-supported monitoring of harmful online narratives in South Africa.

The fellowship supports the development of lexicons that are structured, machine-readable, and aligned with human rights standards.

Fellowship Duration

The fellowship runs for three months on a part-time basis.

Fellowship Benefits

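Selected fellows will receive a competitive monthly stipend, technical support, research support, mentorship, and access to collaborative networks.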

Geographic Focus

Applicants must be based in South Africa.

The fellowship focuses on South African civic, cultural, political, and linguistic contexts.

Who is Eligible?

The fellowship is open to NLP and data science experts based in South Africa.

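Eligible applicants should have at least two years of data science experience and strong skills in Python, NLP, and machine learning.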

Required Technical Skills

Applicants should demonstrate strong technical ability in data science and AI-related work.

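Required or strongly preferred skills include Python, natural language processing, machine learning, and experience with Hugging Face and transformer models.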

South African Language Experience

Experience with low-resource South African languages is required or strongly preferred.

Relevant languages include isiZulu, isiXhosa, Afrikaans, Sesotho, Setswana, and Sepedi.

This language experience is important because harmful narratives often use local language, slang, coded terms, slogans, and context-specific expressions.

Social Media Data Collection Experience

Applicants should be familiar with collecting and analysing public data from social media platforms.

Relevant platforms include X, Facebook, TikTok, YouTube, and Telegram.

This experience is important for identifying harmful language patterns, coordinated campaigns, and evolving online narratives.

Key Focus Areas

The fellowship focuses on hate speech, incitement, and harmful online narratives.

Key areas include hate speech, online violence, dehumanisation, incitement, and coordinated influence campaigns.

Key Concepts Explained

Hate Speech Lexicon

A hate speech lexicon is a structured list of words, phrases, slang, slogans, slurs, and coded language associated with hateful or harmful speech.

Incitement Lexicon

An incitement lexicon focuses on language that may encourage violence, discrimination, hostility, or harm against individuals or groups.

Weaponised Language

Weaponised language refers to words or phrases used deliberately to attack, dehumanise, intimidate, mobilise hostility, or spread harmful narratives.

Natural Language Processing

Natural language processing is a field of artificial intelligence that helps computers analyse, classify, and interpret human language.

Low-Resource Languages

Low-resource languages are languages with limited digital datasets, linguistic tools, or AI training resources. Many South African languages fall into this category.

Machine-Readable Lexicon

A machine-readable lexicon is structured so that computers and AI models can process it for analysis, classification, monitoring, and detection.
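As a rough illustration of what "machine-readable" can mean in practice, the sketch below stores lexicon entries as typed records and serialises them to JSON. The class name, field names, and placeholder values are assumptions made for this example; the call does not specify the schema fellows will use.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class LexiconEntry:
    """One hypothetical lexicon record; the field names are illustrative."""
    term: str                 # the word or phrase as it appears online
    language: str             # a language code, e.g. "zu" for isiZulu
    category: str             # e.g. "hate_speech" or "incitement"
    variants: list[str] = field(default_factory=list)  # spellings, slang forms
    context_note: str = ""    # when the term is harmful, and when it is not


def to_json(entries: list[LexiconEntry]) -> str:
    """Serialise entries to JSON so other tools can load them."""
    return json.dumps([asdict(e) for e in entries], ensure_ascii=False, indent=2)


def from_json(payload: str) -> list[LexiconEntry]:
    """Rebuild entries from the JSON produced by to_json."""
    return [LexiconEntry(**record) for record in json.loads(payload)]


# Placeholder data only; no real terms are included.
entries = [
    LexiconEntry(
        term="placeholder_term",
        language="zu",
        category="incitement",
        variants=["placeholder-term", "pl4ceholder_term"],
        context_note="Placeholder entry for illustration only.",
    )
]
payload = to_json(entries)
restored = from_json(payload)
```

Keeping a context note next to each term is one way to record that the same word can be harmful in one setting and harmless in another, which matters for the cultural and linguistic analysis the fellowship describes.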

Information Integrity

Information integrity refers to the reliability, accuracy, and trustworthiness of public information ecosystems.

Human Rights Standards

The fellowship aligns lexicon development with international human rights principles.

Relevant frameworks include:

This alignment helps ensure that monitoring tools do not overreach or unfairly restrict legitimate expression.

What Fellows Will Do

Fellows will develop structured lexicons that help detect and analyse harmful online language.

Key responsibilities may include:

Expected Outputs

The fellowship will produce resources that support NLP-based monitoring and analysis.

Expected outputs may include:

NLP Model Applications

The lexicons developed through the fellowship will support AI and NLP tools.

These tools may generate indicators such as:

These indicators can support human rights organisations, researchers, civic technology teams, and information integrity monitors.
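To show how a lexicon could feed an indicator, here is a minimal sketch that computes, for each category, the share of posts containing at least one lexicon term. It is a plain keyword-matching baseline written for this example, not Code for Africa's method; the lexicon contents, the function name, and the sample posts are all hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical lexicon: placeholder terms mapped to a category.
LEXICON = {
    "placeholder_slur": "hate_speech",
    "placeholder_threat": "incitement",
}

# One case-insensitive pattern that matches any term as a whole word.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in LEXICON) + r")\b",
    re.IGNORECASE,
)


def category_rates(posts: list[str]) -> dict[str, float]:
    """Return, per category, the share of posts with at least one match."""
    if not posts:
        return {}
    hits = Counter()
    for post in posts:
        # Count each category at most once per post.
        categories = {LEXICON[match.lower()] for match in PATTERN.findall(post)}
        hits.update(categories)
    return {cat: hits[cat] / len(posts) for cat in set(LEXICON.values())}


posts = [
    "Nothing harmful here.",
    "This contains placeholder_slur once.",
    "PLACEHOLDER_THREAT and placeholder_slur together.",
    "Another neutral post.",
]
rates = category_rates(posts)
```

With the four sample posts, two contain a hate speech placeholder and one contains an incitement placeholder, so the rates are 0.5 and 0.25. Exact matching of this kind misses coded language, misspellings, and context, which is why the lexicons are meant to support NLP models rather than replace them.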

Ethical Data Practices

Ethical data practices are central to the fellowship.

Fellows must handle data responsibly and avoid harmful misuse of sensitive language datasets.

Ethical considerations include:

How the Fellowship Works

Selected fellows will work part-time over three months.

They will collaborate with Code for Africa’s iLab and TechLab teams.

They will also work alongside human rights organisations to ensure that lexicon development reflects real monitoring needs, local context, and ethical standards.

The fellowship combines technical NLP work, cultural analysis, language research, and human rights-informed data design.

How to Apply

Applicants should prepare an application that demonstrates technical expertise, relevant language experience, and commitment to ethical AI.

Suggested Application Steps

  1. Confirm that you are based in South Africa.
  2. Highlight your Python, NLP, and machine learning experience.
  3. Demonstrate at least two years of data science experience.
  4. Provide examples of work with multilingual or low-resource language datasets.
  5. Mention experience with isiZulu, isiXhosa, Afrikaans, Sesotho, Setswana, or Sepedi.
  6. Describe your social media data collection experience.
  7. Highlight experience with Hugging Face and transformer models.
  8. Explain your understanding of South African cultural and political contexts.
  9. Provide a relevant project portfolio.
  10. Explain your approach to ethical data collection and AI model development.
  11. Submit the application according to Code for Africa’s fellowship requirements.

Selection Criteria

Applications will be assessed based on technical and contextual strength.

Key selection criteria include:

Why It Matters

Hate speech and incitement can spread quickly through online platforms, especially during elections, social tensions, and coordinated influence campaigns.

Structured lexicons help researchers and monitoring teams detect harmful patterns more effectively.

In South Africa, multilingual and culturally specific language makes this work especially important.

By supporting low-resource language NLP, the fellowship helps build more inclusive and locally relevant AI systems for human rights monitoring.

Tips for Strong Applications

A strong application should clearly show both technical ability and contextual understanding.

Applicants should focus on:

Applicants should show that they can work with sensitive language data carefully, accurately, and responsibly.

Common Mistakes to Avoid

Applicants should avoid submitting generic data science applications that do not address the fellowship’s specific focus.

Common mistakes include:

FAQ

What is the Code for Africa Lexicon Fellowship?

It is a three-month, part-time fellowship for NLP and data science experts to develop structured hate speech and incitement lexicons for human rights and information integrity monitoring.

Who can apply?

Applicants must be based in South Africa and have strong skills in Python, NLP, machine learning, and data science.

How long does the fellowship last?

The fellowship lasts three months.

What benefits do fellows receive?

Selected fellows receive a competitive monthly stipend, technical support, research support, mentorship, and access to collaborative networks.

What languages are relevant for this fellowship?

Relevant South African languages include isiZulu, isiXhosa, Afrikaans, Sesotho, Setswana, and Sepedi.

What platforms should applicants be familiar with?

Applicants should be familiar with public data collection from platforms such as X, Facebook, TikTok, YouTube, and Telegram.

What will the lexicons be used for?

The lexicons will support NLP tools that monitor harmful language, track incitement patterns, assess polarisation, and support human rights organisations in analysing online narratives.

Conclusion

The Code for Africa Lexicon Fellowship supports the development of structured, human rights-aligned lexicons for monitoring hate speech, incitement, and harmful online narratives in South Africa. The fellowship is especially important for improving NLP tools in low-resource South African languages and strengthening ethical AI-driven information integrity work.

Strong applicants will combine technical expertise in Python, NLP, and machine learning with cultural knowledge, language skills, ethical data practices, and experience working with social media data. The fellowship offers an opportunity to contribute to practical AI tools that support human rights monitoring and public-interest accountability.

For more information, visit Code for Africa.
