Call for Applications: Lexicon Fellowship in South Africa

fundsforngos

Code for Africa is offering a three-month, part-time Lexicon Fellowship for NLP and data science experts based in South Africa. The fellowship supports the development of structured, machine-readable hate speech and incitement lexicons to improve information integrity, human rights monitoring, and AI-driven analysis of harmful online narratives.

Selected fellows will receive a competitive monthly stipend and technical support from Code for Africa’s iLab and TechLab teams. Applicants should have strong skills in Python, natural language processing, and machine learning, along with experience in low-resource South African languages, social media data collection, and ethical data practices.

What is the Code for Africa Lexicon Fellowship?

The Code for Africa Lexicon Fellowship is a part-time opportunity for NLP specialists, machine learning practitioners, and data science experts.

The fellowship focuses on building structured hate speech and incitement lexicons that can support human rights monitoring and information integrity work.

The programme is designed to help identify and analyse harmful language used in online spaces, especially language linked to hate speech, online violence, dehumanisation, incitement, and coordinated influence campaigns.

Main Purpose of the Fellowship

The main purpose of the fellowship is to strengthen AI-supported monitoring of harmful online narratives in South Africa.

The fellowship supports the development of lexicons that are:

Structured
Version-controlled
Machine-readable
Multilingual
Aligned with human rights standards
Useful for NLP model development
Designed for ethical monitoring of harmful speech

Fellowship Duration

The fellowship runs for three months.

It is a part-time fellowship.

Fellowship Benefits

Selected fellows will receive:

A competitive monthly stipend
Technical support from Code for Africa’s iLab team
Research support from Code for Africa’s TechLab team
Access to collaborative networks
Opportunities to work with human rights organisations
Practical experience in NLP-based monitoring tools
Exposure to multilingual and low-resource language AI applications

Geographic Focus

Applicants must be based in South Africa.

The fellowship focuses on South African civic, cultural, political, and linguistic contexts.

Who is Eligible?

The fellowship is open to NLP and data science experts based in South Africa.

Eligible applicants should have:

Strong Python skills
Strong NLP experience
Machine learning experience
At least two years of data science experience
Experience with low-resource South African languages
Familiarity with social media data collection
Experience with transformer models
Understanding of ethical data practices
Knowledge of local cultural and political contexts

Required Technical Skills

Applicants should demonstrate strong technical ability in data science and AI-related work.

Required or strongly preferred skills include:

Python programming
Natural language processing
Machine learning
Social media data collection
Multilingual data analysis
Low-resource language processing
Hugging Face tools
Transformer models
Dataset preparation
Structured data management
Ethical AI practices

South African Language Experience

Experience with low-resource South African languages is required or strongly preferred.

Relevant languages include:

isiZulu
isiXhosa
Afrikaans
Sesotho
Setswana
Sepedi

This language experience is important because harmful narratives often use local language, slang, coded terms, slogans, and context-specific expressions.
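Coded terms are often spelled in deliberately altered ways (digits for letters, stretched letters, dots between letters) so that they slip past simple filters. A common first step is to normalise both the lexicon entries and the collected posts the same way before matching. The sketch below assumes a hypothetical substitution map; a real map would be curated per language and reviewed by speakers of that language.

```python
import re
import unicodedata

# Hypothetical character swaps used to disguise words (e.g. "0" for "o").
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"})
LETTER = r"[^\W\d_]"  # any Unicode letter (word character that is not a digit or "_")


def _deobfuscate(token: str) -> str:
    # Only rewrite tokens that also contain letters, so plain numbers such as
    # "2026" are left unchanged.
    return token.translate(SUBSTITUTIONS) if any(c.isalpha() for c in token) else token


def normalise(text: str) -> str:
    """Return a matching key: NFKC-normalised, case-folded, de-obfuscated text."""
    text = unicodedata.normalize("NFKC", text).casefold()
    # Splitting and re-joining also collapses runs of whitespace.
    text = " ".join(_deobfuscate(tok) for tok in text.split())
    # Collapse runs of three or more identical letters into one letter.
    text = re.sub(rf"({LETTER})\1{{2,}}", r"\1", text)
    # Remove separators placed between letters, e.g. "b.a.d" -> "bad".
    text = re.sub(rf"(?<={LETTER})[._*-](?={LETTER})", "", text)
    return text
```

For example, `normalise("B.A.D   W0RD")` gives `"bad word"`. The rules are deliberately crude: removing separators also joins genuinely hyphenated words, which is acceptable only if lexicon entries pass through the same function.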

Social Media Data Collection Experience

Applicants should be familiar with collecting and analysing public data from social media platforms.

Relevant platforms include:

X (formerly Twitter)
Facebook
TikTok
YouTube
Telegram

This experience is important for identifying harmful language patterns, coordinated campaigns, and evolving online narratives.
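One simple signal of a coordinated campaign is many distinct accounts posting the same text within a short period. The sketch below illustrates that idea only; the `Post` type, the thresholds (three accounts, 30 minutes), and the whitespace and case normalisation are all assumptions, and real coordination analysis combines many more signals than identical text.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Post:  # hypothetical record for one collected public post
    account: str
    text: str
    posted_at: datetime


def co_posting_clusters(posts, min_accounts=3, window=timedelta(minutes=30)):
    """Return {normalised text: sorted accounts} for texts that at least
    `min_accounts` distinct accounts posted within `window` of each other."""
    by_text = defaultdict(list)
    for post in posts:
        key = " ".join(post.text.casefold().split())  # ignore case and spacing
        by_text[key].append(post)

    flagged = {}
    for key, group in by_text.items():
        group.sort(key=lambda p: p.posted_at)
        involved, start = set(), 0
        for end in range(len(group)):
            # Slide the window start forward until it spans at most `window`.
            while group[end].posted_at - group[start].posted_at > window:
                start += 1
            accounts = {p.account for p in group[start:end + 1]}
            if len(accounts) >= min_accounts:
                involved |= accounts
        if involved:
            flagged[key] = sorted(involved)
    return flagged
```

The output is a list of candidates for human review, not a finding: identical posts can also come from people sharing a slogan organically.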

Key Focus Areas

The fellowship focuses on hate speech, incitement, and harmful online narratives.

Key areas include:

Municipal elections
Xenophobia
LGBTQ+ communities
Gender-based hate
Online violence
Dehumanisation
Incitement patterns
Coordinated campaigns
Disinformation vulnerability
Polarisation analysis
Information integrity
Human rights monitoring

Key Concepts Explained

Hate Speech Lexicon

A hate speech lexicon is a structured list of words, phrases, slang, slogans, slurs, and coded language associated with hateful or harmful speech.

Incitement Lexicon

An incitement lexicon focuses on language that may encourage violence, discrimination, hostility, or harm against individuals or groups.

Weaponised Language

Weaponised language refers to words or phrases used deliberately to attack, dehumanise, intimidate, mobilise hostility, or spread harmful narratives.

Natural Language Processing

Natural language processing is a field of artificial intelligence that helps computers analyse, classify, and interpret human language.

Low-Resource Languages

Low-resource languages are languages with limited digital datasets, linguistic tools, or AI training resources. Many South African languages fall into this category.

Machine-Readable Lexicon

A machine-readable lexicon is structured so that computers and AI models can process it for analysis, classification, monitoring, and detection.
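As an illustration of what "machine-readable" can mean in practice, the sketch below stores each entry as one JSON object per line (the JSON Lines format), which tools can load directly and version control can diff line by line. The field names and the 1 to 3 severity scale are hypothetical, not Code for Africa's schema. It needs Python 3.9 or later.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LexiconEntry:  # hypothetical schema for one lexicon entry
    term: str                 # surface form as observed
    language: str             # language code, e.g. "zu" for isiZulu
    category: str             # e.g. "xenophobia", "gender-based hate"
    severity: int             # 1 (low) to 3 (high), a hypothetical scale
    variants: list[str] = field(default_factory=list)  # spellings, coded forms
    context_note: str = ""    # when the term is, and is not, harmful
    sources: list[str] = field(default_factory=list)   # provenance references


def to_jsonl(entries):
    """Serialise entries as JSON Lines: one JSON object per line."""
    return "\n".join(
        json.dumps(asdict(e), ensure_ascii=False, sort_keys=True) for e in entries
    )


def from_jsonl(text):
    """Parse JSON Lines text back into LexiconEntry objects."""
    return [LexiconEntry(**json.loads(line)) for line in text.splitlines() if line.strip()]
```

Sorting keys keeps the output stable between saves, so a change to one entry shows up as a one-line diff.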

Information Integrity

Information integrity refers to the reliability, accuracy, and trustworthiness of public information ecosystems.

Human Rights Standards

The fellowship aligns lexicon development with international human rights principles.

Relevant frameworks include:

Rabat Plan of Action
International Covenant on Civil and Political Rights (ICCPR), notably Articles 19 and 20
Human rights-based standards on freedom of expression
Standards for assessing incitement, discrimination, and harm

This alignment helps ensure that monitoring tools do not overreach or unfairly restrict legitimate expression.
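The Rabat Plan of Action sets out a six-part threshold test for incitement: context, speaker, intent, content and form, extent of the speech act, and likelihood, including imminence. One way to keep that test visible in a dataset is to attach a reviewer record to each flagged item. The sketch below is a hypothetical structure: it only checks that every factor has been addressed in writing and deliberately leaves the judgment itself to a human reviewer.

```python
from dataclasses import dataclass, fields


@dataclass
class RabatAssessment:
    """Reviewer notes against the six factors of the Rabat threshold test.
    Each field holds a free-text justification written by a human reviewer."""
    context: str = ""
    speaker: str = ""
    intent: str = ""
    content_and_form: str = ""
    extent: str = ""
    likelihood_and_imminence: str = ""

    def missing_factors(self):
        """Names of factors the reviewer has not yet addressed."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_complete(self):
        return not self.missing_factors()
```

A completed record shows that each factor was considered; it does not by itself mean the threshold for incitement was met.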

What Fellows Will Do

Fellows will develop structured lexicons that help detect and analyse harmful online language.

Key responsibilities may include:

Mapping harmful words, slang, slogans, and slurs
Analysing language linked to hate speech and incitement
Identifying coded or context-specific harmful terms
Structuring lexicons for NLP use
Version-controlling lexicon datasets
Supporting multilingual language resources
Reviewing social media content patterns
Collaborating with human rights organisations
Supporting indicators used in AI monitoring systems
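Structuring lexicons for NLP use usually includes automatic checks that every entry is complete before it is merged. The sketch below assumes hypothetical field names and a 1 to 3 severity scale; the language codes are the ISO 639 codes for the six languages named in this call (Sepedi has no two-letter ISO 639-1 code, so its three-letter code `nso` is used). It needs Python 3.9 or later.

```python
# ISO 639 codes for the languages listed in this call.
LANGUAGES = {
    "zu": "isiZulu", "xh": "isiXhosa", "af": "Afrikaans",
    "st": "Sesotho", "tn": "Setswana", "nso": "Sepedi",
}
REQUIRED = ("term", "language", "category")  # hypothetical required fields


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one lexicon entry (empty if valid)."""
    problems = []
    for key in REQUIRED:
        if not str(entry.get(key, "")).strip():
            problems.append(f"missing {key}")
    lang = entry.get("language")
    if lang and lang not in LANGUAGES:
        problems.append(f"unknown language code {lang!r}")
    severity = entry.get("severity")
    if severity is not None and severity not in (1, 2, 3):
        problems.append(f"severity must be 1, 2 or 3, got {severity!r}")
    return problems
```

Running this over every entry in a continuous-integration job is one way to stop incomplete entries from reaching a published version.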

Expected Outputs

The fellowship will produce resources that support NLP-based monitoring and analysis.

Expected outputs may include:

Structured hate speech lexicons
Incitement language lexicons
Machine-readable language datasets
Multilingual entries for South African languages
Metadata for terms and usage contexts
Version-controlled lexicon files
Documentation for ethical use
Inputs for NLP model development
Indicators for polarisation and disinformation risk
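Version-controlled lexicon files benefit from a release-level summary of what changed, so that downstream NLP tools know whether an update can break them. The sketch below compares two releases keyed by a hypothetical entry ID and applies a semantic-versioning-style rule; treating removals as breaking changes is an assumption, not a stated Code for Africa policy. It needs Python 3.9 or later.

```python
def diff_versions(old: dict[str, dict], new: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two lexicon releases keyed by entry ID."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def next_version(current: str, diff: dict[str, list[str]]) -> str:
    """Bump a MAJOR.MINOR.PATCH string: removals bump MAJOR, additions bump
    MINOR, edits to existing entries bump PATCH."""
    major, minor, patch = map(int, current.split("."))
    if diff["removed"]:
        return f"{major + 1}.0.0"
    if diff["added"]:
        return f"{major}.{minor + 1}.0"
    if diff["changed"]:
        return f"{major}.{minor}.{patch + 1}"
    return current
```

The line-by-line history still lives in version control; this summary only decides the release number and can be written into a changelog.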

NLP Model Applications

The lexicons developed through the fellowship will support AI and NLP tools.

These tools may generate indicators such as:

Polarisation scores
Disinformation vulnerability indices
Harmful narrative indicators
Incitement risk signals
Online violence monitoring insights
Coordinated campaign markers

These indicators can support human rights organisations, researchers, civic technology teams, and information integrity monitors.
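The simplest lexicon-based signal is the share of posts in a sample that contain at least one term from each category. The sketch below is a hypothetical raw signal, not a validated polarisation score or vulnerability index; it counts quotations and counter-speech the same way as hateful use, so it is only a starting point for analysis.

```python
import re
from collections import Counter


def category_hit_rates(posts, lexicon):
    """Share of posts containing at least one lexicon term, per category.

    `lexicon` maps a term to its category. Matching is case-insensitive and on
    whole words only, so a term inside a longer word is not counted.
    """
    patterns = {
        term: re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", re.IGNORECASE)
        for term in lexicon
    }
    counts = Counter()
    for post in posts:
        # Each category is counted at most once per post.
        counts.update({lexicon[t] for t, pat in patterns.items() if pat.search(post)})
    if not posts:
        return {}
    return {cat: counts[cat] / len(posts) for cat in sorted(set(lexicon.values()))}
```

Tracked over time, for example week by week during an election period, a rising rate can flag where human analysts should look first.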

Ethical Data Practices

Ethical data practices are central to the fellowship.

Fellows must handle data responsibly and avoid harmful misuse of sensitive language datasets.

Ethical considerations include:

Respecting privacy
Avoiding unnecessary exposure of individuals
Protecting vulnerable groups
Applying human rights standards
Avoiding biased or overbroad classification
Documenting context clearly
Ensuring transparency in data methods
Supporting responsible AI development
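Respecting privacy and avoiding unnecessary exposure of individuals often starts with replacing account identifiers before analysis. The sketch below uses a keyed hash (HMAC-SHA256 from Python's standard library), so analysts can still link posts from the same account without storing the raw handle, and it masks @-mentions in post text. This is pseudonymisation rather than anonymisation: post text can still identify people, and the secret key must be stored separately from the data.

```python
import hashlib
import hmac
import re

MENTION = re.compile(r"@\w+")


def pseudonymise(account_id: str, secret_key: bytes) -> str:
    """Replace an account ID with a stable keyed hash (first 16 hex characters)."""
    digest = hmac.new(secret_key, account_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def redact_mentions(text: str) -> str:
    """Mask @-mentions so third parties named in a post are not exposed."""
    return MENTION.sub("@user", text)
```

Using a key rather than a plain hash matters because account IDs are easy to guess and a plain SHA-256 of an ID could be reversed by hashing candidate IDs.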

How the Fellowship Works

Selected fellows will work part-time over three months.

They will collaborate with Code for Africa’s iLab and TechLab teams.

They will also work alongside human rights organisations to ensure that lexicon development reflects real monitoring needs, local context, and ethical standards.

The fellowship combines technical NLP work, cultural analysis, language research, and human rights-informed data design.

How to Apply

Applicants should prepare an application that demonstrates technical expertise, relevant language experience, and commitment to ethical AI.

Suggested Application Steps

Confirm that you are based in South Africa.
Highlight your Python, NLP, and machine learning experience.
Demonstrate at least two years of data science experience.
Provide examples of work with multilingual or low-resource language datasets.
Mention experience with isiZulu, isiXhosa, Afrikaans, Sesotho, Setswana, or Sepedi.
Describe your social media data collection experience.
Highlight experience with Hugging Face and transformer models.
Explain your understanding of South African cultural and political contexts.
Provide a relevant project portfolio.
Explain your approach to ethical data collection and AI model development.
Submit the application according to Code for Africa’s fellowship requirements.

Selection Criteria

Applications will be assessed based on technical and contextual strength.

Key selection criteria include:

NLP expertise
Python proficiency
Machine learning experience
Data science experience
Work with multilingual datasets
Experience with low-resource languages
Social media data collection skills
Familiarity with Hugging Face and transformer models
Quality of previous work
Understanding of ethical AI
Ability to build structured lexicons
Relevance of project portfolio
Understanding of South African civic and political contexts

Why It Matters

Hate speech and incitement can spread quickly through online platforms, especially during elections, social tensions, and coordinated influence campaigns.

Structured lexicons help researchers and monitoring teams detect harmful patterns more effectively.

In South Africa, multilingual and culturally specific language makes this work especially important.

By supporting low-resource language NLP, the fellowship helps build more inclusive and locally relevant AI systems for human rights monitoring.

Tips for Strong Applications

A strong application should clearly show both technical ability and contextual understanding.

Applicants should focus on:

Practical NLP project examples
Experience with South African languages
Strong Python and machine learning skills
Familiarity with social media datasets
Evidence of structured data work
Understanding of hate speech and incitement risks
Awareness of human rights standards
Clear ethical approach
Relevant portfolio links or examples
Ability to collaborate with technical and human rights teams

Applicants should show that they can work with sensitive language data carefully, accurately, and responsibly.

Common Mistakes to Avoid

Applicants should avoid submitting generic data science applications that do not address the fellowship’s specific focus.

Common mistakes include:

Not showing NLP experience
Failing to demonstrate Python skills
Ignoring low-resource South African language requirements
Providing no examples of social media data work
Not mentioning ethical data practices
Treating hate speech detection as only a keyword-matching task
Failing to show cultural or political context knowledge
Not providing a relevant portfolio
Ignoring human rights standards
Overlooking the need for structured, machine-readable outputs
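Keyword matching alone cannot tell a hateful use of a term from a news report or a post condemning it. A common mitigation is to treat lexicon matches as candidates and show reviewers each match in its surrounding text (a keyword-in-context view). The sketch below handles single-word terms only; the function name and the default window of five words are illustrative assumptions.

```python
def keyword_in_context(text, term, width=5):
    """Return each occurrence of `term` with up to `width` words on either side."""
    words = text.split()
    target = term.casefold()
    snippets = []
    for i, word in enumerate(words):
        # Compare after stripping surrounding punctuation, e.g. "term," -> "term".
        if word.strip(".,!?;:\"'()").casefold() == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            snippets.append(f"{left} [{word}] {right}".strip())
    return snippets
```

In the test sentence "Reporters quoted the word placeholder, then condemned its use.", a keyword matcher would count a hit, while the context makes clear the term is being criticised rather than used.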

FAQ

What is the Code for Africa Lexicon Fellowship?

It is a three-month, part-time fellowship for NLP and data science experts to develop structured hate speech and incitement lexicons for human rights and information integrity monitoring.

Who can apply?

Applicants must be based in South Africa and have strong skills in Python, NLP, machine learning, and data science.

How long does the fellowship last?

The fellowship lasts three months.

What benefits do fellows receive?

Selected fellows receive a competitive monthly stipend, technical support, research support, mentorship, and access to collaborative networks.

What languages are relevant for this fellowship?

Relevant South African languages include isiZulu, isiXhosa, Afrikaans, Sesotho, Setswana, and Sepedi.

What platforms should applicants be familiar with?

Applicants should be familiar with public data collection from platforms such as X, Facebook, TikTok, YouTube, and Telegram.

What will the lexicons be used for?

The lexicons will support NLP tools that monitor harmful language, track incitement patterns, assess polarisation, and support human rights organisations in analysing online narratives.

Conclusion

The Code for Africa Lexicon Fellowship supports the development of structured, human rights-aligned lexicons for monitoring hate speech, incitement, and harmful online narratives in South Africa. The fellowship is especially important for improving NLP tools in low-resource South African languages and strengthening ethical AI-driven information integrity work.

Strong applicants will combine technical expertise in Python, NLP, and machine learning with cultural knowledge, language skills, ethical data practices, and experience working with social media data. The fellowship offers an opportunity to contribute to practical AI tools that support human rights monitoring and public-interest accountability.

For more information, visit Code for Africa.