Deadline: 28-Jun-2026
Code for Africa is offering a three-month, part-time Lexicon Fellowship for NLP and data science experts based in South Africa. The fellowship supports the development of structured, machine-readable hate speech and incitement lexicons to improve information integrity, human rights monitoring, and AI-driven analysis of harmful online narratives.
Selected fellows will receive a competitive monthly stipend and technical support from Code for Africa’s iLab and TechLab teams. Applicants should have strong skills in Python, natural language processing, and machine learning, along with experience in low-resource South African languages, social media data collection, and ethical data practices.
What is the Code for Africa Lexicon Fellowship?
The Code for Africa Lexicon Fellowship is a part-time opportunity for NLP specialists, machine learning practitioners, and data science experts.
The fellowship focuses on building structured hate speech and incitement lexicons that can support human rights monitoring and information integrity work.
The programme is designed to help identify and analyse harmful language used in online spaces, especially language linked to hate speech, online violence, dehumanisation, incitement, and coordinated influence campaigns.
Main Purpose of the Fellowship
The main purpose of the fellowship is to strengthen AI-supported monitoring of harmful online narratives in South Africa.
The fellowship supports the development of lexicons that are:
- Structured
- Version-controlled
- Machine-readable
- Multilingual
- Aligned with human rights standards
- Useful for NLP model development
- Designed for ethical monitoring of harmful speech
Fellowship Duration
The fellowship runs for three months on a part-time basis.
Fellowship Benefits
Selected fellows will receive:
- A competitive monthly stipend
- Technical support from Code for Africa’s iLab team
- Research support from Code for Africa’s TechLab team
- Access to collaborative networks
- Opportunities to work with human rights organisations
- Practical experience in NLP-based monitoring tools
- Exposure to multilingual and low-resource language AI applications
Geographic Focus
Applicants must be based in South Africa.
The fellowship focuses on South African civic, cultural, political, and linguistic contexts.
Who is Eligible?
The fellowship is open to NLP and data science experts based in South Africa.
Eligible applicants should have:
- Strong Python skills
- Strong NLP experience
- Machine learning experience
- At least two years of data science experience
- Experience with low-resource South African languages
- Familiarity with social media data collection
- Experience with transformer models
- Understanding of ethical data practices
- Knowledge of local cultural and political contexts
Required Technical Skills
Applicants should demonstrate strong technical ability in data science and AI-related work.
Required or strongly preferred skills include:
- Python programming
- Natural language processing
- Machine learning
- Social media data collection
- Multilingual data analysis
- Low-resource language processing
- Hugging Face tools
- Transformer models
- Dataset preparation
- Structured data management
- Ethical AI practices
South African Language Experience
Experience with low-resource South African languages is required or strongly preferred.
Relevant languages include:
- isiZulu
- isiXhosa
- Afrikaans
- Sesotho
- Setswana
- Sepedi
This language experience matters because harmful narratives often rely on local languages, slang, coded terms, slogans, and context-specific expressions.
Social Media Data Collection Experience
Applicants should be familiar with collecting and analysing public data from social media platforms.
Relevant platforms include:
- X (formerly Twitter)
- TikTok
- YouTube
- Telegram
This experience is important for identifying harmful language patterns, coordinated campaigns, and evolving online narratives.
Key Focus Areas
The fellowship focuses on hate speech, incitement, and harmful online narratives.
Key areas include:
- Municipal elections
- Xenophobia
- LGBTQ+ communities
- Gender-based hate
- Online violence
- Dehumanisation
- Incitement patterns
- Coordinated campaigns
- Disinformation vulnerability
- Polarisation analysis
- Information integrity
- Human rights monitoring
Key Concepts Explained
Hate Speech Lexicon
A hate speech lexicon is a structured list of words, phrases, slang, slogans, slurs, and coded language associated with hateful or harmful speech.
Incitement Lexicon
An incitement lexicon focuses on language that may encourage violence, discrimination, hostility, or harm against individuals or groups.
Weaponised Language
Weaponised language refers to words or phrases used deliberately to attack, dehumanise, intimidate, mobilise hostility, or spread harmful narratives.
Natural Language Processing
Natural language processing is a field of artificial intelligence that helps computers analyse, classify, and interpret human language.
Low-Resource Languages
Low-resource languages are languages with limited digital datasets, linguistic tools, or AI training resources. Many South African languages fall into this category.
Machine-Readable Lexicon
A machine-readable lexicon is structured so that computers and AI models can process it for analysis, classification, monitoring, and detection.
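As an illustration only, a machine-readable entry could be represented as a small typed record and written out as JSON Lines. The Python sketch below shows one possible shape. The field names (term, language, category, context_note, target_groups) and the placeholder values are assumptions made for this example, not a schema published by Code for Africa.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class LexiconEntry:
    """One lexicon record. Every field name here is hypothetical."""
    term: str          # surface form as it appears online
    language: str      # language code, e.g. "zu" for isiZulu
    category: str      # e.g. "hate_speech" or "incitement"
    context_note: str  # when the term is harmful versus benign
    target_groups: list[str] = field(default_factory=list)


def to_json_lines(entries: list[LexiconEntry]) -> str:
    """Serialise entries as JSON Lines, one record per line."""
    return "\n".join(
        json.dumps(asdict(e), ensure_ascii=False, sort_keys=True)
        for e in entries
    )


# Placeholder values only; no real term is shown.
entries = [
    LexiconEntry(
        term="<placeholder-term>",
        language="zu",
        category="incitement",
        context_note="Placeholder entry for illustration.",
        target_groups=["<placeholder-group>"],
    )
]
serialised = to_json_lines(entries)
print(serialised)
```

Keeping one record per line means each added or changed term shows up as a single-line change when the file is compared between versions.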
Information Integrity
Information integrity refers to the reliability, accuracy, and trustworthiness of public information ecosystems.
Human Rights Standards
The fellowship aligns lexicon development with international human rights principles.
Relevant frameworks include:
- Rabat Plan of Action
- Principles of the International Covenant on Civil and Political Rights (ICCPR)
- Human rights-based standards on freedom of expression
- Standards for assessing incitement, discrimination, and harm
This alignment helps ensure that monitoring tools do not overreach or unfairly restrict legitimate expression.
What Fellows Will Do
Fellows will develop structured lexicons that help detect and analyse harmful online language.
Key responsibilities may include:
- Mapping harmful words, slang, slogans, and slurs
- Analysing language linked to hate speech and incitement
- Identifying coded or context-specific harmful terms
- Structuring lexicons for NLP use
- Version-controlling lexicon datasets
- Supporting multilingual language resources
- Reviewing social media content patterns
- Collaborating with human rights organisations
- Supporting indicators used in AI monitoring systems
Expected Outputs
The fellowship will produce resources that support NLP-based monitoring and analysis.
Expected outputs may include:
- Structured hate speech lexicons
- Incitement language lexicons
- Machine-readable language datasets
- Multilingual entries for South African languages
- Metadata for terms and usage contexts
- Version-controlled lexicon files
- Documentation for ethical use
- Inputs for NLP model development
- Indicators for polarisation and disinformation risk
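One way to make lexicon versions traceable is to publish a fingerprint alongside each release. The Python sketch below assumes entries are plain dictionaries and uses a hypothetical manifest layout and version label. The fellowship announcement does not specify a versioning format, so this is an illustration rather than the programme's method.

```python
import hashlib
import json


def lexicon_fingerprint(entries: list[dict]) -> str:
    """Return a SHA-256 hex digest that is stable across entry and key order."""
    canonical = sorted(
        json.dumps(e, ensure_ascii=False, sort_keys=True) for e in entries
    )
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


# Placeholder entries only; no real terms are shown.
v1 = [{"term": "<placeholder-a>", "language": "af", "category": "hate_speech"}]
v2 = v1 + [{"term": "<placeholder-b>", "language": "xh", "category": "incitement"}]

manifest = {
    "version": "0.2.0",  # hypothetical version label
    "entry_count": len(v2),
    "sha256": lexicon_fingerprint(v2),
}
print(json.dumps(manifest, indent=2))
```

Because the entries are sorted before hashing, reordering the file does not change the fingerprint, while adding, removing, or editing a term does.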
NLP Model Applications
The lexicons developed through the fellowship will support AI and NLP tools.
These tools may generate indicators such as:
- Polarisation scores
- Disinformation vulnerability indices
- Harmful narrative indicators
- Incitement risk signals
- Online violence monitoring insights
- Coordinated campaign markers
These indicators can support human rights organisations, researchers, civic technology teams, and information integrity monitors.
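The announcement does not say how these indicators are calculated. As a rough illustration of the kind of raw signal a lexicon can supply, the Python sketch below computes the share of posts that contain at least one lexicon term as a whole word. The function name match_rate, the sample posts, and the placeholder term termx are all hypothetical.

```python
import re


def match_rate(posts: list[str], terms: list[str]) -> float:
    """Fraction of posts containing at least one lexicon term as a whole word."""
    if not posts or not terms:
        return 0.0
    # Lookarounds stop a term from matching inside a longer word.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    hits = sum(1 for post in posts if pattern.search(post))
    return hits / len(posts)


# Invented sample posts; "termx" is a placeholder, not a real term.
posts = [
    "neutral post about the weather",
    "post containing termx in context",
    "another neutral post",
    "TERMX appears again here",
]
rate = match_rate(posts, ["termx"])
print(f"{rate:.2f}")
```

A rate like this is only an input. On its own it says nothing about intent or context, which is why the lexicons also carry metadata on usage.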
Ethical Data Practices
Ethical data practices are central to the fellowship.
Fellows must handle data responsibly and avoid harmful misuse of sensitive language datasets.
Ethical considerations include:
- Respecting privacy
- Avoiding unnecessary exposure of individuals
- Protecting vulnerable groups
- Applying human rights standards
- Avoiding biased or overbroad classification
- Documenting context clearly
- Ensuring transparency in data methods
- Supporting responsible AI development
How the Fellowship Works
Selected fellows will work part-time over three months.
They will collaborate with Code for Africa’s iLab and TechLab teams.
They will also work alongside human rights organisations to ensure that lexicon development reflects real monitoring needs, local context, and ethical standards.
The fellowship combines technical NLP work, cultural analysis, language research, and human rights-informed data design.
How to Apply
Applicants should prepare an application that demonstrates technical expertise, relevant language experience, and commitment to ethical AI.
Suggested Application Steps
- Confirm that you are based in South Africa.
- Highlight your Python, NLP, and machine learning experience.
- Demonstrate at least two years of data science experience.
- Provide examples of work with multilingual or low-resource language datasets.
- Mention experience with isiZulu, isiXhosa, Afrikaans, Sesotho, Setswana, or Sepedi.
- Describe your social media data collection experience.
- Highlight experience with Hugging Face and transformer models.
- Explain your understanding of South African cultural and political contexts.
- Provide a relevant project portfolio.
- Explain your approach to ethical data collection and AI model development.
- Submit the application by the 28 June 2026 deadline, following Code for Africa’s fellowship requirements.
Selection Criteria
Applications will be assessed based on technical and contextual strength.
Key selection criteria include:
- NLP expertise
- Python proficiency
- Machine learning experience
- Data science experience
- Work with multilingual datasets
- Experience with low-resource languages
- Social media data collection skills
- Familiarity with Hugging Face and transformer models
- Quality of previous work
- Understanding of ethical AI
- Ability to build structured lexicons
- Relevance of project portfolio
- Understanding of South African civic and political contexts
Why It Matters
Hate speech and incitement can spread quickly through online platforms, especially during elections, social tensions, and coordinated influence campaigns.
Structured lexicons help researchers and monitoring teams detect harmful patterns more effectively.
In South Africa, multilingual and culturally specific language makes this work especially important.
By supporting low-resource language NLP, the fellowship helps build more inclusive and locally relevant AI systems for human rights monitoring.
Tips for Strong Applications
A strong application should clearly show both technical ability and contextual understanding.
Applicants should focus on:
- Practical NLP project examples
- Experience with South African languages
- Strong Python and machine learning skills
- Familiarity with social media datasets
- Evidence of structured data work
- Understanding of hate speech and incitement risks
- Awareness of human rights standards
- Clear ethical approach
- Relevant portfolio links or examples
- Ability to collaborate with technical and human rights teams
Applicants should show that they can work with sensitive language data carefully, accurately, and responsibly.
Common Mistakes to Avoid
Applicants should avoid submitting generic data science applications that do not address the fellowship’s specific focus.
Common mistakes include:
- Not showing NLP experience
- Failing to demonstrate Python skills
- Ignoring low-resource South African language requirements
- Providing no examples of social media data work
- Not mentioning ethical data practices
- Treating hate speech detection as only a keyword-matching task
- Failing to show cultural or political context knowledge
- Not providing a relevant portfolio
- Ignoring human rights standards
- Overlooking the need for structured, machine-readable outputs
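To illustrate the point about keyword matching, the Python sketch below treats a lexicon hit as a prompt for review rather than a verdict. It assumes a hypothetical lexicon in which each term carries a flag marking whether it is ambiguous outside its context. The names Match and triage and the placeholder terms are invented for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    term: str
    needs_review: bool  # True when the term can also be used benignly


def triage(post: str, lexicon: dict[str, bool]) -> list[Match]:
    """Return lexicon hits; `lexicon` maps a lowercase term to its ambiguity flag.

    Handles single-word terms only; multi-word phrases would need
    a different matching strategy.
    """
    tokens = set(re.findall(r"\w+", post.lower()))
    return [
        Match(term=term, needs_review=ambiguous)
        for term, ambiguous in lexicon.items()
        if term in tokens
    ]


# Placeholder terms only: "terma" stands for a context-dependent word,
# "termb" for one treated as harmful in every usage.
lexicon = {"terma": True, "termb": False}

matches = triage("A news report quoting TermA, and also termb.", lexicon)
for m in matches:
    action = "send to human reviewer" if m.needs_review else "flag for analysis"
    print(m.term, "->", action)
```

The same word can appear in an attack, a news report, or a reclaimed usage, so a match alone cannot settle how a post should be classified.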
FAQ
What is the Code for Africa Lexicon Fellowship?
It is a three-month, part-time fellowship for NLP and data science experts to develop structured hate speech and incitement lexicons for human rights and information integrity monitoring.
Who can apply?
Applicants must be based in South Africa and have strong skills in Python, NLP, machine learning, and data science.
How long does the fellowship last?
The fellowship lasts three months.
What benefits do fellows receive?
Selected fellows receive a competitive monthly stipend, technical support, research support, mentorship, and access to collaborative networks.
What languages are relevant for this fellowship?
Relevant South African languages include isiZulu, isiXhosa, Afrikaans, Sesotho, Setswana, and Sepedi.
What platforms should applicants be familiar with?
Applicants should be familiar with public data collection from platforms such as X, Facebook, TikTok, YouTube, and Telegram.
What will the lexicons be used for?
The lexicons will support NLP tools that monitor harmful language, track incitement patterns, assess polarisation, and support human rights organisations in analysing online narratives.
Conclusion
The Code for Africa Lexicon Fellowship supports the development of structured, human rights-aligned lexicons for monitoring hate speech, incitement, and harmful online narratives in South Africa. The fellowship is especially important for improving NLP tools in low-resource South African languages and strengthening ethical AI-driven information integrity work.
Strong applicants will combine technical expertise in Python, NLP, and machine learning with cultural knowledge, language skills, ethical data practices, and experience working with social media data. The fellowship offers an opportunity to contribute to practical AI tools that support human rights monitoring and public-interest accountability.
For more information, visit Code for Africa.
