Deadline: 25-Sep-22
Invest India is accepting applications for the Innovation Challenge for Development of Machine Aided Translation System.
The Innovation Challenge aims to involve industry and start-ups in developing machine-aided translation systems using open-source translation tools and text corpora available in the public domain, including those developed under the TDIL programme of MeitY.
Objectives
- The Innovation Challenge aims at the development of usable and scalable text-to-text machine-aided translation systems from English to any Indian language (for which an adequate parallel text corpus is available) and vice versa, making use of open-source machine translation platforms and text corpora available in the public domain. The participating teams may use the public-domain Indian language corpus called “Samanantar” and the language resources, models and tools available in the public domain. Samanantar has an adequate parallel text corpus (more than 50 lakhs) for English and 8 Indian languages, viz. Bengali, Hindi, Kannada, Malayalam, Marathi, Punjabi, Tamil and Telugu.
- In order to customize the system for the domain of the user agency, the participating team will create an additional parallel text corpus using the content made available by the user agencies, viz. Election Commission, NCERT and Vigyan Prasar. The language pairs for which parallel content is available from the user agencies in their respective domains will be displayed on the Innovation Challenge webpage before the ideation phase starts. The teams may tweak the existing translation models to improve performance using innovative ideas of their own as well as current best practices.
- Further, the participating teams will also develop or customize the necessary tools, in the form of a translation workbench, to help users carry out translation tasks such as correcting machine-translated sentences in a convenient and efficient way. In order to accomplish the task in a limited time frame, the participants are expected to customize or enhance the open-source tools and workbenches already available in the public domain rather than building such tools from scratch.
Funding Information
- Ideation Stage: Each of the shortlisted teams will be given Rs. 2 lakhs at this stage.
- Prototype Stage: Each of the shortlisted teams will be provided Rs. 12 lakhs.
- Solution Building Stage: Development cost of Rs. 20 lakhs shall be provided after satisfactory completion of the work of this stage.
Stages in the Innovation Challenge
- Ideation Stage
- The participating teams will present their innovative, cutting-edge ideas and approaches for the development of the machine-aided translation system (MATS). Up to 10 top teams will be selected at this stage on the basis of the merits and feasibility of the proposed solution as well as the capacity of the participating team. Each of the shortlisted teams will be given Rs. 2 lakhs at this stage. If the Steering Committee finds that fewer than ten teams are in a position to develop the MATS, it may recommend fewer than ten teams.
- Prototype Stage
- The shortlisted teams from Stage I will work towards the development of the prototype and make a presentation to the Steering Committee. At most 3 teams will be shortlisted at this stage. Each of the shortlisted teams will be provided Rs. 12 lakhs. At this stage, the broad deliverables are the following:
- Enriching and augmenting the language data available in the public domain.
- System design and architecture, API design, deployment pipeline, telemetry data, model training, general engineering, glossary/TMX, translation memory, end-to-end translation of a document, a translator User Interface (UI) workbench for proofreading and sentence correction within the UI, and export of the translated document.
- Solution Building Stage
- The shortlisted teams from Stage II will work towards building the solution. Development cost of Rs. 20 lakhs shall be provided after satisfactory completion of the work of this stage. At this stage, the broad deliverables are the following:
- Assuming sentence or paragraph translation is delivered in working condition, the system should have a sentence tokenizer for Hindi and English and a UI tool where translators can correct the translated sentences.
- An improved translation engine, with the various issues captured and rectified.
- Enhancements to analyse post-edited sentences and to visualize metrics.
- Enhanced sentence memory and glossary to control the translation output, enhancements in TMX and translation memory, and export of the translated document.
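As a rough illustration of two of the deliverables above (a sentence tokenizer for Hindi and English, and a sentence memory that controls the output), here is a minimal Python sketch. It is not part of the challenge specification: the names `split_sentences` and `TranslationMemory` are hypothetical, the splitter is a naive punctuation rule that assumes sentences end in `.`, `!`, `?` or the Devanagari danda, and the memory only handles exact matches after whitespace and case normalization.

```python
import re

# Sentence-final marks: Latin . ! ? plus the Devanagari danda and double danda.
# Assumption: a sentence ends at one of these marks followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?।॥])\s+")


def split_sentences(text):
    """Naive splitter for English and Hindi text.

    It will wrongly split after abbreviations such as "Dr."; a real
    system would need a trained or rule-rich tokenizer.
    """
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


class TranslationMemory:
    """Exact-match sentence memory (hypothetical helper, not a TMX reader)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(sentence):
        # Collapse whitespace and ignore case so trivial variants still match.
        return " ".join(sentence.split()).casefold()

    def add(self, source, target):
        """Store a translator-approved target sentence for a source sentence."""
        self._store[self._key(source)] = target

    def lookup(self, source):
        """Return the stored translation, or None if the sentence is unseen."""
        return self._store.get(self._key(source))


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.add("Hello world.", "नमस्ते दुनिया।")
    for sentence in split_sentences("Hello world. How are you?"):
        # Sentences found in the memory bypass the translation engine.
        print(sentence, "->", tm.lookup(sentence))
```

In a workbench along these lines, sentences a translator has corrected would be added to the memory so that the same source sentence is not sent to the translation engine again.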
Eligibility Criteria
- An Indian company registered under the Companies Act 2013. The term “Indian company” means a company in which 51% or more of the shareholding is held by Indian citizens or persons of Indian origin.
- Startups complying with the definition as per the latest notification of DIPP.
- Entities that are in the process of registration, with an undertaking that they will complete the registration by the time of final submission.
For more information, visit https://www.investindia.gov.in/innovation-challenge-for-development-of-machine-aided-translation-system