STEM/Coding Experts Needed: Build Research Tasks for AI Evaluation

Employer

Kamil Lee

Description

I need help building realistic, terminal-based STEM research tasks used to evaluate frontier AI models (GPT, Gemini, etc.).

What you'll build:

A self-contained coding task that looks like real research work (analyzing datasets, running simulations, validating hypotheses, comparing methods). Not a textbook problem.

Each submission must include:

instruction.md (workflow, inputs, outputs, success criteria)

Reproducible Docker environment with data

Oracle solution (solve.sh) that fully solves the task

Deterministic tests for verification

task.toml metadata

All packaged into one zip

Quality bar:

Multi-step, research-grade workflow

Hard enough that frontier models fail more than 80% of the time

Oracle passes local tests 3 out of 3 times

Objectively verifiable outputs

No LLM-generated content allowed
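
The two numeric thresholds above can be checked mechanically. The sketch below assumes each run is recorded as a simple pass/fail boolean; the function name and the number of model trials are illustrative and not part of the posting.

```python
def meets_quality_bar(oracle_results, model_results):
    """Check the two numeric criteria from the posting.

    oracle_results: pass/fail booleans for the oracle's local test runs.
    model_results:  pass/fail booleans for frontier-model attempts
                    (how many attempts to run is an assumption left to the caller).
    """
    # Oracle must pass local tests 3 out of 3 times.
    oracle_ok = len(oracle_results) == 3 and all(oracle_results)

    # Models must fail more than 80% of the time (strictly more).
    total = len(model_results)
    if total == 0:
        return False
    failures = sum(1 for passed in model_results if not passed)
    # Integer comparison avoids floating-point edge cases at exactly 80%.
    hard_enough = failures * 100 > 80 * total

    return oracle_ok and hard_enough
```

For example, an oracle that passes three times and a model that fails 9 of 10 attempts meets the bar, while a model that fails exactly 8 of 10 does not, because 80% is not "more than 80%".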

Who's a fit:

STEM background (biology, chemistry, physics, ML, data science, etc.) with strong Python and Docker skills.

Payout: $100 per accepted submission.

Published on 2026-06-03

Offers sent (18)
