STEM/Coding Experts Needed: Build Research Tasks for AI Evaluation

Employer
Kamil Lee
Description

I need help building realistic, terminal-based STEM research tasks used to evaluate frontier AI models (GPT, Gemini, etc.).

What you'll build:

A self-contained coding task that looks like real research work (analyzing datasets, running simulations, validating hypotheses, comparing methods). Not a textbook problem.

Each submission must include:

instruction.md (workflow, inputs, outputs, success criteria)

Reproducible Docker environment with data

Oracle solution (solve.sh) that fully solves the task

Deterministic tests for verification

task.toml metadata

All packaged into one zip
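Before uploading, it can help to confirm the zip actually contains every required piece. The following is a minimal sketch under stated assumptions. instruction.md, solve.sh, and task.toml are named in the posting. The Dockerfile name and the convention that tests live in a directory called tests/ are assumptions about the layout, not stated requirements. The function name missing_items is hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import IO, List, Union

# instruction.md, solve.sh and task.toml come from the posting;
# "Dockerfile" is an assumed name for the Docker environment definition
REQUIRED_NAMES = {"instruction.md", "solve.sh", "task.toml", "Dockerfile"}


def missing_items(zip_source: Union[str, IO[bytes]]) -> List[str]:
    """Hypothetical helper: list required items absent from a submission zip."""
    with zipfile.ZipFile(zip_source) as zf:
        # Skip directory entries; keep only file paths
        paths = [PurePosixPath(n) for n in zf.namelist() if not n.endswith("/")]
    names = {p.name for p in paths}
    missing = sorted(REQUIRED_NAMES - names)
    # Assumed convention: at least one test file inside a "tests" directory
    if not any("tests" in p.parts[:-1] for p in paths):
        missing.append("tests/")
    return missing


if __name__ == "__main__":
    # Build a deliberately incomplete submission in memory
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("task/instruction.md", "# Task\n")
        zf.writestr("task/solve.sh", "#!/bin/bash\n")
        zf.writestr("task/task.toml", "")
    buf.seek(0)
    print(missing_items(buf))  # prints ['Dockerfile', 'tests/']
```

Matching on file names rather than exact paths tolerates a top-level wrapper folder in the zip; adjust REQUIRED_NAMES if the platform expects a specific directory layout.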

Quality bar:

Multi-step, research-grade workflow

Hard enough that frontier models fail more than 80% of the time

Oracle passes local tests 3 out of 3 times

Objectively verifiable outputs

No LLM-generated content allowed
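"Deterministic tests" and "passes 3 out of 3 times" together imply that any randomness in the oracle must be pinned down. A minimal sketch, assuming a seeded Monte Carlo estimate of pi stands in for a real research computation; oracle_estimate and the test function are hypothetical names, and the tolerance of 0.1 is an illustrative choice.

```python
import math
import random


def oracle_estimate(seed: int = 0, n: int = 10_000) -> float:
    """Hypothetical oracle step: seeded Monte Carlo estimate of pi."""
    rng = random.Random(seed)  # private RNG, so global random state cannot leak in
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


def test_oracle_is_reproducible_and_accurate() -> None:
    # Mirror the "3 out of 3" bar: three independent runs must agree exactly
    runs = [oracle_estimate(seed=0) for _ in range(3)]
    assert runs[0] == runs[1] == runs[2]
    # Objective check against a known value with an explicit tolerance
    assert abs(runs[0] - math.pi) < 0.1
```

Using a dedicated random.Random instance instead of the module-level functions keeps the result independent of anything else that touches the global generator.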

Who's a fit:

STEM background (biology, chemistry, physics, ML, data science, etc.) with strong Python and Docker skills.

Payout: $100 per accepted submission.

Posted
2 days ago
Copyright
Freelancer's decision

Offers submitted (6)

Budget
100.00 USD
Valid for
30 days