Senior Data Analyst (GenAI)
You will own analytics, evaluation, and measurement across the Cherith GRTI Search Platform, centred-set profiling and recommendation systems, and the Christian LLM initiative. That means designing and maintaining evaluation frameworks for answer quality and faithfulness, building dashboards and reports that partners trust, and partnering with the AI Lead and Senior Full Stack Engineer to turn data into actionable improvements.
Who We Are
We apply cutting-edge technology in AI, Machine Learning, and Natural Language Understanding to Faith, Discipleship, and Christian Living, backed by rigorous measurement and responsible data practices. We build useful applications for personal spiritual growth, design tools that strengthen faith communities such as small groups and churches, and offer our expertise to international mission organizations.
Our AI Initiative
Our AI mission is to place faithful AI in the hands of trusted pastors and ministries, so that their impact will be broader and deeper. To that end, we are researching and developing projects where trust, accuracy, and governance are first-class requirements:
- A platform to support the GRTI network of ministries and organizations. This platform is a central, searchable library of biblical resources, alongside integrations that bring search into GRTI partner applications and address their core pain points. We will also provide analytics so partners can understand usage, content coverage, and quality over time.
- A mature profiling algorithm and recommendation system to understand user behaviour and ensure the system aids spiritual growth rather than detracts from it, with a centred-set approach to profiling and careful safeguards.
- Fine-tuning a Christian LLM to improve accuracy across our existing and future systems, with strong data stewardship and clear evaluation standards.
About the Role
You are a rigorous analyst who cares about validity, reproducibility, and clear communication of uncertainty. You prefer automated, versioned evaluation pipelines over ad-hoc notebook work, and you design metrics that protect user privacy while giving partners confidence in the system.
Outcomes You Will Drive in the First 12 Months
- Deliver a reproducible evaluation framework for answer quality and faithfulness (automated metrics, human-in-the-loop sampling, and regression tests) integrated into CI and release gates; see the sketch after this list.
- Ship partner-facing dashboards and reports (usage, content coverage, quality trends, and onboarding health) with clear definitions, caveats, and documented limitations.
- Establish a measurement plan for centred-set profiling and recommendations, including baseline metrics, experiment design, and privacy-preserving reporting.
- Support the Christian LLM initiative with clean data workflows, evaluation datasets, and clear metrics for fine-tuning progress and release readiness.
- Maintain data quality, lineage, and documentation so partners and internal teams can audit and reproduce findings.
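To make the release-gate idea concrete, here is a minimal sketch of a CI check that fails the build when faithfulness scores on a versioned evaluation set regress below a floor. The file path, record schema, and thresholds are hypothetical illustrations, not an existing pipeline:

```python
import json
import statistics
import sys

# Hypothetical versioned evaluation set: one JSON record per line with a
# precomputed faithfulness score in [0, 1] for each question/answer pair.
EVAL_SET = "eval/faithfulness_v3.jsonl"
MEAN_FLOOR = 0.85  # illustrative release-gate thresholds
P10_FLOOR = 0.70

def load_scores(path):
    with open(path) as f:
        return [json.loads(line)["faithfulness"] for line in f]

def main():
    scores = load_scores(EVAL_SET)
    mean_score = statistics.mean(scores)
    p10 = statistics.quantiles(scores, n=10)[0]  # 10th percentile
    print(f"n={len(scores)} mean={mean_score:.3f} p10={p10:.3f}")
    # Gate on the tail as well as the mean, so a handful of badly
    # grounded answers cannot hide behind a strong average.
    if mean_score < MEAN_FLOOR or p10 < P10_FLOOR:
        sys.exit(1)  # non-zero exit fails the CI job and blocks the release

if __name__ == "__main__":
    main()
```

Wiring a script like this into the release pipeline turns "gate releases on quality thresholds" into an enforced check rather than a convention.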
What You Will Do
- Design and maintain evaluation datasets and automated metrics for retrieval quality, answer relevance, faithfulness, and safety, with versioning and clear documentation.
- Build and maintain dashboards and self-service reporting for partners and internal teams, including usage, quality trends, content coverage, and system health.
- Partner with the AI Lead to define evaluation protocols, run monthly evaluation cycles, and gate releases based on clear quality thresholds.
- Partner with the Senior Full Stack Engineer to instrument events, build reliable data pipelines, and ensure analytics respect partner isolation and privacy constraints.
- Design and analyze experiments for search improvements, profiling, and recommendations, with pre-registered plans and transparent reporting of results and limitations; see the sketch after this list.
- Support the Christian LLM initiative with dataset hygiene, evaluation harnesses, and progress tracking against clear benchmarks.
- Document metrics, definitions, and known limitations so partners and auditors can understand and trust what they see.
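For the experiment analysis mentioned above, a pre-registered plan typically fixes the metric, the arms, and the statistical test in advance. Below is a minimal sketch of a two-proportion z-test comparing a binary "helpful answer" rate between control and treatment; the counts are purely illustrative:

```python
import math

def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in rates between two arms."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, z, p_value

# Illustrative counts: did a retrieval change lift "helpful answer" ratings?
lift, z, p = two_proportion_ztest(412, 1000, 461, 1000)
print(f"lift={lift:+.3f} z={z:.2f} p={p:.4f}")
```

Pre-registering the metric, sample size, and the exact test before launch keeps the reported result honest about what was and was not planned.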
What You Will Bring
- 5 or more years of experience in data analytics, data science, or evaluation roles with production systems.
- Strong statistical reasoning, including experiment design, sampling, and clear communication of uncertainty and limitations.
- Proficiency in Python and SQL, plus experience with data pipelines, versioned datasets, and reproducible workflows.
- Experience building and maintaining dashboards and reports that non-technical stakeholders trust and use.
- Rigorous approach to data quality, lineage, and documentation, with comfort working in cloud environments (GCP, AWS, Azure).
- Strong written and verbal communication, with the ability to explain technical trade-offs and metric caveats clearly.
- Security and privacy awareness, including experience with multi-tenant data isolation and responsible data handling.
Nice to Have
- Experience with LLM evaluation techniques (for example, LLM-as-judge, human preference collection, faithfulness metrics like groundedness or citation accuracy).
- Familiarity with search and retrieval metrics (for example, NDCG, MRR, recall@k) and RAG-specific evaluation patterns; a sketch of these metrics follows this list.
- Background in recommendation systems or user modeling, including privacy-preserving approaches.
- Experience with MLOps or LLMOps practices, including dataset versioning, model registries, and evaluation automation.
- Experience supporting fine-tuning or pre-training workflows with clean data practices and rigorous evaluation.
- Background in social science research methods, psychometrics, or measurement in sensitive domains.
- Bachelor's degree in a quantitative field or equivalent practical experience.
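For reference, the retrieval metrics named above have compact standard definitions. The sketch below implements recall@k, MRR, and NDCG@k from scratch; the document IDs and relevance labels are illustrative:

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / len(relevant_ids)

def mrr(ranked_ids, relevant_ids):
    """Reciprocal rank of the first relevant document (0.0 if none found)."""
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevance, k):
    """Normalized discounted cumulative gain over graded relevance labels."""
    dcg = sum(relevance.get(doc, 0) / math.log2(i + 2)
              for i, doc in enumerate(ranked_ids[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Example: one query with graded labels (2 = highly relevant, 1 = partial).
ranked = ["d3", "d1", "d7", "d2"]
labels = {"d1": 2, "d2": 1}
print(recall_at_k(ranked, set(labels), k=3))        # 0.5
print(mrr(ranked, set(labels)))                     # 0.5
print(round(ndcg_at_k(ranked, labels, k=4), 3))     # ~0.643
```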
Location and Travel
U.S.-based remote. Minimal travel may be required for team onsites and partner sessions.
Interested in this role?
Send us an email with a bit about yourself and why this role caught your eye. No formal cover letter required.