Get started

Get Data Gov running on your machine in about 30 minutes. You will clone the repo, restore a production database snapshot, and boot the development server.

What you will have at the end

A local Rails server at http://localhost:3000 with the ActiveAdmin dashboard. You can browse drugs, diseases, clinical trials, and all 87 admin resources. You can run Thor pipeline tasks against real data.

Prerequisites

Install these before starting.

Dependency	Version	Purpose
Ruby	3.4.2	Application runtime (manage with rbenv)
PostgreSQL	17+	Primary database (Docker image with pgvector)
Redis	7+	Sidekiq job queue backend
Docker	Latest	Runs Postgres and Redis containers
Node.js	LTS	Shakapacker/React builds
libpq	Latest	PostgreSQL C client library
graphviz	Latest	ERD generation (optional)

You also need credentials for external services.

Credential	Required for	How to get it
AACT database	Clinical trials data	Register at aact.ctti-clinicaltrials.org
Google OAuth	Admin login	Create credentials in Google Cloud Console
OpenAI API key	LLM pipelines	Get from OpenAI dashboard
AWS credentials	S3, Batch, CloudWatch	Optional for local dev — ask your lead

Step 1: Install system dependencies

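# Install native libraries (graphviz is optional)
brew install libpq graphviz

# Put the libpq client tools on your PATH. /opt/homebrew is the Apple Silicon
# prefix; on Intel Macs, run `brew --prefix libpq` to find the right path.
export PATH="/opt/homebrew/opt/libpq/bin:$PATH"
echo 'export PATH="/opt/homebrew/opt/libpq/bin:$PATH"' >> ~/.zshrc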

# Install rbenv and Ruby
brew install rbenv
rbenv install 3.4.2
rbenv global 3.4.2

# Install Bundler
gem install bundler
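
To confirm that the installed tools match the prerequisites table, a small Ruby check can compare version strings against the listed constraints. This is a minimal sketch, not part of the repository: the `REQUIREMENTS` hash, `version_ok?`, and `extract_version` are hypothetical names, and reading "17+" as ">= 17" is an assumption.

```ruby
require "rubygems"

# Version constraints copied from the prerequisites table.
# Expressing "17+" as ">= 17" is this sketch's interpretation.
REQUIREMENTS = {
  "ruby"       => "= 3.4.2",
  "postgresql" => ">= 17",
  "redis"      => ">= 7"
}.freeze

# Returns true when `actual` (for example "17.2") satisfies the constraint for `tool`.
def version_ok?(tool, actual)
  requirement = Gem::Requirement.new(REQUIREMENTS.fetch(tool))
  requirement.satisfied_by?(Gem::Version.new(actual))
end

# Pulls the first dotted version number out of free-form `--version` output.
def extract_version(output)
  output[/\d+(?:\.\d+)+/]
end

if __FILE__ == $PROGRAM_NAME
  status = version_ok?("ruby", RUBY_VERSION) ? "ok" : "mismatch"
  puts "ruby #{RUBY_VERSION}: #{status}"
end
```

Feed `extract_version` the output of a tool's version command and pass the result to `version_ok?` to check PostgreSQL or Redis the same way.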

Step 2: Clone and install

git clone git@github.com:Bioloupe-Inc/bioloupe-data-gov.git
cd bioloupe-data-gov

# Install Ruby dependencies
bundle install

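# Install JavaScript dependencies
# (requires pnpm, which is not in the prerequisites table; if it is missing,
# one option is `npm install -g pnpm`)
pnpm install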

Step 3: Configure environment variables

cp env.example .env

Open .env and set these required values.

# Primary database (Docker defaults)
DB_HOST=localhost
DB_NAME=datalake
DB_USERNAME=bioloupe
DB_PASSWORD=bioloupe

# AACT clinical trials database (read-only)
AACT_DB_HOST=aact-db.ctti-clinicaltrials.org
AACT_DB_NAME=aact
AACT_DB_USERNAME=<your-aact-username>
AACT_DB_PASSWORD=<your-aact-password>

# Google OAuth for admin login
GOOGLE_CLIENT_ID=<your-client-id>
GOOGLE_CLIENT_SECRET=<your-client-secret>

# OpenAI for LLM pipelines
OPENAI_API_KEY=<your-key>

The env.example file documents all optional variables: AWS (S3, Batch, Athena), ChEMBL, Cision, FMP, Slack, Brevo, Klaviyo, ASCO, New Relic, and Airbrake. You only need these for specific pipeline features.
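
Before moving on, it can help to check that none of the required values is missing or still a `<your-...>` placeholder. The following is a minimal sketch under stated assumptions: `parse_env` and `unset_keys` are hypothetical helpers, not part of the repository, and the parser handles only plain `KEY=value` lines.

```ruby
# Keys that Step 3 lists as required.
REQUIRED_KEYS = %w[
  DB_HOST DB_NAME DB_USERNAME DB_PASSWORD
  AACT_DB_HOST AACT_DB_NAME AACT_DB_USERNAME AACT_DB_PASSWORD
  GOOGLE_CLIENT_ID GOOGLE_CLIENT_SECRET
  OPENAI_API_KEY
].freeze

# Parses simple KEY=value lines, skipping blanks and # comments.
# Deliberately naive: no quoting, `export` prefixes, or interpolation.
def parse_env(text)
  text.each_line.with_object({}) do |line, vars|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    vars[key.strip] = value.to_s.strip
  end
end

# Returns the required keys that are absent, empty, or still a <placeholder>.
def unset_keys(vars, required = REQUIRED_KEYS)
  required.select do |key|
    value = vars[key]
    value.nil? || value.empty? || value.match?(/\A<.*>\z/)
  end
end

if __FILE__ == $PROGRAM_NAME && File.exist?(".env")
  missing = unset_keys(parse_env(File.read(".env")))
  puts missing.empty? ? "All required variables are set." : "Still unset: #{missing.join(', ')}"
end
```

Run it from the repository root after editing .env; it only reads the file and never prints secret values.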

Step 4: Start local infrastructure

Docker Compose provides PostgreSQL 17 (with pgvector) and Redis 7.

docker compose up -d

Verify both services are healthy.

docker compose ps

You should see two running containers. Stop them later with docker compose down.
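
If `docker compose ps` looks fine but the app still cannot connect, a quick TCP probe shows whether the services are reachable from the host. This sketch assumes the containers publish the default PostgreSQL and Redis ports (5432 and 6379); the project's compose file may map different ones, and `port_open?` is a hypothetical helper.

```ruby
require "socket"

# Default ports for PostgreSQL and Redis. Assumption: the compose file
# publishes these on the host; adjust if it maps other ports.
SERVICES = { "postgres" => 5432, "redis" => 6379 }.freeze

# Returns true if a TCP connection to host:port succeeds within `timeout` seconds.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Refused, timed out, or unresolvable: treat all as "not reachable".
  false
end

if __FILE__ == $PROGRAM_NAME
  SERVICES.each do |name, port|
    state = port_open?("localhost", port) ? "reachable" : "not reachable"
    puts "#{name} (localhost:#{port}): #{state}"
  end
end
```

An open port only proves that something is listening; it does not confirm credentials or that the database has been restored.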

Step 5: Restore the database

Fresh migrations are not supported. The schema has 206 tables and complex interdependencies. Restore from a production dump instead.

bundle exec thor db:restore

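This command lists available S3 backups and handles pg_restore automatically. Because the backups live in S3, this step likely needs working AWS credentials even though the prerequisites list them as optional; ask your lead if the command cannot list any backups.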

Step 6: Create your user account

Open a Rails console.

bundle exec rails c

Create an admin user.

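# Skip PaperTrail version tracking for this one-off record
PaperTrail.request(enabled: false) do
  # create! raises on validation errors instead of silently returning an unsaved record
  User.create!(
    email: 'your.name@bioloupe.com',
    name: 'Your Name',
    role: 'admin'
  )
end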

Valid roles: admin, editor, viewer, client. Use admin for full access during development. The client role is API-only and cannot access ActiveAdmin.
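
The role rule above can be written down as a few lines of plain Ruby. This is only an illustration of what this guide states, not the application's authorization code: `VALID_ROLES`, `API_ONLY_ROLES`, and both methods are hypothetical names, and the real checks may be stricter.

```ruby
# Roles listed in this guide. The access rule encodes only the one fact the
# guide states (client is API-only); the app's real authorization may differ.
VALID_ROLES = %w[admin editor viewer client].freeze
API_ONLY_ROLES = %w[client].freeze

# True when `role` is one of the four documented roles.
def valid_role?(role)
  VALID_ROLES.include?(role.to_s)
end

# True when a documented role is allowed into the ActiveAdmin UI.
def active_admin_access?(role)
  valid_role?(role) && !API_ONLY_ROLES.include?(role.to_s)
end
```

For example, `active_admin_access?("client")` is false, which is why a development account should use admin.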

Step 7: Start the dev server

For backend-only work, one terminal is enough.

bundle exec rails server
# Open http://localhost:3000

The root path redirects to /admin, the ActiveAdmin dashboard.

For full-stack development with React hot reload, use two terminals.

# Terminal 1: Webpack dev server with HMR
bin/shakapacker-dev-server

# Terminal 2: Rails server on a different port
bundle exec rails server -p 3010
# Open http://localhost:3010

Or use Foreman with bin/dev to run both processes from a single terminal.

Step 8: Verify your setup

Run these checks to confirm everything works.

# Run the test suite
bundle exec rails test

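# Check that a Thor task loads
bundle exec thor regulatory:fda:download_and_extract --help

# Check that the health endpoint responds (the server from Step 7 must be running;
# use port 3010 if you started the full-stack setup)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/up
# Should print 200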
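
The server can take a few seconds to boot, so a single request may fail even when the setup is fine. A minimal sketch of a retrying health check, using only the Ruby standard library, is shown here; `wait_for_health` and its `attempts` and `delay` parameters are hypothetical, not something the repository provides.

```ruby
require "net/http"
require "uri"

# Polls `url` until it answers with HTTP 200 or the attempts run out.
# Returns true on success, false otherwise.
def wait_for_health(url, attempts: 10, delay: 1)
  uri = URI(url)
  attempts.times do |i|
    begin
      response = Net::HTTP.get_response(uri)
      return true if response.code == "200"
    rescue StandardError
      # Connection refused, timeout, or a malformed reply: retry below.
    end
    sleep delay if i < attempts - 1
  end
  false
end
```

Calling `wait_for_health("http://localhost:3000/up")` returns true as soon as the endpoint answers 200 and false if it never does within the allotted attempts.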

Step 9: Run a pipeline (optional)

Try a single Thor task to see how pipelines work.

bundle exec thor regulatory:fda:download_and_extract

Or launch a full workflow from the Rails console.

WorkflowRunnerJob.perform_now(
  workflow_type: "ClinicalTrialsWorkflow",
  params: {},
  reset_if_exists: true
)

You can also trigger workflows from the ActiveAdmin UI at /admin/workflow_instances.

Next steps

Now that Data Gov is running, read the docs in order.

Data model — Understand the 206-table schema and how entities connect
Clinical trials — Follow a trial from ClinicalTrials.gov into the knowledge graph
Architecture — Learn the codebase patterns before adding features