
Building your own GitHub analytics dashboard

Sara Verdi
Graphite software engineer

With today's data-driven development cycles, it's important to understand your team's workflows and other key project metrics. While GitHub provides built-in tools like the GitHub Insights tab, building a custom analytics dashboard lets you tailor insights to your specific needs. This guide walks through creating your own GitHub analytics dashboard so you can visualize and analyze repository data in a way that best suits your team.

Creating a custom GitHub analytics dashboard offers several advantages:

  • Customized metrics: Focus on the data that matters most to your project.
  • Enhanced visualization: Use specific charts and graphs that align with your analysis goals.
  • Integrations: Combine GitHub data with other tools and platforms for a comprehensive view.
  • Automation: Streamline reporting and monitoring processes.

Before starting, you need to make sure you have:

  • Basic programming knowledge: Familiarity with a language like Python or JavaScript.
  • A GitHub account: This is for access to the repositories you wish to analyze.
  • GitHub personal access token: For authenticated API requests.
  • Development environment: Tools like Node.js, Python, or data visualization libraries.

To fetch data from GitHub, you'll need to interact with their APIs. So, first, you'll need to generate a personal access token and choose your API.

  1. Log into GitHub: Navigate to your profile settings.
  2. Access developer settings: Find "Developer settings" in the sidebar.
  3. Create a new token: Under "Personal access tokens," click "Generate new token."
  4. Set permissions: Select scopes like repo, read:org, and user.
  5. Save the token: Copy and securely store your token.

Next, choose which of GitHub's two APIs fits your needs:

  • REST API: Easier to use for straightforward data fetching.
  • GraphQL API: More efficient for complex queries and fetching nested data.
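
If you go with GraphQL, a single query can pull nested data that would take several REST calls. Here's a minimal sketch using the requests library; the owner and repository names are placeholders:

Terminal
import requests

GRAPHQL_URL = 'https://api.github.com/graphql'
headers = {'Authorization': 'bearer YOUR_PERSONAL_ACCESS_TOKEN'}

# One request: the 10 most recent pull requests plus their review counts
query = """
{
  repository(owner: "your-username", name: "your-repository") {
    pullRequests(last: 10) {
      nodes {
        title
        state
        reviews(first: 10) { totalCount }
      }
    }
  }
}
"""

response = requests.post(GRAPHQL_URL, json={'query': query}, headers=headers)
print(response.json())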

Identify the key metrics you want to track:

  • Commit activity: Frequency and volume of commits.
  • Pull requests: Status, merge times, and review comments.
  • Issues: Open vs. closed, resolution times, and labels.
  • Contributors: Individual contributions and activity levels.
  • Code frequency: Lines of code added or removed over time.

With the REST API, a few small helper functions cover the basics:

Terminal
import requests

# GitHub API base URL
BASE_URL = 'https://api.github.com'

# Your repository details
OWNER = 'your-username'
REPO = 'your-repository'

# Headers with authentication
headers = {
    'Authorization': 'token YOUR_PERSONAL_ACCESS_TOKEN',
    'Accept': 'application/vnd.github.v3+json'
}

# Fetch commits
def get_commits():
    url = f'{BASE_URL}/repos/{OWNER}/{REPO}/commits'
    response = requests.get(url, headers=headers)
    return response.json()

# Fetch pull requests (open, closed, and merged)
def get_pull_requests():
    url = f'{BASE_URL}/repos/{OWNER}/{REPO}/pulls?state=all'
    response = requests.get(url, headers=headers)
    return response.json()

# Fetch issues (note: this endpoint also returns pull requests)
def get_issues():
    url = f'{BASE_URL}/repos/{OWNER}/{REPO}/issues?state=all'
    response = requests.get(url, headers=headers)
    return response.json()

GitHub paginates API results (30 items per page by default), so loop through the pages to collect everything:

Terminal
def fetch_all_pages(url):
    results = []
    while url:
        response = requests.get(url, headers=headers)
        results.extend(response.json())
        # Follow the 'next' link header until there are no more pages
        if 'next' in response.links:
            url = response.links['next']['url']
        else:
            url = None
    return results
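
The authenticated REST API allows 5,000 requests per hour and returns at most 100 items per page. Here's a variant of the helper above that requests the larger page size and backs off near the limit; it reuses the headers defined earlier, and the threshold of 10 remaining requests is an arbitrary choice:

Terminal
import time

def fetch_all_pages_throttled(url):
    results = []
    # Ask for the maximum page size to reduce the number of requests
    url += '&per_page=100' if '?' in url else '?per_page=100'
    while url:
        response = requests.get(url, headers=headers)
        results.extend(response.json())
        # Pause if we are close to the rate limit (headers set by GitHub)
        remaining = int(response.headers.get('X-RateLimit-Remaining', 1))
        if remaining < 10:
            reset_at = int(response.headers.get('X-RateLimit-Reset', time.time() + 60))
            time.sleep(max(reset_at - time.time(), 0) + 1)
        url = response.links.get('next', {}).get('url')
    return results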

Use a data processing library like pandas to manipulate and analyze the fetched data.

Terminal
import pandas as pd
from datetime import datetime

# Fetch every commit and extract the author dates
commits = fetch_all_pages(f'{BASE_URL}/repos/{OWNER}/{REPO}/commits')
commit_dates = [commit['commit']['author']['date'] for commit in commits]
commit_dates = [datetime.strptime(date, '%Y-%m-%dT%H:%M:%SZ') for date in commit_dates]

# Count commits per calendar day
df_commits = pd.DataFrame({'date': commit_dates})
df_commits['day'] = df_commits['date'].dt.date
commits_per_day = df_commits.groupby('day').size().reset_index(name='commits')

The same approach works for pull requests:

Terminal
prs = fetch_all_pages(f'{BASE_URL}/repos/{OWNER}/{REPO}/pulls?state=all')
pr_data = []
for pr in prs:
    pr_data.append({
        'id': pr['id'],
        'state': pr['state'],
        'created_at': pr['created_at'],
        'merged_at': pr['merged_at']
    })
df_prs = pd.DataFrame(pr_data)

# Calculate time to merge (NaT for PRs that were never merged)
df_prs['created_at'] = pd.to_datetime(df_prs['created_at'])
df_prs['merged_at'] = pd.to_datetime(df_prs['merged_at'])
df_prs['time_to_merge'] = df_prs['merged_at'] - df_prs['created_at']
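
With merge times in a DataFrame, pandas can reduce them to the headline numbers a dashboard might display; the choice of aggregates here is just one option:

Terminal
# Summarize merge behavior (unmerged PRs have NaT merged_at)
merged = df_prs.dropna(subset=['merged_at'])
print('Total PRs:', len(df_prs))
print('Merged PRs:', len(merged))
print('Merge rate:', f'{len(merged) / len(df_prs):.0%}')
print('Median time to merge:', merged['time_to_merge'].median())
print('90th percentile:', merged['time_to_merge'].quantile(0.9))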

Choose a visualization library:

  • Matplotlib or Seaborn: For static images.
  • Plotly or Bokeh: For interactive charts.
  • D3.js: For web-based visualizations.

For example, with Plotly Express:

Terminal
import plotly.express as px

# Commits per day
fig = px.bar(commits_per_day, x='day', y='commits', title='Commits Per Day')
fig.show()

# Pull request merge times, converted to hours for a readable axis
df_merged = df_prs.dropna(subset=['merged_at']).copy()
df_merged['hours_to_merge'] = df_merged['time_to_merge'].dt.total_seconds() / 3600
fig = px.histogram(df_merged, x='hours_to_merge', title='PR Time to Merge (Hours)')
fig.show()

Decide on the platform:

  • Web dashboard: Use frameworks like Dash (Python), React (JavaScript), or Angular.
  • Desktop application: Use Electron or PyQt.
  • Notebook: Jupyter Notebook for a quick setup.

For example, a minimal Dash app that serves the commits chart:

Terminal
import dash
from dash import html, dcc
import plotly.express as px

app = dash.Dash(__name__)

app.layout = html.Div(children=[
    html.H1(children='GitHub Analytics Dashboard'),
    dcc.Graph(
        id='commits-per-day',
        figure=fig
    ),
    # Add more graphs as needed
])

if __name__ == '__main__':
    app.run(debug=True)

Once the basics are in place, consider enhancing the dashboard:

  • Filters: Allow users to filter data by date ranges, contributors, or labels (see the sketch after this list).
  • Real-time updates: Use websockets or periodic refreshes to display the latest data.
  • User authentication: Secure your dashboard if it contains sensitive data.
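
As a sketch of the filtering idea, here's how a date range picker could drive the commits chart in Dash. It assumes the app, commits_per_day, px, and pd objects from earlier, replaces the static figure with a callback, and uses arbitrary component IDs:

Terminal
from dash import Input, Output

# Extend the layout from above with a date range picker
app.layout = html.Div(children=[
    html.H1(children='GitHub Analytics Dashboard'),
    dcc.DatePickerRange(id='date-range'),
    dcc.Graph(id='commits-per-day'),
])

# Re-render the commits chart whenever the selected range changes
@app.callback(
    Output('commits-per-day', 'figure'),
    Input('date-range', 'start_date'),
    Input('date-range', 'end_date'),
)
def update_commits(start_date, end_date):
    filtered = commits_per_day
    if start_date:
        filtered = filtered[filtered['day'] >= pd.to_datetime(start_date).date()]
    if end_date:
        filtered = filtered[filtered['day'] <= pd.to_datetime(end_date).date()]
    return px.bar(filtered, x='day', y='commits', title='Commits Per Day')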

Integrate data from other APIs:

  • Jira or Trello: For project management metrics.
  • Jenkins or Travis CI: For build and deployment statuses (see the sketch after this list).
  • Graphite: For code review metrics.
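
As one example, Jenkins exposes build information through its JSON API. Here's a hedged sketch that reads the result of a job's most recent build; the server URL, job name, and credentials are all placeholders:

Terminal
JENKINS_URL = 'https://jenkins.example.com'
JOB = 'my-pipeline'

def get_last_build_status():
    url = f'{JENKINS_URL}/job/{JOB}/lastBuild/api/json'
    response = requests.get(url, auth=('jenkins-user', 'jenkins-api-token'))
    build = response.json()
    # 'result' is SUCCESS, FAILURE, or None while the build is running
    return {'number': build['number'], 'result': build['result']}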

Set up automated scripts to refresh data:

  • Cron jobs: Schedule scripts on Unix-based systems (see the example after this list).
  • Task schedulers: Use Windows Task Scheduler or cloud-based schedulers.
  • CI/CD pipelines: Integrate data updates into your existing pipelines.
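
For instance, a crontab entry like the following would rerun a fetch script hourly; the script and log paths are placeholders:

Terminal
# Run the data fetch script at the top of every hour
0 * * * * /usr/bin/python3 /path/to/fetch_github_data.py >> /var/log/dashboard.log 2>&1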

Choose a hosting solution:

  • Cloud platforms: AWS Elastic Beanstalk, Heroku, or Azure App Service.
  • Containers: Use Docker to containerize your application.
  • On-premises servers: Deploy within your organization's network for added security.

For example, a minimal Dockerfile for the Dash app:

Terminal
# Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Dash serves on port 8050 by default
EXPOSE 8050
CMD ["python", "app.py"]
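
Build and run the image, publishing Dash's default port. Note that inside a container the app needs to listen on all interfaces, e.g. app.run(host='0.0.0.0', port=8050):

Terminal
docker build -t github-dashboard .
docker run -p 8050:8050 github-dashboard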

After deployment, keep the dashboard healthy:

  • Logging: Implement logging to track errors and usage patterns (a sketch follows this list).
  • Performance optimization: Cache data and optimize queries to improve load times.
  • User feedback: Collect feedback from your teams to improve dashboard features.
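
For the logging point, Python's standard library goes a long way. A minimal sketch that writes timestamped entries to a file; the filename and messages are illustrative:

Terminal
import logging

# Write timestamped log entries to a file alongside the app
logging.basicConfig(
    filename='dashboard.log',
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
)
logger = logging.getLogger(__name__)

# For example, record each fetch so failures are traceable
logger.info('Fetched %d commits for %s/%s', len(commits), OWNER, REPO)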

Building your own GitHub analytics dashboard empowers you to gain deeper insights into your projects. By leveraging GitHub's APIs and modern visualization tools, you can create a customized dashboard that enhances team performance and project management. Whether you're tracking commits, pull requests, or contributor activity, a custom dashboard provides the flexibility and depth that generic tools may lack.
