🏆  BrainStorm 2026 · 1st Place · Track 1

How We Won a BCI Hackathon:
Decoding Brain Signals on the Edge

Team MindMeld built a real-time neural decoder for 1024-channel ECoG recordings under strict edge hardware constraints — and finished 22.5 points ahead of second place.

Quilee Simeon · January 24, 2026 · ~12 min read · github ↗
// Results at a glance
Total Score: 91.7 / 100
Balanced Acc.: 94.2% → 47.1 / 50 pts
Latency: <5 ms → 23.8 / 25 pts
Model Size: ~0.2 MB → 20.8 / 25 pts
// Team MindMeld at BrainStorm 2026, Microsoft NERD Center, Boston

Real-Time Neural Decoding Under Hardware Constraints

BrainStorm 2026 was hosted by Precision Neuroscience, the company behind the Layer7 micro-ECoG array — a 1024-electrode cortical surface implant already in clinical trials. Track 1 put that hardware front and center.

The problem statement: given a continuous stream of voltage recordings from 1024 electrodes implanted over an animal's auditory cortex, classify what sound frequency the subject is hearing — one sample at a time, in real time.

// Opening slide: Track 1 was the ML + DSP track

The challenge is deeper than just classification. Real BCIs run on edge hardware — implanted or wearable devices with strict limits on power, memory, and compute. A server-grade model is useless next to the brain. This creates the core tension: bigger models are typically more accurate, but accuracy is only half the score.

The Scoring Formula

The final score is a composite of three metrics, two of them exponentially penalized. The non-linearity is intentional: the first megabyte of model size and the first milliseconds of lag cost the most points, so ultra-compact, ultra-fast models are rewarded disproportionately, while differences between already-large or already-slow models barely matter.

// Scoring decomposition — total out of 100
| Metric | Weight | Formula | What it rewards |
|---|---|---|---|
| Balanced Accuracy | 50 pts | bal_acc × 50 | Equal recall across all classes |
| Prediction Lag | 25 pts | exp(−6 × lag_ms / 500) × 25 | Sub-10ms detection of stimulus onset |
| Model Size | 25 pts | exp(−4 × size_mb / 5) × 25 | Compact enough for embedded hardware |

Note: the exponential penalties on lag and size are steep. A 5 MB model instead of a 1 MB one costs ~11 points; a 50 ms lag instead of 10 ms costs ~8 points.
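The composite can be coded straight from the table above. A minimal sketch (the function name is ours):

```python
import math

def brainstorm_score(bal_acc: float, lag_ms: float, size_mb: float) -> float:
    """Composite Track 1 score out of 100, per the published formulas."""
    acc_pts = bal_acc * 50                       # linear in balanced accuracy
    lag_pts = math.exp(-6 * lag_ms / 500) * 25   # exponential latency penalty
    size_pts = math.exp(-4 * size_mb / 5) * 25   # exponential size penalty
    return acc_pts + lag_pts + size_pts
```

Plugging in our submission's metrics (bal_acc 0.942, ~5 ms lag, ~0.2 MB) lands within a point of the official 91.7 — a useful sanity check during development.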

There is also a hard causality constraint: real BCIs cannot see the future. The evaluation harness feeds data sequentially, one sample at a time. Your model may maintain a history buffer but cannot use future data points or bidirectional filters. Any approach that looks ahead is automatically disqualified.
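A causal history buffer is the only state this constraint permits. A minimal sketch of the idea (the class name is ours; the actual evaluation harness API may differ):

```python
import numpy as np

class CausalWindow:
    """Fixed-length history of past samples; predictions never see future data."""

    def __init__(self, n_channels: int, window: int):
        self.buf = np.zeros((n_channels, window), dtype=np.float32)

    def push(self, sample: np.ndarray) -> np.ndarray:
        # Shift history left by one step, then append the newest sample.
        self.buf[:, :-1] = self.buf[:, 1:]
        self.buf[:, -1] = sample
        return self.buf  # (n_channels, window), strictly causal
```

A production version would use a ring index instead of the per-sample copy, but even the copy costs microseconds at this size.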


90 Seconds of Brain Activity, 16:1 Class Imbalance

The training set is a (90386, 1024) parquet file — 90,386 millisecond timesteps of float32 voltage across 1024 electrodes. The validation set adds another ~22,000 samples. By ML standards, this is a very small dataset.

Nine target classes represent the frequency of the presented auditory tone in Hz, plus silence (0 Hz):

// Class distribution — training set (90,386 samples)
| Class | Share |
|---|---|
| 0 Hz (silence) | 67% |
| 120 Hz | 4.1% |
| 224 Hz | 4.1% |
| 421 Hz | 4.1% |
| 789 Hz | 4.3% |
| 1479 Hz | 4.3% |
| 2772 Hz | 4.3% |
| 5195 Hz | 4.1% |
| 9736 Hz (aliased) | 4.1% |

⚠ 16.5:1 imbalance between silence and each tone class

The silence class dominates at 67% of all samples. A model that predicts silence everywhere achieves 67% raw accuracy but only 11% balanced accuracy — the metric that actually matters for scoring. Naive training without addressing this imbalance will collapse to a silent predictor.
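The collapse is easy to reproduce on toy labels (the `balanced_accuracy` helper is ours; sklearn's `balanced_accuracy_score` computes the same mean-of-recalls):

```python
import numpy as np

def balanced_accuracy(y_true, y_pred, n_classes):
    """Mean of per-class recalls — every class counts equally."""
    recalls = [(y_pred[y_true == c] == c).mean()
               for c in range(n_classes) if (y_true == c).any()]
    return float(np.mean(recalls))

# Toy labels mirroring the dataset: 67% silence (class 0), 8 tone classes.
rng = np.random.default_rng(0)
y = rng.choice(9, size=10_000, p=[0.67] + [0.33 / 8] * 8)
silent = np.zeros_like(y)                         # predict silence everywhere

print(round((silent == y).mean(), 2))             # raw accuracy ≈ 0.67
print(round(balanced_accuracy(y, silent, 9), 3))  # → 0.111 (exactly 1/9)
```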

One signal processing detail worth knowing: the sampling rate is 1000 Hz, so the Nyquist frequency is 500 Hz. The highest stimulus at 9736 Hz aliases into the recordable band at ~264 Hz. The model doesn't need to "know" the physical frequency — it just needs to learn the distinct pattern of cortical activity each stimulus produces. But it means all frequency discrimination happens within 0–500 Hz, and 93.9% of total signal power sits below 30 Hz (local field potential oscillations dominate).
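The folding arithmetic is a one-liner (the helper name is ours):

```python
def alias_frequency(f_hz: float, fs_hz: float = 1000.0) -> float:
    """Apparent frequency of a tone after sampling at fs_hz (folded into [0, fs/2])."""
    folded = f_hz % fs_hz
    return min(folded, fs_hz - folded)

print(alias_frequency(9736.0))  # → 264.0 — the 9736 Hz tone lands at ~264 Hz
```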

Signal characteristics: Mean voltage −0.23 µV, std 112.66 µV. ~250 of the 1024 channels account for 50% of total power — the array is spatially sparse. This sparsity is important: it means most channels carry redundant or noise-dominated signal, which motivates aggressive dimensionality reduction before the model sees anything.

PCA + EEGNet + a Large Context Window

Our winning pipeline has three components that interact in a specific way: PCA compresses 1024 channels down to 32 while filtering noise; EEGNet classifies the compressed, windowed signal efficiently; and a 1.6-second context window gives the model enough temporal history to decode sustained auditory responses reliably. Each component was chosen to optimize all three scoring dimensions simultaneously, not just accuracy.

// Inference pipeline — one sample at a time
Raw ECoG input — new voltage sample arrives every 1 ms → (1024,)
↓
PCA projection — single matrix multiply, pre-fit on training data → (32,)
↓
Causal sliding window buffer — append new sample, drop oldest; maintains the past 1.6 seconds → (32, 1600)
↓
EEGNet forward pass — temporal conv → depthwise spatial → separable → classify → (9,) logits
↓
Prediction — argmax → map to Hz class → scalar

Step 1: PCA Channel Reduction (1024 → 32)

High-density electrode arrays like the Layer7 record from 1024 channels simultaneously, but not all channels are equally informative. On this dataset, ~250 channels account for 50% of signal power; 600 channels account for 80%. The rest is noise and correlated redundancy.
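This concentration is easy to measure on any recording. A sketch (the helper name is ours) that counts how many channels are needed to reach a given fraction of total variance:

```python
import numpy as np

def channels_for_power(X: np.ndarray, frac: float) -> int:
    """Fewest channels whose summed variance reaches `frac` of the array total."""
    power = X.var(axis=0)                    # per-channel signal power
    cum = np.cumsum(np.sort(power)[::-1])    # strongest channels first
    return int(np.searchsorted(cum, frac * cum[-1]) + 1)
```

On the competition data this reports roughly 250 channels at `frac=0.5` and 600 at `frac=0.8`.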

We fit PCA on the training data and project down to 32 principal components. This choice serves double duty. First, it dramatically shrinks the model: downstream weight matrices are 32-wide rather than 1024-wide. Second, it acts as a structured denoising filter — the top 32 PCs capture the most consistent covariance across the array, discarding the noise-dominated tail. Fitting takes seconds; at inference it's a single matrix multiply with negligible latency cost.

// brainstorm/ml/channel_projection.py
import numpy as np
from typing import Self  # Python 3.11+

class PCAProjection:
    def __init__(self, n_components: int = 32):
        self.n_components = n_components

    def fit(self, X: np.ndarray) -> Self:
        # X shape: (n_samples, 1024)
        self.mean_ = X.mean(axis=0)
        centered = X - self.mean_
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        self.components_ = Vt[:self.n_components]  # (32, 1024)
        return self

    def transform(self, x: np.ndarray) -> np.ndarray:
        # At inference: single matrix multiply — (1024,) → (32,)
        return (x - self.mean_) @ self.components_.T

Step 2: EEGNet Architecture

With 90 seconds of training data and tight size constraints, we needed an architecture with strong inductive biases for ECoG — one that wouldn't waste parameters on patterns that can't be learned from this small a dataset. EEGNet (Lawhern et al., 2018) was designed exactly for this regime.

The key idea is factoring the spatiotemporal convolution into two explicit stages: a temporal filter that learns when neural features occur, followed by a depthwise spatial filter that learns which channel combinations matter. Depthwise separability avoids the parameter explosion of a full spatiotemporal convolution. The architecture is parameter-efficient by design.
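A back-of-envelope count with our dimensions (32 PCA channels, F1=8 temporal filters, depthwise multiplier D=2, temporal kernel length 800; biases omitted) shows the scale of the savings:

```python
C, T_KERNEL = 32, 800   # input channels, temporal kernel length
F1, D = 8, 2            # temporal filters, depthwise multiplier

# One joint spatiotemporal kernel (C x T_KERNEL) per output feature map:
full_spatiotemporal = F1 * D * C * T_KERNEL   # 409,600 weights

# EEGNet's factoring: temporal conv, then depthwise spatial conv:
factored = F1 * T_KERNEL + F1 * D * C         # 6,912 weights

print(full_spatiotemporal // factored)        # → 59x fewer parameters
```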

// EEGNet architecture — input shape (batch, 1, 32 channels, 1600 samples)
Block 1 — Temporal Convolution: Conv2d(1→8, kernel=(1,800)) + BatchNorm — learns spectrotemporal patterns → (B, 8, 32, 1600)
↓
Block 2 — Depthwise Spatial Conv: Conv2d(8→16, kernel=(32,1), groups=8) — learns spatial channel combinations → (B, 16, 1, 1600) → AvgPool(1,4)
↓
Block 3 — Separable Convolution: depthwise + pointwise Conv2d — combines features across channels → (B, 16, 1, 400) → AvgPool(1,8)
↓
Classifier: Flatten → Linear(flat_size → 9) → (B, 9)
// brainstorm/ml/eegnet.py — core forward pass
def forward(self, x: torch.Tensor) -> torch.Tensor:
    # x: (batch, 1, n_channels=32, window=1600)

    # Block 1: temporal patterns
    x = self.conv1(x)      # (B, F1=8, 32, 1600)
    x = self.bn1(x)

    # Block 2: spatial channel combinations (depthwise)
    x = self.depthwise(x)  # (B, F1*D=16, 1, 1600)
    x = self.bn2(x)
    x = F.elu(x)
    x = self.pool1(x)      # (B, 16, 1, 400) — aggressive downsampling
    x = self.dropout1(x)

    # Block 3: separable convolution
    x = self.separable1(x) # depthwise
    x = self.separable2(x) # pointwise
    x = self.bn3(x)
    x = F.elu(x)
    x = self.pool2(x)      # (B, F2=16, 1, ~50)
    x = self.dropout2(x)

    x = x.flatten(start_dim=1)
    return self.fc(x)       # (B, 9)

Step 3: Training Strategy

Class-weighted loss. With a 16.5:1 silence-to-tone ratio, standard cross-entropy collapses. We weight the loss inversely proportional to class frequency, which forces the model to treat a correct prediction on a rare tone class as equally important as predicting silence correctly. This single change had an outsized impact on balanced accuracy.

# Inverse-frequency class weights
import numpy as np
import torch
from torch import nn

class_counts = np.bincount([label_to_idx[l] for l in labels])
weights = 1.0 / class_counts
weights = weights / weights.sum() * len(weights)  # normalize to mean 1
criterion = nn.CrossEntropyLoss(
    weight=torch.tensor(weights, dtype=torch.float32).to(device)
)

Train on the full dataset for final submission. After selecting the best configuration on validation, we retrained on combined train + validation data for an additional 45 epochs. With this small a dataset, every example counts.

Final hyperparameters:

| Parameter | Value | Why |
|---|---|---|
| projected_channels | 32 | Optimal size/accuracy tradeoff |
| window_size | 1600 | 1.6 seconds of causal context (see below) |
| F1 (temporal filters) | 8 | EEGNet default |
| D (depthwise multiplier) | 2 | EEGNet default |
| dropout | 0.25 | Regularization on small dataset |
| batch_size | 64 | Stable gradient estimates |
| epochs | 30 + 45 | Validation tuning + full retrain |

Window Size Is Cheap. History Is Valuable.

This was the single most impactful finding from the entire hackathon, and it runs counter to the intuition most people bring to streaming inference.

The common assumption: a larger input window means higher latency. If you're processing one sample every millisecond, a 1600-sample window seems slow. Most teams used windows of 50–128ms. We tried 1600ms (1.6 seconds).

// Window size vs. performance — EEGNet, 32 PCA channels
| Window Size | Balanced Accuracy | Accuracy Score | Inference Latency | Delta |
|---|---|---|---|---|
| 128 ms (128 samples) | ~67% | ~33.5 / 50 | <1 ms | baseline |
| 1600 ms (1600 samples) | ~94% | 47.1 / 50 | <1 ms | +27 pts accuracy, same latency |

Same latency. Twenty-seven percentage points more accuracy. How?

The answer is in EEGNet's architecture. The average pooling layers in Blocks 2 and 3 decimate the time dimension aggressively early in the network:

Input window:   1600 samples
After Pool1:     400 samples  (÷4)
After Pool2:      ~50 samples (÷8)
→ Classifier sees a 50-sample representation regardless of input window length

By the time the input reaches the expensive operations — the separable convolution and the linear classifier — a 1600-sample window has been compressed to ~50 samples. The forward pass cost is dominated by the initial temporal convolution and the final linear layer, neither of which scales strongly with input length.

The PCA projection runs in constant time. The ring buffer update is O(1). The latency bottleneck is memory bandwidth and a handful of fixed-size matrix operations — not window size.

The biological reason this works: auditory cortex responses to sustained tones are not sharp spikes at onset. They are slow-building oscillatory patterns — synchronized low-frequency activity that takes hundreds of milliseconds to develop and reach steady state. A 128ms window catches only the onset transient. A 1600ms window captures the full sustained response, making the nine tone classes far more distinguishable.

Generalizable principle: In streaming BCI, always profile your actual latency before assuming window size is a bottleneck. For convolutional architectures with pooling, the cost scales far less than linearly with input length. History is nearly free; use it.
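Acting on that advice takes only a few lines. A generic sketch (the helper name is ours; the matmul stands in for whatever your per-sample step is):

```python
import time
import numpy as np

def median_latency_ms(fn, x, n_iter=200, warmup=20):
    """Median wall-clock latency of fn(x) in milliseconds."""
    for _ in range(warmup):          # warm caches / JIT before timing
        fn(x)
    samples = []
    for _ in range(n_iter):
        t0 = time.perf_counter()
        fn(x)
        samples.append((time.perf_counter() - t0) * 1e3)
    return float(np.median(samples))

# Stand-in for the PCA projection step: a fixed-size (32 x 1024) matmul.
W = np.random.randn(32, 1024).astype(np.float32)
x = np.random.randn(1024).astype(np.float32)
print(median_latency_ms(lambda v: W @ v, x))  # typically well under 1 ms
```

Run this on the real forward pass at several window sizes before concluding anything about the latency budget.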

Final Leaderboard: 91.7 / 100, 22.5 Points Ahead

Ten teams competed. Our final submission on January 24 scored 91.7 — a 22.5-point gap over second place. The accuracy component alone (47.1/50 vs. 30.9/50 for second place) accounted for most of that margin; our 94.2% balanced accuracy was more than 1.5× the runner-up's.

// BrainStorm 2026 Track 1 — Final Leaderboard
| Rank | Team | Score | Acc | Lat | Size |
|---|---|---|---|---|---|
| 🥇 1 | mindmeld | 91.7 | 47.1 | 23.8 | 20.8 |
| 🥈 2 | synapse | 69.2 | 30.9 | 15.4 | 23.0 |
| 🥉 3 | brainhz | 67.2 | 32.4 | 15.6 | 19.2 |
| 4 | forehead | 61.9 | 26.4 | 15.2 | 20.4 |
| 5 | brainrot-computer-interface | 59.2 | 22.4 | 15.1 | 21.7 |
| 6–10 | remaining teams | ≤59 | — | — | — |

Looking at the score breakdown: our accuracy advantage was the dominant factor (+16.2 pts over 2nd place on accuracy alone). Latency and size were more competitive across the field — most teams got inference under 20ms, but nobody matched our accuracy-to-size ratio. The 1600ms window was our differentiator.


Code

The full codebase is public on GitHub. The repo is a fork of the official BrainStorm starter template, with our EEGNet implementation, training scripts, and a benchmark utility that computes all three scoring dimensions against any checkpoint.

# clone and install
git clone https://github.com/qsimeon/brainstorm-track1-public
cd brainstorm-track1-public
make install           # sets up uv venv + dependencies + git hooks

# train the winning model
uv run python examples/train_eegnet.py \
    --window-size 1600 \
    --projected-channels 32 \
    --batch-size 64 \
    --epochs 30

# benchmark all three scoring dimensions
uv run python examples/benchmark_eegnet.py
# outputs: balanced_accuracy, lag_ms, size_mb, and all three sub-scores
# streaming inference — one sample at a time
model = EEGNet.load()      # loads model.pt from repo root
model.reset()              # clear history buffer for new session

for sample in stream:      # sample shape: (1024,)
    prediction = model.predict(sample)
    # prediction: scalar Hz class (0, 120, 224, ... 9736)

What BCI Engineers Should Know

1. Temporal context is cheap — profile before you assume otherwise

Don't guess that a larger window adds latency. For convolutional architectures with pooling, the cost scales far less than linearly. We went from 128ms to 1600ms with no measurable latency increase. This asymmetry — history is nearly free but dramatically improves accuracy — holds broadly for streaming neural data.

2. PCA is underrated for high-density arrays

In micro-ECoG recordings, most structured signal lives in a low-dimensional subspace. PCA compression simultaneously reduces model file size, speeds up all downstream computation, and improves generalization by discarding noise-dominated directions. It isn't just a preprocessing trick — it's a core part of the scoring strategy.

3. Class-weighted loss is non-negotiable on imbalanced neural data

67% silence means an unweighted model learns to predict silence. Track balanced accuracy from epoch 1, not just raw accuracy. Weight your loss. The two metrics can diverge by 60+ percentage points on a dataset like this.

4. Domain-specific architectures beat general ones in small-data regimes

EEGNet's inductive biases — explicit temporal-then-spatial factorization, aggressive pooling — are directly matched to how ECoG data is structured. With 90 seconds of training data, this match matters more than raw model capacity. Transformers, LSTMs, and other general-purpose architectures need substantially more data to overcome the lack of prior structure.

5. Multi-objective scoring requires explicit joint tracking from the start

It's easy to over-optimize accuracy and discover you've blown the size budget with an hour to go. Build a benchmark script on day one that reports all three sub-scores for every run. The scoring formulas are public — use them during development, not just at submission time.


Team MindMeld — BrainStorm 2026, Track 1: Neural Decoder Challenge

Quilee Simeon  ·  Dennis Loevlie  ·  Raghav Gali  ·  Rohan Bhatane  ·  Shravankumar Janawade  ·  Sriram G.

Hosted by Precision Neuroscience at Microsoft NERD Center, Boston · January 23–24, 2026 · 10 competing teams