Team MindMeld built a real-time neural decoder for 1024-channel ECoG recordings under strict edge hardware constraints — and finished 22.5 points ahead of second place.
BrainStorm 2026 was hosted by Precision Neuroscience, the company behind the Layer7 micro-ECoG array — a 1024-electrode cortical surface implant already in clinical trials. Track 1 put that hardware front and center.
The problem statement: given a continuous stream of voltage recordings from 1024 electrodes implanted over an animal's auditory cortex, classify what sound frequency the subject is hearing — one sample at a time, in real time.
The challenge is deeper than just classification. Real BCIs run on edge hardware — implanted or wearable devices with strict limits on power, memory, and compute. A server-grade model is useless next to the brain. This creates the core tension: bigger models are typically more accurate, but accuracy is only half the score.
The final score is a composite of three metrics, two of them exponentially penalized. The non-linearity is intentional: it aggressively rewards ultra-compact, ultra-fast models, with sharply diminishing returns once a model is already small and fast.
| Metric | Weight | Formula | What it rewards |
|---|---|---|---|
| Balanced Accuracy | 50 pts | bal_acc × 50 | Equal recall across all classes |
| Prediction Lag | 25 pts | exp(−6 × lag_ms / 500) × 25 | Sub-10ms detection of stimulus onset |
| Model Size | 25 pts | exp(−4 × size_mb / 5) × 25 | Compact enough for embedded hardware |
Note: the exponential penalties on lag and size are steep. A 5 MB model instead of 1 MB costs ~11 points. A 50ms lag instead of 10ms costs ~8 points.
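The formulas are simple enough to keep in a helper during development (the function name is ours, not the organizers'):

```python
import math

def composite_score(bal_acc: float, lag_ms: float, size_mb: float) -> float:
    """Combine the three sub-scores using the published formulas."""
    accuracy_pts = bal_acc * 50                  # linear in balanced accuracy
    lag_pts = math.exp(-6 * lag_ms / 500) * 25   # exponential lag penalty
    size_pts = math.exp(-4 * size_mb / 5) * 25   # exponential size penalty
    return accuracy_pts + lag_pts + size_pts

# Growing from 1 MB to 5 MB drops the size sub-score from ~11.2 to ~0.5 points;
# slipping from 10 ms to 50 ms lag drops the lag sub-score from ~22.2 to ~13.7.
```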
There is also a hard causality constraint: real BCIs cannot see the future. The evaluation harness feeds data sequentially, one sample at a time. Your model may maintain a history buffer but cannot use future data points or bidirectional filters. Any approach that looks ahead is automatically disqualified.
The training set is a (90386, 1024) parquet file — 90,386 millisecond timesteps of float32 voltage across 1024 electrodes. The validation set adds another ~22,000 samples.
By ML standards, this is a very small dataset.
There are nine target classes: the frequency of the presented auditory tone in Hz across eight tones, plus silence (0 Hz).
The silence class dominates at 67% of all samples. A model that predicts silence everywhere achieves 67% raw accuracy but only 11% balanced accuracy — the metric that actually matters for scoring. Naive training without addressing this imbalance will collapse to a silent predictor.
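A quick numpy sketch of why the two metrics diverge. The 67% silence proportion comes from the text; the even split among the eight tone classes is illustrative:

```python
import numpy as np

# Toy labels: 67% silence (class 0), remaining 33% spread over 8 tone classes
rng = np.random.default_rng(0)
labels = rng.choice(9, size=10_000, p=[0.67] + [0.33 / 8] * 8)
preds = np.zeros_like(labels)        # degenerate model: always predict silence

raw_acc = (preds == labels).mean()   # ~0.67, looks deceptively good
# Balanced accuracy = mean per-class recall: 100% on silence, 0% everywhere else
recalls = [((preds == c) & (labels == c)).sum() / (labels == c).sum()
           for c in range(9)]
bal_acc = float(np.mean(recalls))    # exactly 1/9, i.e. ~11%
```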
One signal processing detail worth knowing: the sampling rate is 1000 Hz, so the Nyquist frequency is 500 Hz. The highest stimulus at 9736 Hz aliases into the recordable band at ~264 Hz. The model doesn't need to "know" the physical frequency — it just needs to learn the distinct pattern of cortical activity each stimulus produces. But it means all frequency discrimination happens within 0–500 Hz, and 93.9% of total signal power sits below 30 Hz (local field potential oscillations dominate).
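The aliased frequency can be checked directly: after sampling at rate fs, a component at frequency f appears at its distance from the nearest multiple of fs (helper name is ours):

```python
fs = 1000  # sampling rate in Hz

def aliased_freq(f: float, fs: float) -> float:
    """Frequency at which a component appears after sampling at rate fs."""
    return abs(f - round(f / fs) * fs)

# The 9736 Hz stimulus folds to |9736 - 10 * 1000| = 264 Hz
aliased_freq(9736, fs)
```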
Our winning pipeline has three components that interact in a specific way: PCA compresses 1024 channels down to 32 while filtering noise; EEGNet classifies the compressed, windowed signal efficiently; and a 1.6-second context window gives the model enough temporal history to decode sustained auditory responses reliably. Each component was chosen to optimize all three scoring dimensions simultaneously, not just accuracy.
High-density electrode arrays like the Layer7 record from 1024 channels simultaneously, but not all channels are equally informative. On this dataset, ~250 channels account for 50% of signal power; 600 channels account for 80%. The rest is noise and correlated redundancy.
We fit PCA on the training data and project down to 32 principal components. This choice serves double duty. First, it dramatically shrinks the model: downstream weight matrices are 32-wide rather than 1024-wide. Second, it acts as a structured denoising filter — the top 32 PCs capture the most consistent covariance across the array, discarding the noise-dominated tail. Fitting takes seconds; at inference it's a single matrix multiply with negligible latency cost.
```python
from typing import Self
import numpy as np

class PCAProjection:
    def __init__(self, n_components: int = 32):
        self.n_components = n_components

    def fit(self, X: np.ndarray) -> Self:
        # X shape: (n_samples, 1024)
        self.mean_ = X.mean(axis=0)
        centered = X - self.mean_
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        self.components_ = Vt[:self.n_components]  # (32, 1024)
        return self

    def transform(self, x: np.ndarray) -> np.ndarray:
        # At inference: single matrix multiply — (1024,) → (32,)
        return (x - self.mean_) @ self.components_.T
```
With 90 seconds of training data and tight size constraints, we needed an architecture with strong inductive biases for ECoG — one that wouldn't waste parameters on patterns that can't be learned from this small a dataset. EEGNet (Lawhern et al., 2018) was designed exactly for this regime.
The key idea is factoring the spatiotemporal convolution into two explicit stages: a temporal filter that learns when neural features occur, followed by a depthwise spatial filter that learns which channel combinations matter. Depthwise separability avoids the parameter explosion of a full spatiotemporal convolution. The architecture is parameter-efficient by design.
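The savings are easy to quantify. Assuming a temporal kernel of length 64 (our choice for illustration; the channel count, F1 = 8, and D = 2 come from the text), compare the weight counts of a one-shot spatiotemporal convolution against the factored version:

```python
n_ch, k_t, F1, D = 32, 64, 8, 2   # channels, temporal kernel, filters, depth multiplier

# Full spatiotemporal convolution: every filter spans all channels and all taps
full = (F1 * D) * n_ch * k_t      # 16 filters x 32 channels x 64 taps = 32,768 weights

# Factored: temporal filters first, then depthwise spatial filters
temporal = F1 * k_t               # 8 x 64 = 512
depthwise = (F1 * D) * n_ch       # 16 x 32 = 512
factored = temporal + depthwise   # 1,024 weights, a 32x reduction
```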
```python
def forward(self, x: torch.Tensor) -> torch.Tensor:
    # x: (batch, 1, n_channels=32, window=1600)
    # Block 1: temporal patterns
    x = self.conv1(x)       # (B, F1=8, 32, 1600)
    x = self.bn1(x)
    # Block 2: spatial channel combinations (depthwise)
    x = self.depthwise(x)   # (B, F1*D=16, 1, 1600)
    x = self.bn2(x)
    x = F.elu(x)
    x = self.pool1(x)       # (B, 16, 1, 400) — aggressive downsampling
    x = self.dropout1(x)
    # Block 3: separable convolution
    x = self.separable1(x)  # depthwise
    x = self.separable2(x)  # pointwise
    x = self.bn3(x)
    x = F.elu(x)
    x = self.pool2(x)       # (B, F2=16, 1, ~50)
    x = self.dropout2(x)
    x = x.flatten(start_dim=1)
    return self.fc(x)       # (B, 9)
```
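For reference, here is a layer configuration consistent with the shapes in the comments of that forward pass. The kernel lengths (65 and 17, odd so the time axis keeps its length) are our assumptions; the channel count, F1, D, pooling factors, and output size are constrained by the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGNetSketch(nn.Module):
    """Layer stack consistent with the shapes above: (B, 1, 32, 1600) -> (B, 9)."""
    def __init__(self, n_channels=32, window=1600, F1=8, D=2, n_classes=9, p_drop=0.25):
        super().__init__()
        F2 = F1 * D
        # Block 1: temporal filters (odd kernel + symmetric padding preserves length)
        self.conv1 = nn.Conv2d(1, F1, (1, 65), padding=(0, 32), bias=False)
        self.bn1 = nn.BatchNorm2d(F1)
        # Block 2: depthwise spatial filters collapse the channel axis to 1
        self.depthwise = nn.Conv2d(F1, F2, (n_channels, 1), groups=F1, bias=False)
        self.bn2 = nn.BatchNorm2d(F2)
        self.pool1 = nn.AvgPool2d((1, 4))
        self.dropout1 = nn.Dropout(p_drop)
        # Block 3: separable convolution = depthwise then pointwise
        self.separable1 = nn.Conv2d(F2, F2, (1, 17), padding=(0, 8), groups=F2, bias=False)
        self.separable2 = nn.Conv2d(F2, F2, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(F2)
        self.pool2 = nn.AvgPool2d((1, 8))
        self.dropout2 = nn.Dropout(p_drop)
        self.fc = nn.Linear(F2 * (window // 32), n_classes)  # 16 * 50 = 800 inputs

    def forward(self, x):  # same sequence as the forward pass shown above
        x = self.bn1(self.conv1(x))
        x = self.dropout1(self.pool1(F.elu(self.bn2(self.depthwise(x)))))
        x = self.dropout2(self.pool2(F.elu(self.bn3(self.separable2(self.separable1(x))))))
        return self.fc(x.flatten(start_dim=1))
```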
Class-weighted loss. With a 16.5:1 silence-to-tone ratio, standard cross-entropy collapses. We weight the loss inversely proportional to class frequency, which forces the model to treat a correct prediction on a rare tone class as equally important as predicting silence correctly. This single change had an outsized impact on balanced accuracy.
```python
# Inverse frequency class weights, normalized to mean 1
class_counts = np.bincount([label_to_idx[l] for l in labels])
weights = 1.0 / class_counts
weights = weights / weights.sum() * len(weights)
criterion = nn.CrossEntropyLoss(
    weight=torch.tensor(weights, dtype=torch.float32).to(device)
)
```
Train on the full dataset for final submission. After selecting the best configuration on validation, we retrained on combined train + validation data for an additional 45 epochs. With this small a dataset, every example counts.
Final hyperparameters:
| Parameter | Value | Why |
|---|---|---|
| projected_channels | 32 | Optimal size/accuracy tradeoff |
| window_size | 1600 | 1.6 seconds of causal context — see below |
| F1 (temporal filters) | 8 | EEGNet default |
| D (depthwise multiplier) | 2 | EEGNet default |
| dropout | 0.25 | Regularization on small dataset |
| batch_size | 64 | Stable gradient estimates |
| epochs | 30 + 45 | Validation tuning + full retrain |
This was the single most impactful finding from the entire hackathon, and it runs counter to the intuition most people bring to streaming inference.
The common assumption: a larger input window means higher latency. If you're processing one sample every millisecond, a 1600-sample window seems slow. Most teams used windows of 50–128ms. We tried 1600ms (1.6 seconds).
| Window Size | Balanced Accuracy | Accuracy Score | Inference Latency | Delta |
|---|---|---|---|---|
| 128ms (128 samples) | ~67% | ~33.5 / 50 | <1ms | baseline |
| 1600ms (1600 samples) | ~94% | 47.1 / 50 | <1ms | +27 pts accuracy, same latency |
Same latency. Twenty-seven percentage points more accuracy. How?
The answer is in EEGNet's architecture. The average pooling layers in Blocks 2 and 3 decimate the time dimension aggressively early in the network:
```
Input window: 1600 samples
After Pool1:   400 samples (÷4)
After Pool2:   ~50 samples (÷8)
→ the classifier sees a ~50-sample representation of the 1600-sample input
```
By the time the input reaches the expensive operations — the separable convolution and the linear classifier — a 1600-sample window has been compressed to ~50 samples. The forward pass cost is dominated by the initial temporal convolution and the final linear layer, neither of which scales strongly with input length.
The PCA projection runs in constant time. The ring buffer update is O(1). The latency bottleneck is memory bandwidth and a handful of fixed-size matrix operations — not window size.
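The per-sample update reduces to a fixed-cost buffer write. A minimal sketch of the ring-buffer logic (class and method names are ours, not the repo's):

```python
import numpy as np

class StreamingWindow:
    """Causal ring buffer: O(1) insert per sample, fixed-size window out."""
    def __init__(self, n_features: int = 32, window: int = 1600):
        self.buf = np.zeros((window, n_features), dtype=np.float32)
        self.ptr = 0

    def push(self, sample: np.ndarray) -> None:
        self.buf[self.ptr] = sample              # overwrite the oldest slot
        self.ptr = (self.ptr + 1) % len(self.buf)

    def window_view(self) -> np.ndarray:
        # Unroll to oldest-to-newest order for the model input
        return np.concatenate([self.buf[self.ptr:], self.buf[:self.ptr]])
```

No data is shifted on insert: the write pointer wraps around, so cost per incoming sample is independent of window length.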
The biological reason this works: auditory cortex responses to sustained tones are not sharp spikes at onset. They are slow-building oscillatory patterns — synchronized low-frequency activity that takes hundreds of milliseconds to develop and reach steady state. A 128ms window catches only the onset transient. A 1600ms window captures the full sustained response, making the nine tone classes far more distinguishable.
Ten teams competed. Our final submission on January 24 scored 91.7 — a 22.5-point gap over second place. The accuracy component alone (47.1/50) accounted for most of the margin: our 94.2% balanced accuracy translated into an accuracy sub-score well ahead of the runner-up's.
Looking at the score breakdown: our accuracy advantage was the dominant factor (+16.2 pts over 2nd place on accuracy alone). Latency and size were more competitive across the field — most teams got inference under 20ms, but nobody matched our accuracy-to-size ratio. The 1600ms window was our differentiator.
The full codebase is public on GitHub. The repo is a fork of the official BrainStorm starter template, with our EEGNet implementation, training scripts, and a benchmark utility that computes all three scoring dimensions against any checkpoint.
```shell
git clone https://github.com/qsimeon/brainstorm-track1-public
cd brainstorm-track1-public
make install  # sets up uv venv + dependencies + git hooks

uv run python examples/train_eegnet.py \
    --window-size 1600 \
    --projected-channels 32 \
    --batch-size 64 \
    --epochs 30

uv run python examples/benchmark_eegnet.py
# outputs: balanced_accuracy, lag_ms, size_mb, and all three sub-scores
```
```python
model = EEGNet.load()  # loads model.pt from repo root
model.reset()          # clear history buffer for new session
for sample in stream:  # sample shape: (1024,)
    prediction = model.predict(sample)
    # prediction: scalar Hz class (0, 120, 224, ... 9736)
```
Don't assume that a larger window adds latency. For convolutional architectures with pooling, the cost scales far less than linearly with window length. We went from 128ms to 1600ms with no measurable latency increase. This asymmetry — history is nearly free but dramatically improves accuracy — holds broadly for streaming neural data.
In micro-ECoG recordings, most structured signal lives in a low-dimensional subspace. PCA compression simultaneously reduces model file size, speeds up all downstream computation, and improves generalization by discarding noise-dominated directions. It isn't just a preprocessing trick — it's a core part of the scoring strategy.
67% silence means an unweighted model learns to predict silence. Track balanced accuracy from epoch 1, not just raw accuracy. Weight your loss. The two metrics can diverge by 60+ percentage points on a dataset like this.
EEGNet's inductive biases — explicit temporal-then-spatial factorization, aggressive pooling — are directly matched to how ECoG data is structured. With 90 seconds of training data, this match matters more than raw model capacity. Transformers, LSTMs, and other general-purpose architectures need substantially more data to overcome the lack of prior structure.
It's easy to over-optimize accuracy and discover you've blown the size budget with an hour to go. Build a benchmark script on day one that reports all three sub-scores for every run. The scoring formulas are public — use them during development, not just at submission time.
Team MindMeld — BrainStorm 2026, Track 1: Neural Decoder Challenge
Quilee Simeon · Dennis Loevlie · Raghav Gali · Rohan Bhatane · Shravankumar Janawade · Sriram G.
Hosted by Precision Neuroscience at Microsoft NERD Center, Boston · January 23–24, 2026 · 10 competing teams