Benchmarking open-weight LLMs for ticker mapping
This article was written by Shunran He, Quantitative Research Analyst Intern at validityBase, and is republished from the validityBase (vBase) blog with permission.
Summary
Open-weight LLMs can now map text to tickers almost as well as the best closed-weight models, and one of the strongest runs on a single workstation.
With model performance now this strong, the production pipeline around the model becomes the differentiator for a successful ticker mapping build.
Inference cost, data custody requirements, and privacy concerns need not limit the accuracy of text labeling and featurization in alternative data (alt data) pipelines. We find that an ensemble of four open-weight LLMs matches the leading light closed-weight models on ticker mapping quality. Such...


