Classification/Characterization Subgroup



Members who collaborated to generate this roadmap:
Keivan Stassun, Vanderbilt University
Mahmoud Parvizi, Vanderbilt University
Martin Paegert, Vanderbilt University 

Primary subgroup contact:

Keivan Stassun: 


Subgroup MAF engineer:

to be appointed


Subgroup Primary members

  • Andrew Becker
  • Josh Bloom
  • Mark Huber
  • Zeljko Ivezic
  • Darko Jevremovic
  • Ashish Mahabal
  • Adam Miller
  • Gautham Narayan
  • Hakeem Oluseyi
  • Frederic Piron
  • Peter Plavchan
  • Umaa Rebbapragada
  • Stephen Ridgway
  • Abi Saha
  • Rob Seaman
  • Tom Vestrand
  • Przemek Wozniak
  • Rafael Martínez-Galarza


Subgroup Secondary members

  • Lluis Galbany
  • Chris Smith
  • Alexandre Roman
  • Samaya Nissanke
  • Laura Chomiuk
  • Knox Long
  • Andrej Prsa
  • Paula Szkody
  • Lucianne Walkowicz
  • Virginia Trimble
  • Marcio Catelan
  • Arne Henden
  • Edward Schmidt
  • Alistair Walker
  • Peter Brown
  • Ryan Chornock
  • Melissa Graham
  • Cosimo Inserra
  • Tom Matheson
  • Danny Milisavljevic
  • David Reiss
  • Stephen Smartt
  • Niel Brandt
  • Suvi Gezari
  • Josh Pepper
  • Keivan Stassun
  • Rachel Street
  • Chris D'Andrea


Roadmap Outline

  1. Science Drivers
  2. Current Work
  3. Key Questions


Science Drivers

Virtually everything in the sky will be "variable" at the expected photometric precision of LSST. This presents an unprecedented opportunity to develop methods for classification of astronomical objects partly or entirely on the basis of time-series photometric variability.

For example, LSST provides a powerful new capability for monitoring periodic variable stars, such as RR Lyrae stars, which can be used to map the Galactic halo and intergalactic space to distances exceeding 400 kpc. Exploiting this capability for time domain science requires rapid data reduction and classification in order to flag interesting objects for spectroscopic and other follow-up with separate facilities, as well as ensemble population studies through analysis of the LSST light curve data alone. Thus LSST requires data processing that enables a fast and efficient response to transient sources, including automated identification of variable stars and astrophysically interesting binaries with a robust and accurate preliminary classification, as well as methods for in-depth classification of large ensembles of LSST sources throughout the mission lifetime.


  1. Collection Cadence and Number of Observations
  2. Purity vs. Completeness
  3. Extinction and Crowding
  4. Disambiguating various classes of periodic variables
  5. Extending classification techniques to quasi- and non-periodic variables
  6. Classification, or at least identification, of unexpected classes of variables/transients
    • LSST extends time–volume space a thousand times over current surveys, so the most interesting science may well be the discovery of new classes of objects.

Current Work

The EB Factory Project:

  1. "LSST will observe about 2 billion stars yielding 28 million EBs. Due to the less than optimal coverage in time, 28%, or 7.8 million of these, will be detectable. Clearly these numbers indicate that a manual approach to light curve classification for analysis of EBs cannot continue into the LSST era. The EB Factory is an end-to-end computational pipeline that allows automatic processing of massive amounts of light curve data—from period finding to object classification to determination of the stellar physical properties."
    • Paegert et al. 2014, The Astronomical Journal, 148, 31
    • Abstract: We describe a new neural-net-based light curve classifier and provide it with documentation as a ready-to-use tool for the community. While optimized for identification and classification of eclipsing binary stars, the classifier is general purpose, and has been developed for speed in the context of upcoming massive surveys such as the Large Synoptic Survey Telescope. A challenge for classifiers in the context of neural-net training and massive data sets is to minimize the number of parameters required to describe each light curve. We show that a simple and fast geometric representation that encodes the overall light curve shape, together with a chi-square parameter to capture higher-order morphology information results in efficient yet robust light curve classification, especially for eclipsing binaries. Testing the classifier on the ASAS light curve database, we achieve a retrieval rate of 98% and a false-positive rate of 2% for eclipsing binaries. We achieve similarly high retrieval rates for most other periodic variable-star classes, including RR Lyrae, Mira, and delta Scuti. However, the classifier currently has difficulty discriminating between different sub-classes of eclipsing binaries, and suffers a relatively low (60%) retrieval rate for multi-mode delta Cepheid stars. We find that it is imperative to train the classifier's neural network with exemplars that include the full range of light curve quality to which the classifier will be expected to perform; the classifier performs well on noisy light curves only when trained with noisy exemplars. The classifier source code, ancillary programs, a trained neural net, and a guide for use, are provided.
  2. "An automated classification pipeline with parameters adaptable to multiple time series photometric surveys would be immediately applicable to the Kepler data set and would be well suited to the LSST requirement that data processing enable a fast and efficient response to transient sources."
    • Parvizi et al. 2014, The Astronomical Journal, 148, 125
    • Abstract: Large repositories of high precision light curve data, such as the Kepler data set, provide the opportunity to identify astrophysically important eclipsing binary (EB) systems in large quantities. However, the rate of classical "by eye" human analysis restricts complete and efficient mining of EBs from these data using classical techniques. To prepare for mining EBs from the upcoming K2 mission as well as other current missions, we developed an automated end-to-end computational pipeline — the Eclipsing Binary Factory (EBF) — that automatically identifies EBs and classifies them into morphological types. The EBF has been previously tested on ground-based light curves. To assess the performance of the EBF in the context of space-based data, we apply the EBF to the full set of light curves in the Kepler "Q3" Data Release. We compare the EBs identified from this automated approach against the human generated Kepler EB Catalog of 2600 EBs. When we require EB classification with 90% confidence, we find that the EBF correctly identifies and classifies eclipsing contact (EC), eclipsing semi-detached (ESD), and eclipsing detached (ED) systems with a false positive rate of only 4%, 4%, and 8%, while complete to 64%, 46%, and 32% respectively. When classification confidence is relaxed, the EBF identifies and classifies ECs, ESDs, and EDs with a slightly higher false positive rate of 6%, 16%, and 8%, while much more complete to 86%, 74%, and 62% respectively. Through our processing of the entire Kepler "Q3" dataset, we also identify 68 new candidate EBs that may have been missed by the human generated Kepler EB Catalog. We discuss the EBF's potential application to light curve classification for periodic variable stars more generally for current and upcoming surveys like K2 and the Transiting Exoplanet Survey Satellite.
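
The trade-off reported above, where relaxing the required classification confidence raises completeness at the cost of more false positives, can be illustrated with a minimal sketch. This is not the EBF code: the function name `purity_completeness`, the toy data, and the thresholds are hypothetical, and the sketch uses the standard definitions (purity = true positives / all accepted objects; completeness = true positives / all genuine class members), which may differ in detail from the metrics used in the papers cited above.

```python
from typing import Sequence, Tuple


def purity_completeness(
    is_member: Sequence[bool],
    confidence: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (purity, completeness) for objects accepted at `threshold`.

    is_member  -- True where the object genuinely belongs to the class
    confidence -- classifier confidence that the object belongs to the class
    threshold  -- minimum confidence required to accept a classification
    """
    if len(is_member) != len(confidence):
        raise ValueError("inputs must have the same length")
    accepted = [c >= threshold for c in confidence]
    true_pos = sum(1 for m, a in zip(is_member, accepted) if m and a)
    false_pos = sum(1 for m, a in zip(is_member, accepted) if a and not m)
    false_neg = sum(1 for m, a in zip(is_member, accepted) if m and not a)
    n_accepted = true_pos + false_pos
    n_members = true_pos + false_neg
    purity = true_pos / n_accepted if n_accepted else float("nan")
    completeness = true_pos / n_members if n_members else float("nan")
    return purity, completeness


# Toy, made-up data (not survey results): True marks a genuine class member.
truth = [True, True, True, True, False, False, True, False]
score = [0.95, 0.92, 0.80, 0.60, 0.85, 0.40, 0.30, 0.10]

strict = purity_completeness(truth, score, 0.90)   # high-confidence cut
relaxed = purity_completeness(truth, score, 0.50)  # relaxed cut
print("strict :", strict)
print("relaxed:", relaxed)
```

On this toy sample the strict cut accepts only genuine members but recovers fewer of them, while the relaxed cut recovers more members and admits a contaminant, which is the same qualitative behavior described in the abstract.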

Key Questions

  1. What are the minimal requirements on the cadence and number of available observations for successful classification?
  2. How can current light curve classification methods be extended to quasi- and non-periodic light curves?
  3. How do we incorporate the discovery of new classes of variable objects into automated classifiers trained via supervised learning… the Miscellaneous (MISC) class?
  4. What are the requirements on purity and completeness, given the constraints imposed by extinction and crowding?
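
One simple way to start exploring question 1 for periodic variables is to ask how well a given set of observation times samples the phase of a light curve. The sketch below is illustrative only: the helper names `fold_phases` and `largest_phase_gap`, the 10-year baseline, the trial period, and the randomly drawn observation times are all assumptions and do not represent an LSST cadence or any subgroup tool. It folds the observation times on a trial period and reports the largest unsampled gap in phase; a feature such as an eclipse that is narrower in phase than this gap could go unobserved.

```python
import random
from typing import List, Sequence


def fold_phases(times: Sequence[float], period: float) -> List[float]:
    """Fold observation times (days) on a trial period; return sorted phases in [0, 1)."""
    if period <= 0:
        raise ValueError("period must be positive")
    return sorted((t % period) / period for t in times)


def largest_phase_gap(phases: Sequence[float]) -> float:
    """Largest gap between consecutive sorted phases, including the wrap-around gap."""
    if not phases:
        return 1.0  # no observations: the whole phase range is unsampled
    gaps = [b - a for a, b in zip(phases, phases[1:])]
    gaps.append(1.0 - phases[-1] + phases[0])  # gap across phase 1 -> 0
    return max(gaps)


# Hypothetical setup: 200 random epochs over a 10-year baseline,
# and a sparse subset made of the first 20 of those epochs.
rng = random.Random(0)
all_times = [rng.uniform(0.0, 3652.5) for _ in range(200)]
trial_period = 0.75  # days; an arbitrary illustrative value

gap_sparse = largest_phase_gap(fold_phases(all_times[:20], trial_period))
gap_dense = largest_phase_gap(fold_phases(all_times, trial_period))
print(f"largest phase gap, 20 epochs:  {gap_sparse:.3f}")
print(f"largest phase gap, 200 epochs: {gap_dense:.3f}")
```

Because the sparse set is a subset of the dense set, the largest gap can only shrink or stay the same as observations are added. A fuller study would replace the random epochs with simulated survey cadences and relate the phase coverage to classifier performance.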