Research Core - Data Science

To support the activities of the Research Projects and the Advanced Imaging and Instrumentation Core, our Data Science Resource Core draws upon an exceptional interdisciplinary team with expertise in computer infrastructure, analysis and modeling, data science, statistics, machine learning and the teaching and training of data science methods.

Project Abstract

As we seek to unlock how the brain generates behavior, it is critical both to measure what the brain is doing as we observe the behaviors it generates and to build models that might mimic the process. As tools for imaging and recording brain activity and behavior improve, and the complexity of our models and computations increases, so do the density and diversity of our measured datasets. Handling this much data is an intellectual challenge in itself, yet arguably even more data will be needed if we are to understand the relation between brain and behavior. The projects within the U19 span a wide range of approaches to dissecting the motor pathway in the awake, behaving mouse. The methodologies to be used, supported by our Advanced Imaging and Instrumentation Core, are almost uniformly prototype systems, not available off the shelf, and will therefore both provide unprecedented data and present new challenges for standardization, analysis and information extraction. To support these activities, our Data Science Resource Core will draw upon an exceptional interdisciplinary team with expertise in computer infrastructure, analysis and modeling, data science, statistics, machine learning and the teaching and training of data science methods. With this team, we hope to build a model data pipeline that is scalable, robust and capable of addressing immediate needs and deficiencies, while establishing best practices moving forward. The location of the U19 project within Columbia’s new Zuckerman Mind Brain Behavior Institute will further enhance the impact of this effort, first through our ability to establish shared, state-of-the-art infrastructure optimized for our framework and pipeline, and second by enabling us to establish new standards that can be scaled to serve the whole institute and beyond.

Our core will have three main components: 1) Establishing the hardware and network infrastructure that will enable centralized, indexed, secure yet easily shared data storage and high-power analysis from anywhere. This infrastructure will also include direct access to analyze hosted data on our high-performance cluster resources and through cloud computing. 2) Developing, tracking, modularizing and sharing novel algorithms and modeling approaches centrally, enabling version control as well as indexing, pooling and sharing of processed data. 3) Establishing a core effort focusing on training and day-to-day support of researchers needing to establish skills and expertise in big data analysis, modeling and statistical methods. This latter effort recognizes that data analysis becomes a bottleneck when users lack programming skills, analytical experience, or simply confidence in their ability to develop their own experimental designs and analyze their own data. We plan to share and disseminate all aspects of our core activities, from best practices and hardware configurations to raw and processed data, algorithms and models, and our new pedagogical approaches and successes.