Statement QUESTION: Statement of how this degree facilitates your career goals
- Relate my past and present experience to the Future
Credit Risk Analyst (heavy data use for analytics)
Model validation (review econometrics and financial models)
Manage model risk
- Ability to translate large, unstructured, complex data into information to improve decision making
- Help the financial company I work for be a pioneer in behavioral finance through the use of big data
- Skills which traditional statisticians don’t have (Next generation statisticians)
- Research Ability
Executive Summary of the Ph.D. Program:
The McKinsey Global Institute has identified that the demand for deep analytical talent will outpace the supply in the United States by almost 200,000 people within three years. In response, the White House has launched a “Big Data Research and Development Initiative” to “expand the workforce needed to develop and use Big Data technologies”. This theme is echoed by Thomas Davenport’s recent article in Harvard Business Review titled “Data Scientist: The Sexiest Job of the 21st Century”.
These studies – and many others – point to the need for universities to educate and train “Data Scientists” to address this demand. However, no university in the country currently has a degree program in Data Science – defined as the intersection of Statistics, Mathematics and Computer Science.
The degree will train individuals to translate large, unstructured, complex data into information to improve decision making. This curriculum will include programming, data mining, statistical modeling, and the mathematical foundations to support these concepts. Importantly, it will also emphasize communication skills – both oral and written – as well as application and tying results to business and research problems.
Because this degree is a Ph.D. (rather than a Doctorate in Data Science), it creates flexibility for the student. Graduates can either pursue a position in the private or public sector as a “practicing” Data Scientist – where the demand is expected to greatly outpace the supply – or pursue a position within academia, where they would be uniquely qualified to teach these skills to the next generation.
The Ph.D. in Analytics and Data Science will not only help to close the talent gap in the area of Data Science
This Ph.D. will utilize a multidisciplinary approach, with emphasis on Statistics, Mathematics, Computer Science and a “content” discipline such as Biology, Chemistry, Finance, Physics, Political Science, etc.
- Needs the program will meet
“I skate to where the puck is going to be, not where it has been.” – Wayne Gretzky
The United States Federal Government recently issued a press release addressing what it sees as a growing critical shortage of data analysts and, on March 29, 2012, issued the “Big Data Research and Development Initiative”. One of the main purposes of the initiative is to “expand the workforce needed to develop and use Big Data technologies”.
The term “Big Data” is increasingly included within descriptions of required skill sets across a wide variety of disciplines and sectors of the economy. While the accepted definition of Big Data is continuing to evolve, there is no question that the related concepts are becoming more prevalent and will play a larger role in the future.
According to The Economist magazine, unmanned American military aircraft (i.e., drone aircraft) flying over Iraq and Afghanistan in a single year (2009) produced approximately 24 years’ worth of video surveillance footage. Every year, Google acquires an amount of data equivalent to the entire Library of Congress.
These astonishing facts highlight at least four major points about how data is collected, analyzed, and used:
- Extraordinary, previously unimaginable amounts of data are being collected and stored for subsequent analysis. These data contain potentially significant and meaningful information for the private and public sectors and for society at large.
- It is not feasible to manually review and/or analyze such massive data in a timely manner using traditional methods. Computer-assisted, semi- or fully automated processes using new computational and data mining methods are needed to extract useful information from these massive data sources.
- In addition to massive amounts of traditional structured data (i.e., tabular data), extraordinary amounts of unstructured, non-traditional data such as video footage, audio recordings, and unstructured text are being collected and stored. Increasingly, these two very different types of data must be merged together in systematic ways in order to obtain useful information.
- Unlike in the past, data collection and analysis are no longer purely academic endeavors. Gathering and analyzing data to obtain useful information, most often for decision making, now takes place in almost every field and sector imaginable, including the sciences, public health, the healthcare industry, all aspects of business and finance (including retail, insurance, marketing, the service industry, the credit industry, fraud detection, the communications industry, etc.), psychology, education, public policy agencies, government elections, and, critically, national security and defense.
From these four points it follows that:
The next generation of statisticians will face very different challenges and issues than previous generations of statisticians. As a result, the next generation of statisticians requires a new set of knowledge and skills in order to effectively serve the data analysis needs of the 21st century. These skills will incorporate more emphasis on applied mathematics and on computer programming than has historically been the case – even for applied statisticians.
- Brief explanation of how the program is to be delivered
“Opportunities are usually disguised as hard work, so most people don’t recognize them.” – Ann Landers
While additional resources will be required to make the Ph.D. in Analytics and Data Science successful (see Section 4 below), much of the basic delivery infrastructure is in place.
The general structure of the program will include three stages:
Stage 1: Course Work
“If you only have a hammer, you tend to see every problem as a nail.” – Abraham Harold Maslow
The Ph.D. in Analytics and Data Science will begin with 48 hours of core course work/instruction, spread over an expected four years of study, plus six hours of electives and a minimum of 24 hours of dissertation and internship (78 total hours). In response to the market needs and skill gaps as outlined above, the Ph.D. in Analytics and Data Science will have a strong interdisciplinary and application orientation. Generally, coursework will be structured as provided in Figure 5.
A full listing of the proposed courses and a sample program of study can be found in Section 5 below.
The logic supporting this interdisciplinary approach is that the curriculum would be aligned with the needs of the marketplace – as evidenced in Section 1b above.
Students will be required to complete a comprehensive examination of their course materials before they are considered to have completed this stage. The comprehensive examination will cover materials from all three areas of study listed above.
Stage 2: Application
The Ph.D. in Analytics and Data Science is, at its core, an applied program. Ph.D. students would be required to engage in one year – for a total of 15 credit hours – of application. This application will take one of two forms.
The first form of application is private or public sector work experience. The Statistical Advisory Board has agreed, in principle, to “hire” Ph.D. students on a contract basis for a minimum of one year, after they have completed their coursework – but prior to completing their dissertation.
Stage 3: Dissertation Research
A Ph.D. in Analytics and Data Science would require a formal dissertation process involving an interdisciplinary committee composed of faculty from Statistics, Computer Science, and Mathematics. Depending upon the application path pursued above, a faculty member from a “content” discipline (e.g., Marketing, Finance, Chemistry, Biology, Economics) would be included and, where appropriate, an external committee member.
Who is a Data Scientist?