You don’t know how to hire a REAL Data Scientist!


I was recently contacted by a recruiter about a Director of Analytics position at a Private University in my home state; they wanted to know if I could recommend someone for the position, since I have a background in Education and Data Science. I dug through my contacts and supplied them with 4 names of experts who would be more than qualified for what they were looking for, all of them Data Scientists with at least 15 years of experience. Well, I finally heard back about the last of those I had recommended, and each one, for various reasons, had been disqualified for the position. HUH?? I called the recruiter and asked what was going on, and she said… “her client was looking for someone with more experience in fundraising.” So, let me get this right… you disqualified some of the greatest minds in Data Science because they lacked fundraising experience? Really?

This is not the first time I have seen “truly amazing” people overlooked for a position because “they” were missing some little detail on their resume. It makes me question the people doing the hiring: do they even know what they want? Can they recognize an experienced candidate, or are they going on gut feelings and preconceptions of what “they” think the position needs?

Would you recognize a true Data Scientist if you met one? If you wanted to add data science or analytics to your University or Corporation, where would you go? To a headhunter or a recruiter? Probably, but what makes you think they are qualified to find you the best Data Scientist out there when most of them are still trying to figure out what Data Science is!

If you really want to hire the best, I recommend you research the position first; how can you find the perfect candidate if you don’t? Buzzwords like “Data Science” and “Big Data” have been added to everyone’s resume in analytics, and this DOES NOT make them qualified. Stop searching for the obvious and look for words that REAL Data Scientists would use: probability, models, machine learning, statistics, data engineering, pattern recognition, learning, visualization and data warehousing, to name a few.

To conclude my rant, I’d like to make one point… If you really want to hire an expert in Data Science, don’t go for the one with the biggest blog or the one who writes the most books. Honestly, a great data person doesn’t have the time or desire to write blogs and books; we’d rather be doing what we love: playing with data. If I had my choice of hiring the best, I would check out LinkedIn, find all the candidates I wanted, and then call and verify references before I even set up the first interview. Too many people out there pad or just flat-out lie about their skills, so verify everything! Sometimes I wonder if anyone follows up on anything anymore! A 10-minute phone call will save you from having to figure out how to get rid of someone who sucks; believe me, I see it happen frequently. There are some truly gifted individuals out there, so don’t overlook them because you are mesmerized by your own agenda.


Data Science ROCKS!

Big Data needs Data Science but Data Science doesn’t need Big Data

What can your data do for you?




Recently I did a webinar with Kalido and enjoyed it tremendously. They were kind enough to give me a summary of the webinar; thanks, Kalido!

My Favorite Quote from the Webinar: “Big Data needs Data Science but Data Science doesn’t need Big Data.” (Carla Gentry, aka @data_nerd)

Data science has been around for decades, and it’s not just big data. I hear a lot of people lumping these two together as if they go hand in hand, which I agree with to an extent. However, big data needs data science, but data science doesn’t necessarily need big data. Most of the data a typical company handles on a daily basis or houses internally is not big data. Even Facebook and Google break up or segment their data into workable pieces. Data science is big, small, structured, unstructured, messy, clean, etc. It’s more than just analytics. As a data scientist, you’ll become a liaison between the IT department and the C suite. You have to speak both languages and you have to understand the hierarchy of data; you can’t be just an architect or just a data expert.


What really matters in data science is the team effort and your role as a liaison. Your company has large amounts of data, and you want to make sure your queries are correct. Whatever tool you use, make sure your data is cleansed. You want to know that it’s normalized and indexed so that things run smoothly. You want to be able to give insight, which requires knowledge of your audience. If your audience is the C suite of a multi-million-dollar company, you’re going to need everything you have to back up your conclusions. Be able to prove it and be prepared for questions.
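To make “indexed so that things run smoothly” concrete, here is a minimal sketch using Python's built-in sqlite3 module. The post does not name a database, so SQLite is an assumption, and the table and column names are hypothetical. The sketch asks SQLite how it plans to run the same lookup before and after an index is added. Without the index it has to scan every row. With the index, the planner can search the index instead.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable detail lines of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the plan text

conn = sqlite3.connect(":memory:")
# Hypothetical orders table; customer_id is the column we filter on.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

lookup = "SELECT amount FROM orders WHERE customer_id = ?"
before = query_plan(conn, lookup, (42,))

# Index the column used in WHERE clauses and joins.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, lookup, (42,))

print("before:", before)  # a full-table SCAN
print("after: ", after)   # a SEARCH using idx_orders_customer
```

The exact wording of the plan varies between SQLite versions. The point is the switch from SCAN to SEARCH.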


What sort of personality makes for an effective data scientist?

Definitely curiosity. I remember in college, my professors would shut the door if they saw me coming, because being told that a² + b² = c² was never enough. I wanted to know why. So the biggest question in data science is “why?” Why is this happening? If you notice a pattern, ask “why?” Is there something wrong with the data, or is this an actual pattern? Can we conclude anything from this pattern? A natural curiosity will definitely give you a good foundation.


For aspiring data scientists, where can they begin?

There are many positions you can get into to learn data science; it’s not just for data engineers. Personally, I started as a junior analyst. Everyone has to start on the ground floor, but there are so many resources and open-source data sets you can practice with. Most IT departments aren’t going to give you access to their live database, but they may give you access to their development database, where you can go in and practice. Whatever position you get, tell your boss that you’re interested in becoming a data scientist. Sign up for courses, learn programming languages and learn business. You have to know about budgets and other aspects of the business, not just the analysis part and not just the IT part. Data science is a wonderful field, and I encourage anyone who is curious about data analysis, hypothesizing and statistics to give it a shot. Just know that it won’t happen overnight.

Data Science

Data Scientist, Analytical-Solution


Over the years I have done quite nicely for myself as the Founder of Analytical Solution; as everyone says, I wish I had done it sooner. But every once in a while I reach out to other businesses, either to do something different or because I see value I could add by joining forces with them. Unfortunately, each time I have tried, I have been dismissed or ignored, which is curious since Data Scientists are supposed to be in such demand. In all honesty, if I never added another client to my base, I would be fine financially, but what fun would that be? After 2 years of “going it alone,” I miss the camaraderie and giggling because someone’s chair made a strange noise (giggle, you know what I mean). I am a NERD, and we are not solitary creatures.


My most recent dissing (giggle) was from a company in Louisville, not far from me. I could have popped over, done some analysis, crunched some numbers, built a model, taught them about databases, ETL, modeling… the possibilities are endless because I’ve been doing this for so many years. But the guy just blew me off, as if a Data Scientist / Economist / Mathematician with over 15 years of experience contacted him every day. No biggie, but what if he had joined forces with me?


Recently Big Data Republic started a “Big Data 100” who-to-follow list, and all the data people are smiling or joining PeerIndex to raise their scores (giggle). It has actually been fun finding new data people on Twitter to follow (which was the list’s original intent, not to exclude anyone or make anyone upset), but I digress. The point is, I’m on this list, and every time someone clicks on the site, there is my name; they can drill down by clicking my name or icon to get more information. Imagine all the thousands of chances missed by NOT partnering with me: the info under my name could have said Founder of Analytical Solution and Partner XXXXX blah blah, you get the point. So, before you pass up an opportunity, the next time someone emails, tweets or calls, give them a chance; you never know what could come of it 🙂

It’s not every day that someone with my experience wants to share the wealth of knowledge accumulated working with Fortune 100 and 500 companies, Colleges, Financial Institutions and Econometric Consulting Firms. If you are in the Louisville area and would be interested in speaking, send me an email to (who knows, it could be the opportunity of a lifetime)

Being a Data Scientist

 Data Scientists

Being a “Data Scientist” Is As Much About IT As It Is Analysis by Carla Gentry, aka @Data_nerd

IBM defines the data scientist as -> A data scientist represents an evolution from the business or data analyst role.


The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math. What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge.


Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization. The data scientist role has been described as “part analyst, part artist.”


Anjul Bhambhri, vice president of big data products at IBM, says, “A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a Renaissance individual who really wants to learn and bring change to an organization.”…


A data scientist does not simply collect and report on data, but also looks at it from many angles, determines what it means, then recommends ways to apply the data.


Data scientists are inquisitive: exploring, asking questions, doing “what if” analysis, questioning existing assumptions and processes. Armed with data and analytical results, a top-tier data scientist will then communicate informed conclusions and recommendations across an organization’s leadership structure.


IBM hits the nail on the head with the above definition. Having worked with traditional data analysts as well as programmers, developers, architects, scrum masters, and data scientists — I can tell you they don’t all think alike. A data scientist could be a statistician but a statistician may not be completely ready to take on the role of data scientist, and the same goes for all the above titles as well.


Beth Schultz from All Analytics mentioned that we are like jacks of all trades but masters of none. I don’t completely agree with this comment, but I do agree that my ETL skills, for example, are not as honed as my analysis skills. My definition of the data scientist includes: knowledge of large databases and clones, master and slave servers, nodes, schemas, agile, scrum, data cleansing, ETL, SQL and other programming languages, presentation skills, Business Intelligence and Business Optimization, plus the ability to glean actionable insight from data. I could go on and on about what the data scientist needs to be familiar with, but the analysis part has to be mastered knowledge, not just general knowledge. If you want to separate the pretenders from the experienced in this business, ask a few questions about how data science actually works!


When I start working with a new data set (it doesn’t matter how much or what kind), the first question I usually ask is, what kind of servers do you own?

Why would you need to know about the servers to work with data? I ask this question so I will know what kind of load they can handle: is it going to take me 9 hours to process or 15 minutes? How many servers do you have? I ask this because if I have 4 or 5 servers, I can toggle or load balance between them, versus having only 1 that I have to babysit.
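The “9 hours or 15 minutes?” question is back-of-envelope arithmetic. You divide the rows to process by the combined throughput of the servers you can spread the work across. Below is a minimal sketch. It assumes a hypothetical per-server throughput and assumes the work splits evenly, which real jobs rarely do perfectly. Batches are dealt out round-robin, the simplest form of load balancing. The server names are placeholders.

```python
from itertools import cycle

def estimated_hours(total_rows, rows_per_sec_per_server, servers):
    """Rough wall-clock estimate, assuming the work splits evenly across servers."""
    return total_rows / (rows_per_sec_per_server * servers) / 3600

def round_robin(batches, servers):
    """Deal batches out to servers in turn (simple round-robin load balancing)."""
    assignment = {server: [] for server in servers}
    for batch, server in zip(batches, cycle(servers)):
        assignment[server].append(batch)
    return assignment

# Hypothetical numbers: 162 million rows at 5,000 rows/sec per server.
for n in (1, 5):
    print(f"{n} server(s): ~{estimated_hours(162_000_000, 5_000, n):.1f} hours")

plan = round_robin([f"batch_{i}" for i in range(10)], ["db1", "db2", "db3", "db4"])
print(plan)
```

With these made-up figures, one server needs about 9.0 hours and five need about 1.8. Measure the throughput on your own hardware before trusting either number.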

What kind of environment will I be working in? I ask this because I need to know whether they have a test environment as well as a live environment, so I can play without crashing every server in the house and ticking a lot of people off. If you are working with lots of data on a live system, off-peak or low-load times are best; in a test or staging environment, you can “play” without fear. This way, you won’t “bring down the house.”

It’s a good idea for you Chief Marketing Officers (CMOs) to let your Data Scientist work in the evening hours and/or on weekends, at home if applicable. This, of course, requires setting up a VPN connection, and it also depends on how secure the data connections are, as well as how much processing I can do before I crash them (um, I mean, what is the speed and capacity to process?). If a dial-up connection is all that’s available, forget it.

As a side note, I’ve crashed many a server in my day; how do you think I learned all this stuff? Back in the Nineties, someone would crash the mainframe and we would all head to Einstein’s Deli in Oak Park, IL, but today this might be frowned upon. But I digress; back to more IT-related things.

Another handy thing to find out is how the databases are joined. By that I mean, what variables do they have in common (i.e., the primary keys and the foreign keys that point to them)? Are the relationships one-to-one, one-to-many, or many-to-many? Why would you ask this? Some programmers (not all of them, of course) don’t completely understand relational databases, especially when it comes to transactional data and data that needs to be refreshed often. You have to set up a database the way you would play chess: think at least three moves ahead.
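Checking the relationship before you write the join takes only a few lines. Here is a minimal sketch in plain Python. It assumes you can pull the join-key column from each table into a list. The key values are hypothetical, and only keys present on both sides are considered.

```python
from collections import Counter

def relationship(left_keys, right_keys):
    """Classify how two tables relate on a shared key column.

    Only keys that appear on both sides are considered.
    """
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    shared = left_counts.keys() & right_counts.keys()
    left_many = any(left_counts[k] > 1 for k in shared)
    right_many = any(right_counts[k] > 1 for k in shared)
    if left_many and right_many:
        return "many-to-many"
    if left_many:
        return "many-to-one"
    if right_many:
        return "one-to-many"
    return "one-to-one"

# Hypothetical keys: each customer appears once but may have several orders.
customers = [101, 102, 103]
orders = [101, 101, 102, 103, 103, 103]
print(relationship(customers, orders))  # one-to-many
```

If the function reports many-to-many, expect the join to multiply rows. A key that appears twice on one side and three times on the other produces six joined rows, which is a common source of inflated totals.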

Sometimes it’s better to start from scratch and build your own data source. When writing scripts to extract or refresh data, don’t forget a few key things: normalize, index, and pick your design based on what you know about the data and what is being requested of it.
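Here is a minimal sketch of “normalize, index.” It assumes Python's built-in sqlite3 and a hypothetical flat extract in which customer details repeat on every order row. Normalizing stores each customer once and links each order to its customer by key. The index on that linking column keeps joins and refreshes from scanning the whole table. All table and column names are illustrative.

```python
import sqlite3

# Hypothetical flat extract: customer details are repeated on every order row.
flat_rows = [
    (1, "C001", "Ann Lee", "Louisville", 19.99),
    (2, "C001", "Ann Lee", "Louisville", 5.00),
    (3, "C002", "Raj Patel", "Chicago", 42.50),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id TEXT PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id TEXT NOT NULL REFERENCES customers (customer_id),
        amount      REAL NOT NULL
    );
    -- Index the linking column so joins don't scan the whole orders table.
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")

for order_id, cust_id, name, city, amount in flat_rows:
    # Store each customer once; repeated rows for the same customer are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO customers (customer_id, name, city) VALUES (?, ?, ?)",
        (cust_id, name, city),
    )
    conn.execute(
        "INSERT INTO orders (order_id, customer_id, amount) VALUES (?, ?, ?)",
        (order_id, cust_id, amount),
    )
conn.commit()

totals = conn.execute("""
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(totals)
```

SQLite only enforces the REFERENCES clause when foreign keys are switched on with `PRAGMA foreign_keys = ON`. In this sketch it documents the relationship but does not police it.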

Servers are important, and if you are dealing with large databases, load balance or toggle whenever possible. Also, star schema versus snowflake schema is an important choice, so please put some serious thought into it. Ask yourself: do I need it fast or efficient? Believe me, I always pick efficient (I am a nerd, after all), but if the client needs it ASAP, then the client shall have it ASAP.
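Here is a minimal sketch of the star-versus-snowflake trade-off. It assumes SQLite and a tiny hypothetical product dimension. In the star version, the category name sits directly on the product dimension. That is denormalized: the name repeats, but queries need fewer joins. In the snowflake version, the category is split into its own table. That is normalized: each name is stored once, but queries need one more join. Both return the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Shared fact table: one row per sale.
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
    INSERT INTO fact_sales VALUES (1, 10.0), (2, 20.0), (3, 5.0);

    -- Star: the category name is repeated on every product row.
    CREATE TABLE dim_product_star (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    INSERT INTO dim_product_star VALUES
        (1, 'Corn flakes', 'Cereal'), (2, 'Granola', 'Cereal'), (3, 'Cola', 'Drinks');

    -- Snowflake: the category is normalized into its own table.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
    INSERT INTO dim_category VALUES (1, 'Cereal'), (2, 'Drinks');
    CREATE TABLE dim_product_snow (product_id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    INSERT INTO dim_product_snow VALUES (1, 'Corn flakes', 1), (2, 'Granola', 1), (3, 'Cola', 2);
""")

star = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product_star AS p ON p.product_id = f.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()

snowflake = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product_snow AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id  -- the extra join
    GROUP BY c.category ORDER BY c.category
""").fetchall()

print(star)
print(snowflake)
```

On a dimension this small the difference is invisible. With millions of rows and several levels of hierarchy, the extra joins (snowflake) or the repeated text (star) start to matter. Which one counts as “efficient” depends on whether you are optimizing storage or query time.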

With knowledge of the client’s IT setup from a data management/quality perspective, you’ll be equipped to handle most situations you run into when dealing with data, even if the Architect and Programmer are out sick. Your professional knowledge is going to be a big help in getting the assignment or job complete.

Happy data mining and please play with data responsibly!

About the Author

During the past 18+ years, Carla Gentry has worked with Fortune 100 and 500 companies including, but not limited to, Discover Financial Services, J&J, Hershey, Kraft, Kellogg’s, SCJ, McNeil, Firestone, Disney, PBA, Computer Systems Institute, RIVO, Deloitte and Insight Global. Acting as a liaison between the IT department and the Executive staff, she is able to take huge, complicated databases, decipher business needs and come back with intelligence that quantifies spending, profit and trends. Being called a data nerd is a badge of courage for this curious Mathematician/Economist, because knowledge is power and companies are now acknowledging its importance.

To find out more about what a real @data_nerd does, please visit my profile on LinkedIn ->

It seems anyone can create a list of people to follow. Unfortunately, when I see lists of Top Data people or Data Scientists, they’re always male-dominated. I’m not saying the men on them don’t deserve it, because they do, but there are many ladies out there who code their butts off and glean insight like ROCK STARS. So, to correct this, I have compiled a list of great women in Data to follow.


Please follow these users for the latest in Data Science, Analysis, and Quality!


Karen Lopez
Shelly Lucas
Big Data Gal
Dr Emily R Coleman
Jacqueline Roberts
Gwen Thomas
Melinda Thielbar
April Reeve
Sarah Schmidt
Angela Dunn
Cathy O’Neil
Isabel Elaine Allen
Beth Schultz
Emily Carter
Noreen Seebacher
Blaine Kohl
Loretta Mahon Smith
Mandi Bishop

and last but not least – ME 🙂 15 years of crunching data and gleaning insight

Carla Gentry CSPO



I hope you enjoy following these users as much as I do. If I forgot any Data Ladies, please accept my apologies and send me a Tweet @data_nerd so I can start following you 🙂

Data Science is NOT for everyone



Everywhere you look, a new program or University is teaching data science. While I think it is great for the field… heads up, people: data science is not new, and creating a bunch of book-smart data people without any business experience is NOT the answer. I started college in ’93 with a group of math friends, and 6 years later I graduated with a BS in Mathematics and Economics. But out of the 20 who started the mathematics courses at the same time I did, only 2, including me, had graduated! I’m not trying to show off; I’m trying to explain that logic and science are not for everyone. It’s not something you wake up one day and decide: I’m going to be a Data Scientist today. Curiosity won’t help unless you have the background or talent for gleaning insight out of terabytes of data. It isn’t as easy as you hear on Twitter or Facebook; you can’t take a course and be ready to handle dirty, unstructured, messy, server-breaking data.
So, what can you do? Not everyone is as lucky as I was to start right out of college working for RJKA, an Econometric Consulting Firm with DFS as a client, BUT there are ways to learn. and are just a couple of examples. Take courses and learn about MDM and Data Mining, as well as Data Mining Tools. Please learn statistics and learn to write your own code (SQL will always be around). Then learn about load balancing quickly, before you crash a multi-million-dollar server and get your butt canned (giggle). As you can see, for those who want to invest their time and talent there are resources out there, AND the BIGGEST ONE IS (ta da) Kaggle, a great place to play data scientist. But please keep in mind that you HAVE TO KNOW how to talk to and present findings to the C suite and/or board of directors. Sheldon is a genius, but he admits himself that he CHOKES at the thought of presenting to humans. Last but not least, find a job that works with data and learn as much as you can; I started as a junior analyst and now I own Analytical Solution! Good luck in your adventure, but don’t fool yourself: it isn’t easy, and if it were, EVERYONE would be doing it 🙂

Push versus Pull – are you marketing or spamming?

There have been a lot of articles over the years going into great depth about “Push versus Pull Marketing,” so I am not going to overkill the subject by regurgitating what has already been written. What I will add is that marketing has changed with the introduction of social media into our daily regimen, analysis and reporting. We try various things when we break into the world of social media, i.e., “push versus pull” marketing: do we blast our message to the world, or do we provide great content and let them come to us? In my time on social media, I have seen links stolen and articles cloned, “how many likes for this?” Facebook posts (what?), pins redirected to unintended sites, and auto-posting gone mad. Old articles from 2010 get passed around like “hot news” because no one bothered to read them; they just wanted to re-tweet someone with a high Klout score. So push marketing has definitely taken on a new aspect. It is not just about spitting out your product news to anyone out there; it’s about misdirection and unprofessional behavior. Followers and fans figure this out pretty fast, so any sales made this way are single purchases, not the start of a loyal clientele. I guess if that is your purpose, you have succeeded, but businesses are built on longevity, not tricks. I have seen great social media campaigns that made me think, giggle, scratch my head and lean in with anticipation of their next tweet, so I know there are many companies doing it right. Hats off to the social media teams out there that create great content that goes viral and circles the world; you inspire us to turn it up a notch! Companies like Kellogg’s want to be part of your family. They don’t blast their message to unexpected recipients or beg you to “like” their page. They drip, drip, drip their messages, until one day, boom, you need cereal and, without even thinking, you are buying Corn Pops.
It takes time to build a following. Good marketers know this and slowly build a loyal following through likes, re-tweets, pins, discussions in LinkedIn groups, and thought-provoking content and ads. Occasionally ads go viral, but the norm is to engage and be the 1st product or service that pops into someone’s mind. Take me, for example. I am not unique by any means; there are lots of great people out there to follow. I started my Twitter account in 2010 but didn’t bring in the business aspect until March 2011. Since then I have added over 5,000 followers, not because I have a high Klout score, beg for followers or talk about my business every 5 minutes. I spend hours a day searching for relevant content to share with my fans and friends on social media. I do it to educate, inform and entertain, and to spread the word about analysis, data mining and mathematics, as well as good habits in analytics and social media marketing. I don’t do it just to give away good content but to form a relationship and imprint my name, whether it be @data_nerd, Carla Gentry or Analytical Solution, into your mind. That way, when you need research, sentiment or text analysis, a social media campaign or analytical marketing, you’ll head over to my site and give me a call.
In conclusion, when you are creating your social media marketing plan, keep this in mind: do you want to be a household name, or a spammer who “gets and loses” followers on a daily basis? The choice is yours. You can blast your message to anyone who will listen, or you can target your message, create relevant content, respect your followers, and follow up with leads in a dignified manner. Engage and show potential clients why they should do business with you; if you have expert knowledge, share it with others and promote your field. No matter what strategy you use (push versus pull), showing your own and your business’s character might not make you a millionaire, but I guarantee it will lengthen your business career and lifetime earning potential. Thanks for taking the time to read this article, and have a great day of marketing!

Big Data – reduced to a buzzword



A “buzzword”: that is what data has been reduced to. “Big Data” is now a common phrase used to describe many different types of data: social media data, point-of-sale data, financial data, digital and visual data…. Arg, make it stop. But what is it “really,” and what makes it useful versus noise?

Over the course of my career, I have worked for companies of all sizes, some handling data better than others; the best was actually one of the smallest (go figure). Most companies struggle to figure out what to do with the data they have, rather than how to get more. Retail and CPG companies that can afford all the latest and greatest BI and data mining tools usually collect and use their data very well. It’s REALLY their bread and butter, and without it the competition would eat them alive. Unfortunately, they usually aren’t able to “house” all the data, which makes “real time” almost impossible. Smaller companies that have jumped into the data pool (so to speak) purchase large amounts of data or gather their own live data, but they rarely have the insight to know what “is” or “isn’t” important. For example, say I sell a 250-dollar yard trimmer. Now, 250 bucks is a bit steep, so I know the average person is not going to buy. So I would need someone who owns a home (the norm) or who rents and really cares (the outlier). I would also need someone who makes an above-average income (the norm) or someone who saved to make the purchase (since it’s a yard trimmer, we’ll say this is an outlier). But if I only have name and email address, what can I do? Honestly, not a whole lot, except maybe a mailing list. Say I have name, complete address and email: a little better. You could use the addresses to overlay federal, state and local data, or census data for that neighborhood. That would tell you median income, average home price, etc., but without more demographic and financial data, it would still not be enough to deduce much insight. So the kind of data you collect becomes more important than ever. If you want to target your customers, think about what it would really take to get the best insight.
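The overlay step can be sketched in a few lines of Python. Everything here is hypothetical: the customers, the ZIP codes and the income and home-value figures are made up. The sketch also assumes addresses have already been reduced to a ZIP code; real work would more likely geocode to a census tract or block group. Keep in mind that an area median describes the neighborhood, not the person. That is exactly why this alone does not give enough insight.

```python
# Hypothetical customer list: only name, email and ZIP code are known.
customers = [
    {"name": "Ann", "email": "ann@example.com", "zip": "40202"},
    {"name": "Bob", "email": "bob@example.com", "zip": "40205"},
    {"name": "Cy", "email": "cy@example.com", "zip": "99999"},
]

# Hypothetical area-level figures (median household income, median home value).
# Real values would come from a public source such as census tables.
census_by_zip = {
    "40202": {"median_income": 38_000, "median_home_value": 150_000},
    "40205": {"median_income": 72_000, "median_home_value": 260_000},
}

def overlay(customers, census, income_cutoff):
    """Attach area-level figures to each customer and flag likely prospects."""
    enriched = []
    for c in customers:
        area = census.get(c["zip"])  # None when the ZIP isn't in the census table
        row = dict(c, **(area or {}))
        row["prospect"] = bool(area) and area["median_income"] >= income_cutoff
        enriched.append(row)
    return enriched

for row in overlay(customers, census_by_zip, income_cutoff=60_000):
    print(row["name"], row.get("median_income"), row["prospect"])
```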

The next issue: when working with data, one needs to think about its quality. What do I mean by that? Is it accurate and clean data? Take a look at the number of duplicate rows and incomplete or N/A data fields; these are very important to note and act on. Next, look at how your data is labeled and defined in the “metadata” or data dictionary of your database. It tells you whether a data field is character or numeric, its length (max 255, so watch out for those “NOTE” sections) and, if applicable, a short description of what the variable actually is. A unique identifier is preferred. When working with FICA/FICO, we used SSN#, but in other cases a client ID or purchase ID is usually used, and it may not be unique. If multiple purchases or visits occur with a non-unique way of labeling them, this can be a headache, especially when working with live data and adding it to the master database. Updates in a data warehouse involve data dumps or extraction, transformation and load (ETL) to merge new data in with existing data (this is based on some type of identifier, hopefully a unique variable). Sounds easy (not), but it gets worse: the bigger the data, the longer this process takes, and we haven’t even started talking about unstructured data yet, whew. How are incomplete rows beneficial? If you are looking at web data or basket sales, they can show you where someone abandoned their shopping cart; if it’s a loan application, they can tell you where the applicant stopped. See where I’m headed? Data entry is VERY important. A few fat-fingered records add up fast when you are talking terabytes of data, especially when the data is keyed in by a multitude of people.
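A first-pass quality check can cover the issues above: duplicate rows, empty or N/A fields, non-unique identifiers and text longer than a 255-character limit. Here is a minimal sketch in plain Python over a list of records. The field names and records are hypothetical. On a real warehouse you would run equivalent checks in SQL or in your ETL tool.

```python
from collections import Counter

MAX_TEXT_LEN = 255  # the field-length limit discussed above

def profile(rows, key, text_fields):
    """Summarize basic quality problems in a list of dict records."""
    # Exact duplicate rows (every field identical).
    row_counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicate_rows = sum(n - 1 for n in row_counts.values() if n > 1)

    # Fields that are empty or hold a placeholder such as "N/A".
    missing = Counter(
        field
        for r in rows
        for field, value in r.items()
        if value is None or str(value).strip().upper() in {"", "N/A", "NA"}
    )

    # Identifier values that appear more than once (non-unique IDs).
    key_counts = Counter(r[key] for r in rows)
    non_unique_ids = sorted(k for k, n in key_counts.items() if n > 1)

    # Text values that would be cut off by the length limit.
    too_long = sum(
        1 for r in rows for f in text_fields if len(str(r.get(f) or "")) > MAX_TEXT_LEN
    )

    return {
        "rows": len(rows),
        "duplicate_rows": duplicate_rows,
        "missing_by_field": dict(missing),
        "non_unique_ids": non_unique_ids,
        "too_long": too_long,
    }

# Hypothetical purchase records.
purchases = [
    {"purchase_id": 1, "client_id": "A", "amount": "10.00", "note": "ok"},
    {"purchase_id": 1, "client_id": "A", "amount": "10.00", "note": "ok"},
    {"purchase_id": 2, "client_id": "B", "amount": "N/A", "note": "x" * 300},
    {"purchase_id": 3, "client_id": None, "amount": "7.50", "note": ""},
]
print(profile(purchases, key="purchase_id", text_fields=["note"]))
```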

There is more to data than meets the eye. Everyone wants it, but if you want it just for the sake of having data, make sure it’s not just noise. What do I mean by noise? Data experts usually take different stances on this one. I’m the “make a mental note but remove it for the sake of immediate insight” kind of person (null data does not make a pretty spreadsheet). I take special note at the end of the evaluation or data analysis. But I don’t freak out trying to figure out why I have 87 records indicating the person was over 90 years old or made 123 dollars a year: mis-entries, errors, fat fingers… no time for them now, but I will contact IT to correct the records later. (This part is very important as well; if the records are not corrected, that is 87 wasted records, and they keep coming up with each analysis.)
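The “note it, set it aside, send it to IT later” approach can be sketched as a function. It splits records into a clean set for immediate analysis and a flagged set with the reasons attached. The rules and thresholds (age over 90, income under 1,000 dollars) are hypothetical stand-ins for whatever counts as implausible in your data.

```python
def split_for_analysis(records, rules):
    """Set aside records that break any rule; keep the rest for immediate analysis.

    Returns (clean, flagged). Each flagged entry lists which rules it broke,
    so the batch can be sent to IT for correction later.
    """
    clean, flagged = [], []
    for rec in records:
        broken = [name for name, is_bad in rules.items() if is_bad(rec)]
        if broken:
            flagged.append({"record": rec, "reasons": broken})
        else:
            clean.append(rec)
    return clean, flagged

# Hypothetical plausibility rules; thresholds depend on the data and the question.
rules = {
    "age_over_90": lambda r: r["age"] > 90,
    "income_implausibly_low": lambda r: r["income"] < 1_000,
}

records = [
    {"id": 1, "age": 34, "income": 52_000},
    {"id": 2, "age": 134, "income": 48_000},  # fat-fingered age?
    {"id": 3, "age": 41, "income": 123},      # 123 dollars a year?
]
clean, flagged = split_for_analysis(records, rules)
print(len(clean), "kept;", [(f["record"]["id"], f["reasons"]) for f in flagged])
```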
Unstructured data: what do I mean by unstructured, when all data has some type of structure? Yes, but take Twitter and Facebook data. It doesn’t fit into a tabular form or model, but if you manipulate it (using whatever method you choose) you can still infer insight. It’s messy, though, and sometimes a lot of it is useless information, e.g., Joe ate a sandwich and boy was it good (giggle). There is lots to think about: tools for collecting data, tools for extracting and updating data, tools for converting unstructured data into usable information, and the talent to glean insight out of data. Storage used to be a big deal, but now a terabyte costs about 50 dollars. A data warehouse or data mart, however, will require multiple servers or a mainframe, and now there’s some money. But this is enough for you to think about for now. Do you still want to build that database or start a data warehouse? If so, please don’t shrug it off as being a piece of cake; to gain insight, the correct steps are to think first and collect second. Happy Mining!
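One simple way to pull some structure out of social media text is to use regular expressions for hashtags, mentions and links. Here is a minimal sketch that assumes plain tweet text as input. The patterns are deliberately simple and ignore edge cases such as email addresses or URL fragments. The field names are hypothetical.

```python
import re

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")

def tweet_to_row(text):
    """Turn free-form post text into a few structured, tabular fields."""
    return {
        "hashtags": [h.lower() for h in HASHTAG.findall(text)],
        "mentions": MENTION.findall(text),
        "urls": URL.findall(text),
        "word_count": len(text.split()),
    }

row = tweet_to_row("Loving the #BigData talk by @data_nerd http://example.com #DataScience")
print(row)
```

A post like “Joe ate a sandwich” comes back with no hashtags, mentions or links. That is a cheap first signal that it may be noise for your question.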


Social Media ROI

60 years ago some thought TV commercials were a waste of time and money. Is history repeating itself with social media?

I know comparing commercials to social media is not apples to apples, but it’s close enough for my purposes. Flash forward to 2011: with over 300 million Twitter users, it offers an even bigger audience than TV did back in the day.

So why isn’t everyone advertising on social media? The reasons range from lack of staff or time to lack of analysis or return on investment (ROI).

1. Let’s tackle a commercial’s ROI
As humans, we are generally skeptical of new things. No one believed that advertising on television would be as successful as it is now, nor did the Internet grow by leaps and bounds when it first started. Advertising on social media (e.g., Twitter, Facebook, etc.) is relatively new, but it is no more an instantaneous lift than a burger commercial: no one runs out to buy the burger right then.

Like planting a seed, one must wait for social media advertising to grow and mature. Unless your product, service or advertising sucks, you will realize traffic, and in turn sales. Do not become a slave to the peaks and trends of your Twitter analytics. Instead, use your energy to establish yourself.

2. Have a strategy
Gather a strong group of followers by posting for a few weeks before you start selling yourself or your product, and repost subject matter that interests you. You now have a foothold to build on, and should learn from mom-and-pop stores:

a) Treat each customer with respect and you can count on their return business.
b) Show your honesty, and that you have nothing to hide, by settling complaints immediately and publicly.

Track your social media efforts with tools like Google Analytics Social or Twitalyzer. Paid versions are available, but if you are a small business why spend money when it’s not necessary?

Tools like Crowdbooster will tell you the best times to post or tweet to increase engagement. What you post is up to you, but ensure it is not garbage that leaves readers wanting more (you know what I mean). If you present yourself in a professional manner, respect others, engage and share great content, you won’t have any problems attracting followers and making sales.
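Tools like Crowdbooster do this for you. The sketch below only illustrates the underlying idea and is not how any particular tool works. It assumes a hypothetical export of your own posts as (timestamp, engagement) pairs. It groups the posts by hour of day and ranks the hours by average engagement. With only a handful of posts per hour the averages are noisy, and all timestamps should be in the same time zone.

```python
from collections import defaultdict
from datetime import datetime

def best_hours(posts, top=3):
    """Rank posting hours by average engagement (e.g. likes + replies + shares)."""
    totals, counts = defaultdict(int), defaultdict(int)
    for posted_at, engagement in posts:
        hour = datetime.fromisoformat(posted_at).hour
        totals[hour] += engagement
        counts[hour] += 1
    averages = {h: totals[h] / counts[h] for h in totals}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Hypothetical export of your own posts: (timestamp, total engagement).
posts = [
    ("2012-03-01T09:15:00", 12),
    ("2012-03-02T09:40:00", 18),
    ("2012-03-01T13:05:00", 4),
    ("2012-03-03T20:30:00", 25),
    ("2012-03-04T20:10:00", 15),
]
print(best_hours(posts))
```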

If you follow thousands and have thousands of followers but no sales, you made some errors in judgment about whom to follow; you need to find your target audience.

3. Did you do your research?
No way, research!?
Yes, way. Topsy, Kurrently and even Twitter have sentiment search capabilities: see who comes up in a search for your service and/or product, and follow or engage them.
Snow shovels don’t sell in Florida, so don’t follow ‘robots’ and expect to run up your sales. Be logical and market smart: if you have a niche, follow it. Analytics, search sentiment, good curation of relevant stories to share, and timing may be the difference between being open for business and being out of business.

Bottom line
Consider what was going through Joseph Bulova’s mind when he signed a contract for the first-ever TV commercial. Did people think he was wasting his time? Bulova went on to become a household name and the man himself was considered a great business mind…

Be a pioneer – do not let lack of ROI stop you when you see potential value, but always do your homework. The great ones always had a well thought out business plan even when the value was unknown.

Think of those who, at the time, reached millions through little-known channels like billboards, magazines and direct mail. Look at Jay Baer and Peter Cashmore, who make great livings from advertising on social media, a relative newcomer. I am sure they would both say they treated social media as a business capable of great profits because they saw value.

Thank you for your interest. Please keep in mind that the above article compares the analysis of social media now to the analysis of television commercials 60 years ago; I AM NOT comparing it to our ability to run analysis on commercials today.


How do you become a data scientist?

Modern Day Data Scientist


I have read several articles on the subject, but none of the authors were really “Data Scientists,” and they admit that, so I thought it was time that something was written by an actual Data Scientist.

First off, let’s make sure you understand that there’s lots of college involved; no way around that one. Did you notice, in the last commercial you watched, that the lady in the 2nd row, 3rd from the left, had a mole on her nose? If so, you might have what it takes, even if you hadn’t thought of Mathematics, Engineering or Econometrics as a field of study. What I am implying is that it takes someone who is VERY observant to be successful in Data Science. Why? Because you deal with such large data sets and large outputs/results, your ability to absorb lots of information quickly and exactly is your best friend. I can scroll through a million records in minutes, or run a small SQL script, analyze the results and tell you within minutes if the data is bad or corrupt. Cleansing data is always the 1st step. If this part is left out, I can guarantee you will have lots of N/A’s or characters where numbers should be, etc., so make QA your friend, not your enemy.
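Here is one way such a “small SQL script” might look. It runs through Python's built-in sqlite3 so it is self-contained. The table, column and values are hypothetical, and the GLOB pattern matching is SQLite-specific; other databases have their own pattern or regex functions. The script flags missing values, blanks and anything containing a character that is not a digit or a decimal point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw import: everything lands as text, as it often does from flat files.
conn.execute("CREATE TABLE raw_sales (id INTEGER PRIMARY KEY, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_sales (id, amount) VALUES (?, ?)",
    [(1, "19.99"), (2, "N/A"), (3, "2O.00"), (4, None), (5, "42")],
)

# Suspect rows: missing, blank, or containing a character other than a digit
# or a decimal point. Negative numbers, thousands separators or values like
# "1.2.3" would need rules of their own.
bad = conn.execute("""
    SELECT id, amount
    FROM raw_sales
    WHERE amount IS NULL
       OR TRIM(amount) = ''
       OR amount GLOB '*[^0-9.]*'
    ORDER BY id
""").fetchall()

total = conn.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0]
print(f"{len(bad)} of {total} rows need attention:", bad)
```

Row 3 is the classic fat finger: a letter O where a zero should be. It looks fine at a glance but would break any numeric conversion.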

What major or course work produces the best Data Scientists? Econometrics and Mathematics, as long as they come with an additional major in Business. Why? Because of the logic involved, as well as the classic theory of Left Brain people and numbers. Creativity is great for making PowerPoint presentations, but when you have 10 terabytes of raw data, pretty is not the 1st thing on your mind. Minor in, or actively engage in, courses that will teach you programming. You don’t need hard-core Perl, but you will need SQL skills at the very least. Microsoft Visual Studio, the SSAS, SSIS and SSRS packages, SAS, SPSS, SQL, Cognos, Macros and Visual Basic are not only good to know but vital when you have multiple clients who use different CRM, BI and ETL tools.

Once the schooling ends, the real world begins. My 1st boss said, “Forget everything you learned in College; there is no ‘bell curve’ here,” meaning that statistics, programming, mathematics, logic and common sense are only the start. Practice cleansing data, extracting data, normalizing data, segmenting data, loading data, trending data, modeling… in other words, data data data data data. Never assume your results, never ignore anomalies, do keep an unbiased mind and never scrimp on tools, software or classes. Yes, that’s right: I still attend webinars and read like crazy to stay sharp on my tools and technique.

We desperately need more people in Science, Technology, Engineering and Mathematics (STEM), so please consider Data Science as a career. According to the latest study we’re in high demand, and according to some we’re considered rock stars.
What can your data do for you? @Data_Nerd :o)