Great Lists to Follow

There are some great lists out there to follow for Data Science and Technology. Here are a few I have been honored to be listed on. Please feel free to check out everyone on these lists; there are some really great folks out there!

 

PRESENTING: The 100 Most Influential Tech Women On Twitter ->

http://www.businessinsider.com/most-influential-tech-women-on-twitter-2014-5

 

10 Big Data Pros to Follow on Twitter  ->

http://www.informationweek.com/big-data/big-data-analytics/10-big-data-pros-to-follow-on-twitter

 

4 Women Leading the Way in Business Intelligence ->

http://plotting-success.softwareadvice.com/4-women-leading-bi-0114/


Top 200 Thought Leaders in Big Data & Analytics

http://analyticsweek.com/top-200-thought-leaders-in-bigdata-analytics/

 

The Women Behind The Data
http://blog.sqreamtech.com/2014/02/the-women-behind-the-data/


Top 5 Data Science Gals

http://datascience101.wordpress.com/2012/12/27/top-5-data-science-gals/


Big Data: Experts to Follow on Twitter

http://www.techopedia.com/2/28887/trends/big-data/big-data-who-to-follow-on-twitter

 

 

I appreciate all the mentions and inclusions on these lists and WOW, I’m in great company!

 

Lastly, this is not a list but some advice ->  “An understanding of math is important,” he says, “but equally important is understanding the research. Understanding why you are using a particular type of math is more important than understanding the math itself.” http://www.wired.com/2013/04/phd-data-scientist/

 

Have fun with DATA :)

 

 

I was recently interviewed by Software Advice for an article about women in STEM fields, in this case specifically about business intelligence. As they stated in the article, it is difficult to break into a predominantly male field, but don’t let that discourage you! There are many women who have done just that, and very successfully I might add. So, thanks to Alan S. Horowitz and his crew for the honor of being included in their blog post, 4 Women Leading the Way in Business Intelligence.

Excerpt -> “Though clearly hard to break into, technology can be a highly attractive field for women, as it provides a much greater opportunity to stand out than other industries”

From Irene Lewis at Software Advice: “Your insight into the industry and your expertise in navigating it is inspiring to men and women alike. And thank you for tweeting it out!” Thanks, Software Advice, and best wishes in all you do!

Since not every detail was specific, I wanted to add a bit about myself for those who would like to know more, after reading the above article of course ;o)

Being a single mother of two sons was a challenge, but no one has ever accused me of backing down from a challenge. Eager to learn and grow, I entered the University of Tennessee in the spring of 1993. I worked in the Developmental Math Lab for my entire tenure at the University of Tennessee, assisting students with all levels of mathematics. Upon graduating in 1998 with a double major in Applied Mathematics and Economics, I moved to the Chicago area to start my career in analytics. During the past 16 years, I have worked with many Fortune 100 and 500 companies including, but not limited to, Discover Financial Services, J&J, Hershey, Kraft, Kellogg’s, SCJ, McNeil and Firestone, Tandus Worldwide, Terenine and, even though they are not Fortune companies, both the University of Chicago and the University of Tennessee.

Acting as a liaison between the IT department and the Executive staff, I am able to take huge complicated databases, decipher business needs and come back with intelligence that quantifies spending, profit and trends. Being called a data nerd is a badge of courage for this curious Mathematician/Economist because knowledge is power and companies are now acknowledging its importance. Data, what can it do for you today?

Specialties:

* Comprehensive Customer Satisfaction and Retention Analysis
* Brand Research & Competitive Analysis
* Employee Retention Research
* Survey Creation & Analysis (New Product and Branding)
* Database creation and mining
* Social Media & Coupon, Incentive Promotions
* Project Management (Scrum Certified)
* Social Media Marketing
* Statistical concepts to solve business challenges
* Advanced knowledge of data warehousing
* Target Audience Analysis
* Predictive modeling, forecasting, and data mining
* Develop data strategy, analysis, objectives and business requirements

So, as you see, a woman can succeed in a “MAN’S” world. Good luck, ladies! If you’d like to see more check me out on LinkedIn at


I was recently contacted by a recruiter concerning a Director of Analytics position at a private university in Louisville, Kentucky; they wanted to know if I could recommend someone for the position since I had a background in Education and Data Science. I dug through my contacts and supplied them with 4 names of experts who would be more than qualified for what they were looking for, all of them Data Scientists with at least 15 years of experience. Well, I finally heard from the last of those I had recommended, and each, for various reasons, was disqualified for the position. HUH?? I called the recruiter and asked what was going on, and she said her client was “looking for someone with more experience in fundraising.” So, let me get this right….. you disqualified some of the greatest minds in Data Science because of a lack of fundraising experience? Really?

This is not the first time I have seen “truly amazing” people overlooked for a position due to “them” NOT having some little detail on their resume. It makes me question the people doing the hiring; do they even know what they want? Can they recognize an experienced candidate, or are they going on gut feelings and preconceptions of what “they” think the position needs?

Would you recognize a true Data Scientist if you met one? If you wanted to add data science or analytics to your University or Corporation where would you go, to a head hunter or a recruiter? Probably, but what makes you think they are qualified to find you the best Data Scientist out there when most of them are still trying to figure out what Data Science is!

If you really want to hire the best, I recommend you research the position first; how can you find the perfect candidate if you don’t? Buzzwords like “Data Science” and “Big Data” are added to everyone’s resume in analytics, and this DOES NOT make them qualified. Stop searching for the obvious and look for words that REAL Data Scientists would use: probability, models, machine learning, statistics, data engineering, pattern recognition, learning, visualization and data warehousing are some examples.

To conclude my rant, I’d like to make one point…… If you really want to hire an expert in Data Science, don’t go for the one with the biggest blog or the one who writes the most books. Honestly, a great data person doesn’t have the time or desire to write blogs and books; we’d rather be doing what we love: playing with data. If I had my choice on hiring the best, I would check out LinkedIn, find all the candidates I wanted and then call and verify references before I even set up the first interview. Too many people out there pad their resumes or just flat out lie about their skills, so verify everything! Sometimes I wonder if anyone follows up on anything anymore! A 10 minute phone call up front will save you from having to figure out how to get rid of someone who sucks; believe me, I see it happen frequently. There are some truly gifted individuals out there. Don’t overlook them because you are mesmerized by your own agenda.

 

Data Science ROCKS!

What can your data do for you?


 

 

Recently I did a webinar with Kalido and enjoyed it tremendously. They were kind enough to give me a summary of the webinar. Thanks, Kalido! http://www.kalido.com/

My Favorite Quote from the Webinar “Big Data needs Data Science but Data Science doesn’t need Big Data” Carla Gentry aka @data_nerd

Data science has been around for decades, and it’s not just big data. I hear a lot of people clumping these two together like they go hand-in-hand, which I agree with to an extent. However, big data needs data science but data science doesn’t necessarily need big data. Most of the data a typical company handles on a daily basis or houses internally is not big data. Even Facebook and Google break up or segment their data into workable pieces. Data science is big, small, structured, unstructured, messy, clean, etc. It’s more than just analytics. As a data scientist, you’ll become a liaison between the IT department and the C suite. You have to talk both languages and you have to understand the hierarchy of data; you can’t be just an architect or data expert.

 

What really matters in data science is the team effort and your role as a liaison. Your company has large amounts of data and you want to make sure your queries are correct. Whatever tool you use, make sure you have your data cleansed. You want to know that it’s normalized and indexed so that things run smoother. You want to be able to give insight, which requires knowledge of your audience. If your audience is the C suite of a multi-million dollar company, you’re going to need everything you have to back up your conclusions. Be able to prove it and be prepared for questions.

 

What sort of personality makes for an effective data scientist?

Definitely curiosity. I remember in college, my professors shut the door if they saw me coming because telling me that a² + b² = c² was never enough. I wanted to know why. So the biggest question in data science is “why?” Why is this happening? If you notice that there’s a pattern, ask “why?” Is there something wrong with the data or is this an actual pattern going on? Can we conclude anything from this pattern? A natural curiosity will definitely give you a good foundation.

 

For aspiring data scientists, where can they begin?

There are many positions you can get into to learn data science; it’s not just for data engineers. Personally, I started as a junior analyst. Everyone has to start at the ground floor but there are so many resources and open-source data places you can go to practice. Most IT departments aren’t going to give you access to their live database, but they may give you access to their development database where you can go in and practice. Any position that you get into, go tell your boss that you’re interested in becoming a data scientist. Sign up for courses, learn programming languages and learn business. You have to know about budgets and various business aspects, not just the analysis part and not just the IT part. Data science is a wonderful field, and I encourage anyone that has a curiosity about data analysis, hypothesizing, statistics, to give it a shot. Just know that it won’t happen overnight.

Data Science

Data Scientist, Analytical-Solution

 

Over the years I have done quite nicely for myself as the Founder of Analytical Solution; as everyone says, I wish I had done it sooner. But every once in a while I reach out to other businesses in order to do something different, or because I see potential I could add by joining forces with them. Unfortunately, each time I have tried, I have been dismissed or ignored, which is curious since Data Scientists are supposed to be in such demand. In all honesty, if I never added another client to my base, I would be fine financially, but what fun would that be? After 2 years of “going it alone” I miss camaraderie and giggling because someone’s chair made a strange noise, giggle, you know what I mean. I am a NERD and we are not solitary creatures.

 

My most recent dissing, giggle, was from a company in Louisville, not far from me. I could have popped over, done some analysis, crunched some numbers, built a model, taught them about databases, ETL, Modeling… the possibilities are endless because I’ve been doing this for so many years, but the guy just blew me off, like it was every day that a Data Scientist / Economist / Mathematician with over 15 years of experience contacts him. No biggie, but what if he had joined forces with me?

 

Recently Big Data Republic started a Big Data 100 who to follow list, all the Data people are smiling or joining PeerIndex to raise their scores, giggle, it has actually been fun finding new data people on Twitter to follow (which was their original intent, not to exclude or make anyone upset) but I digress – the point is, I’m on this list and every time someone clicks on this site, there is my name, they can drill down by clicking my name or icon to receive more information. Imagine all the thousands of chances missed by NOT partnering with me, the info under my name could have said Founder of Analytical Solution and Partner XXXXX blah blah, you get the point. So, before you pass up an opportunity, the next time someone emails, tweets, or calls, give them a chance, you never know what could come of it :)

It’s not every day someone with my experience wants to share the wealth of knowledge accumulated working with Fortune 100 and 500 companies, Colleges, Financial Institutions and Econometrical Consulting Firms. If you are in the Louisville area and would be interested in speaking, send me an email at carla.gentry@analytical-solution.com (who knows, it could be the opportunity of a lifetime)

Being a Data Scientist

 Data Scientists

Being a “Data Scientist” Is As Much About IT As It Is Analysis by Carla Gentry, aka @Data_nerd

IBM defines the data scientist as -> A data scientist represents an evolution from the business or data analyst role.

 

The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math. What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge.

 

Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization. The data scientist role has been described as “part analyst, part artist.”

 

Anjul Bhambhri, vice president of big data products at IBM, says, “A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a Renaissance individual who really wants to learn and bring change to an organization.”…

 

A data scientist does not simply collect and report on data, but also looks at it from many angles, determines what it means, then recommends ways to apply the data.

 

Data scientists are inquisitive: exploring, asking questions, doing “what if” analysis, questioning existing assumptions and processes. Armed with data and analytical results, a top-tier data scientist will then communicate informed conclusions and recommendations across an organization’s leadership structure.

 

IBM hits the nail on the head with the above definition. Having worked with traditional data analysts as well as programmers, developers, architects, scrum masters, and data scientists — I can tell you they don’t all think alike. A data scientist could be a statistician but a statistician may not be completely ready to take on the role of data scientist, and the same goes for all the above titles as well.

 

Beth Schultz from All Analytics mentioned that we are like jacks of all trades but masters of none; I don’t completely agree with this comment, but do agree that my ETL skills are not as honed as my analysis skills, for example. My definition of the data scientist includes: knowledge of large databases and clones, slave, master, nodes, schemas, agile, scrum, data cleansing, ETL, SQL and other programming languages, presentation skills, Business Intelligence and Business Optimization, plus the ability to glean actionable insight from data. I could go on and on about what the data scientist needs to be familiar with, but the analysis part has to be mastered knowledge and not just general knowledge. If you want to separate the pretenders from the experienced in this business, ask a few questions about how data science actually works!

 

When I start working with a new data set (it doesn’t matter how much or what kind), the first question I usually ask is, what kind of servers do you own?

Why would you need to know about the servers to work with data? I ask this question so I will know what kind of load it can handle – is it going to take me 9 hours to process or 15 minutes? How many servers do you have? I ask this because if I have 4 or 5 servers, I can toggle or load balance versus having only 1 that I have to babysit.

What kind of environment will I be working in? I ask this because I need to know if they have a test environment versus a live environment, so I can play without crashing every server in the house and ticking a lot of people off. If you are working with lots of data, lower peak times or low load times are better for live, as compared to test or staging environments where you can “play” without fear. This way, you won’t “bring down the house”.

It’s a good idea for you Chief Marketing Officers (CMOs) to let your Data Scientist work in the evening hours and/or on weekends, at their homes if applicable. This, of course, requires setting up a VPN connection and it also depends on how secure the data connections are, as well as how much processing I can do before I crash them, – um, I mean, what is the speed and capacity to process? If a dial-up connection is all that’s available, forget it.

As a side note, I’ve crashed many a server in my day – how do you think I learned all this stuff? Back in the Nineties, someone would crash the mainframe and we would all head to Einstein’s Deli in Oak Park, IL but today, this might be frowned upon. But I digress, back to more IT related things.

Another handy thing to find out is how the databases are joined. By that I mean, what variables do they have in common (i.e., “primary keys”)? Are the relationships one-to-one, one-to-many, or many-to-many? Why would you ask this? Some programmers (I don’t mean this in general) don’t completely understand relational databases, especially when it comes to transactional data and data that needs to be refreshed often. You have to set up a database like you would play chess: think at least three moves ahead.

Additionally, some programmers/developers use too many JOIN statements in their scripts, which cause large amounts of iterations. Since these tend to increase run time and are not very efficient, you don’t want to be linking too many of these babies together and then running complex algorithms or scripts.
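
To make the point about keys and relationships concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, column names and the two helper functions are made up for illustration; they are not from any real client database. It checks whether a column that is supposed to be a key actually is one, and shows how a join on a non-unique key quietly inflates a total.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    -- Customer 1 was loaded twice, so customer_id is not really a key.
    INSERT INTO customers VALUES (1, 'Ann'), (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);
""")

def duplicate_keys(conn, table, key):
    """Return the key values that appear more than once in a table."""
    # table and key are trusted identifiers here, never user input.
    rows = conn.execute(
        f"SELECT {key} FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    ).fetchall()
    return [row[0] for row in rows]

def joined_total(conn):
    """Sum order amounts after joining to customers on customer_id."""
    return conn.execute(
        "SELECT SUM(o.amount) FROM orders o "
        "JOIN customers c ON c.customer_id = o.customer_id"
    ).fetchone()[0]

true_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print("duplicate keys:", duplicate_keys(conn, "customers", "customer_id"))
print("true total:", true_total, "total after join:", joined_total(conn))
```

Because customer 1 appears twice, each of that customer's orders matches two rows and the joined total comes out higher than the true total. Running a key check like this before joining is one cheap way to catch a one-to-many relationship that was assumed to be one-to-one.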

Sometimes, it’s better to start from scratch and build your own data source. When writing scripts to extract or refresh data, don’t forget a few key things: normalize, index, pick your design based on what you know about the data and what is being requested of it.
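
As a small illustration of the "index" advice, the sketch below (again sqlite3, with a made-up orders table and a hypothetical index name) asks the database how it plans to run the same query before and after an index is added. Other database engines have their own version of this, so treat it as an example of the idea, not a recipe for any particular server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
# 1,000 made-up orders spread over 100 made-up customers.
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT SUM(amount) FROM orders WHERE customer_id = 7"

def plan(conn, sql):
    """Return SQLite's query plan for sql as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)

plan_before = plan(conn, QUERY)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(conn, QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index the plan is a scan of the whole table; with it, the plan mentions idx_orders_customer, meaning only the matching rows are visited. The exact wording of the plan text differs between SQLite versions.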

Servers are important, and if dealing with large databases, load balance or toggle whenever possible. Also, star schema versus snowflake schema is important, so please put some serious thought into this. Ask yourself, do I need it fast or efficient? Believe me, I always pick efficient (I am a nerd, after all) but if the client needs it ASAP, then the client shall have it ASAP.
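
For readers who have not met the two designs, here is a rough sketch of the difference, using sqlite3 and invented table names and sales figures. In the star layout the product dimension carries the category name directly; in the snowflake layout the category is split into its own table, which removes the repetition but costs an extra join. This is one common way to describe the trade-off and an assumption on my part, not a rule for every warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star: one wide product dimension, category text repeated per product.
    CREATE TABLE dim_product_star (
        product_id INTEGER PRIMARY KEY, product TEXT, category TEXT);
    -- Snowflake: category normalized out into its own table.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_product_snow (
        product_id INTEGER PRIMARY KEY, product TEXT, category_id INTEGER);
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);

    INSERT INTO dim_product_star VALUES
        (1, 'Corn flakes', 'Cereal'), (2, 'Granola', 'Cereal'),
        (3, 'Trimmer', 'Garden');
    INSERT INTO dim_category VALUES (1, 'Cereal'), (2, 'Garden');
    INSERT INTO dim_product_snow VALUES
        (1, 'Corn flakes', 1), (2, 'Granola', 1), (3, 'Trimmer', 2);
    INSERT INTO fact_sales VALUES (1, 4.0), (2, 6.0), (3, 250.0), (1, 4.0);
""")

# Star schema: one join from the fact table to the dimension.
star = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_id = f.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake schema: same answer, but two joins to reach the category name.
snow = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category ORDER BY c.category
""").fetchall()

print(star)
print(snow)
```

Both queries return the same totals per category. The choice is about where you pay: repeated text in the star dimension, or extra joins in the snowflake.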

With knowledge of the client’s IT setup from a data management/quality perspective, you’ll be equipped to handle most situations you run into when dealing with data, even if the Architect and Programmer are out sick. Your professional knowledge is going to be a big help in getting the assignment or job complete.

Happy data mining and please play with data responsibly!

About the Author

During the past 16+ years, Carla Gentry has worked with Fortune 100 and 500 companies including but not limited to, Discover Financial Services, J&J, Hershey, Kraft, Kellogg’s, SCJ, McNeil and Firestone. Acting as a liaison between the IT department and the Executive staff, she is able to take huge complicated databases, decipher business needs and come back with intelligence that quantifies spending, profit and trends. Being called a data nerd is a badge of courage for this curious Mathematician/Economist because knowledge is power and companies are now acknowledging its importance. To find out more about what Carla does, please visit her profile on LinkedIn ->https://www.linkedin.com/in/datanerd13

Seems anyone can create a list of people to follow. Unfortunately, when I see a list of Top Data people or Data Scientists, it’s always male dominated. I’m not saying they don’t deserve it, because they do, but there are many ladies out there who code their butts off and glean insight like a ROCK STAR. So, to correct this, I have compiled a list of great women in Data to follow.

 

Please follow these users for the latest in Data Science, Analysis, and Quality!

 

Karen Lopez

@datachick

 

Shelly Lucas

@pisarose

 

Big Data Gal

@BigDataGal

 

Dr Emily R Coleman

@e_r_coleman

 

Jacqueline Roberts

@JackieMRoberts

 

Gwen Thomas

@gwenthomasdgi

 

Melinda Thielbar

@mthielbar

 

April Reeve

@Datagrrl

 

Sarah Schmidt

@uptimedb2dba

 

Angela Dunn

@blogbrevity

 

Cathy O’Neil

@mathbabedotorg

 

NISS SAMSI

@NISSSAMSI

 

Isabel Elaine Allen

@DataCooker

 

Beth Schultz

@Beth_Schultz

 

Emily Carter

@EmRCarter

 

Noreen Seebacher

@writenoreen

 

Blaine Kohl

@bmkohl

 

Loretta Mahon Smith

@silverdata

 

Mandi Bishop

@MandiBPro

 

and last but not least – ME :) 15 years of crunching data and gleaning insight

Carla Gentry CSPO

@data_nerd

 

I hope you enjoy following these users as much as I do and if I forgot any Data Ladies, please accept my apologies – and send me a Tweet @data_nerd so I can start following you :)


 

Everywhere you look, a new program or University is teaching data science. While I think it is great for the field…… heads up people, data science is not new, and creating a bunch of book smart data people without any business experience is NOT the answer. I started college in ‘93 with a group of math friends; 6 years later I graduated with a BS in Mathematics and Economics, but out of the 20 who started the mathematics courses at the same time I did, only 2, including me, had graduated! I’m not trying to show off, just trying to explain that logic and science are not for everyone. It’s not something where you wake up and decide, I’m going to be a Data Scientist today. Curiosity won’t help unless you have the background or talent for gleaning insight out of terabytes of data. It isn’t as easy as you hear on Twitter or Facebook; you can’t take a course and be ready to handle dirty, unstructured, messy, server breaking data.
So, what can you do? Not everyone is as lucky as I was to start right out of college working for RJKA, an Econometrical Consulting Firm with DFS as a client, BUT there are ways to learn. http://news.stanford.edu/news/2012/september/online-courses-fall-090712.html and http://classweb.gmu.edu/kborne/ are just a couple of examples. Take courses, learn about MDM http://en.wikipedia.org/wiki/Master_data_management and Data Mining http://www.dataminingarticles.com/data-mining-101.html as well as Data Mining Tools http://www.theiia.org/intAuditor/itaudit/archives/2006/august/data-mining-101-tools-and-techniques/ and please, learn statistics http://search.barnesandnoble.com/Statistics-for-Business-and-Economics-James-T-McClave/e/9780132409353 and learn to write your own code (SQL will always be around) http://beginner-sql-tutorial.com/sql-joins.htm then learn about load balancing quick, before you crash a multi-million dollar server and get your butt canned, giggle https://devcentral.f5.com/weblogs/macvittie/archive/2012/10/01/load-balancing-101-active-active-in-the-cloud.aspx – As you see, for those who want to invest their time and talent there are resources out there, AND the BIGGEST ONE IS (ta da) Kaggle, a great place to play data scientist. But please keep in mind you HAVE TO KNOW how to talk to and present findings to the C suite and/or board of directors; Sheldon is a genius but self admits that he CHOKES at the thought of presenting to humans. Last but not least, find a job that works with data and learn as much as you can. I started as a junior analyst and now I own Analytical Solution! Good luck in your adventure, but don’t fool yourself: it isn’t easy, and if it were, EVERYONE would be doing it :)

There have been a lot of articles over the years going into great depth about “Push versus Pull Marketing”, so I am not going to overkill the subject by regurgitating what has already been written. What I will add is that marketing has changed with the introduction of social media into our daily regimen, analysis and reporting. We try various things when we break into the world of social media, i.e. “push versus pull” marketing: do we blast our message to the world, or do we provide great content and let them come to us? In my time on social media, I have seen links stolen and articles cloned, “how many likes for this” (what?) Facebook posts, pins redirected to unintended sites and auto-posting gone mad, with old articles from 2010 passed around like “hot news” because no one even bothered to read the article; they just wanted to re-tweet someone with high Klout. So, push marketing has definitely taken on a new aspect. It is not just about spitting out your product news to anyone out there; it is about misdirection and unprofessional behavior. Followers and fans figure this out pretty fast, so any sales these marketers make are single purchases and not about building a loyal clientele. I guess if that is your purpose, you have succeeded, but businesses are built on longevity, not tricks. I have seen great social media campaigns which made me think, giggle, scratch my head and lean in with anticipation of their next tweet, so I know there are many companies that are doing it right. Hats off to the social media teams out there that create great content that goes viral and circles the world; you inspire us to turn it up a notch! Companies like Kellogg’s want to be part of your family. They don’t blast their message to unsuspecting recipients or beg you to “like” their page; they drip, drip, drip with their messages until one day, boom, you need cereal and, without even thinking, you are buying Corn Pops.
It takes time to build a following. Good marketers know this and slowly build a loyal following through likes, re-tweets, pins, discussion in LinkedIn groups and thought provoking content and ads. Occasionally ads go viral, but the norm is to engage and be the 1st product or service that pops into someone’s mind. Take me for example. I am not unique by any means; there are lots of great people out there to follow. I started my Twitter account in 2010 but didn’t bring in the business aspect until March 2011. Since then I have added over 5,000 followers, and not because I have a high Klout score, beg for followers or talk about my business every 5 minutes. I spend hours a day searching for relevant content to share with my fans and friends on social media, to educate, inform, entertain and spread the word about analysis, data mining and mathematics, as well as good habits in analytics and social media marketing. I do this not because I want to just give away good content but to form a relationship, to imprint my name, whether it be @data_nerd, Carla Gentry or Analytical Solution, into your mind, so when you need research, sentiment or text analysis, a social media campaign or analytical marketing, you’ll head over to my site and give me a call.
In conclusion, when you are creating your social media marketing plan, keep this in mind: do you want to be a household name or a spammer who “gets and loses” followers on a daily basis? The choice is yours. Blast your message to anyone who will listen, or target your message, create relevant content, respect your followers, and follow up with leads in a dignified manner. Engage and show potential clients why they should do business with you. If you have expert knowledge, share it with others and promote your field. No matter what strategy you use (push versus pull), showing your character and your business’s character might not make you a millionaire, but I guarantee it will lengthen your business career and lifetime earning potential. Thanks for taking the time to read this article and have a great day of marketing!


 

A “buzz word”, that is what data has been reduced to. “Big Data” is now a common phrase used to describe any number of different types of data: social media data, point of sale data, financial data, digital and visual data…. Arg, make it stop. But what is it “really” and what makes it useful versus noise?

Over the course of my career, I have worked for companies of all sizes, with some handling data better than others; the best was actually one of the smallest (go figure). Most companies struggled to figure out what to do with the data they had, as opposed to how to get more. Retail and CPG companies that can afford all the latest and greatest BI and data mining tools usually collect and use their data very well, since it’s REALLY their bread and butter and without it the competition would eat them alive. Unfortunately, they aren’t usually able to “house” the data, making “real time” almost impossible. Smaller companies that have jumped into the data pool (per se) purchase large amounts of data or gather their own live data but rarely have the insight to know what “is” or “isn’t” important. For example, say I sell a 250 dollar yard trimmer. Now, 250 bucks is a bit steep, so I know the average person is not going to buy. So, I would need someone who owned their home (the norm) or rented and really cared (the outlier), and someone who made above average income (the norm) or someone who saved to make the purchase (since it’s a yard trimmer, we’ll say that this is an outlier). But I only have name and email address, so what can I do? Honestly, not a whole lot, except maybe a mailing list. Say I have name, complete address and email; that’s a little better… you could use the addresses to overlay with federal, state and local data or census data for that neighborhood. That would tell you median income, average home price, etc., but without more demographic and financial data, it would still not be sufficient to deduce much insight. So the kind of data you collect becomes more important than ever. If you want to target your customers, think about what it would really take for you to get the best insight.

Next issue: when working with data, one needs to think about its quality. What do I mean by that? Is it accurate and clean data? Take a look at the number of duplicate rows of information and incomplete or N/A data fields; these are very important to note and take action on. Next is how your data is labeled and defined: the “metadata” or data dictionary of your database. It tells you if the data field is character or numeric, the length (max 255, so watch out for those “NOTE” sections) and, if applicable, a short description of what the variable actually is. A unique identifier is preferred. When working with FICA/FICO, we used SSN#, but in other cases a client ID or purchase ID, which may not be unique, is usually used. If multiple purchases or visits occur with a non-unique way of labeling, this can be a headache, especially when working with live data and adding it into the master database. Updates in a data warehouse involve data dumps or extraction, transformation and load (ETL) to merge new data in with existing data (segmentation is based on some type of identifier, hopefully a unique variable). Sounds easy (not), but it gets worse: the bigger the data, the longer this process takes, and we haven’t even started talking about unstructured data yet, whew. How are incomplete rows beneficial? If you are looking at web data or basket sales, they can show you where someone abandoned their shopping cart; if it’s a loan application, they can tell you where the applicant stopped. See where I’m headed? Data entry is VERY important; a few fat fingered entries add up fast when you are talking terabytes of data, especially when they are keyed in by a multitude of people.
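
The first-pass quality checks described above (duplicate rows, incomplete or N/A fields, IDs that are not unique) can be sketched in a few lines of plain Python. The field names, the sample rows and the choice of what counts as "missing" are assumptions made for illustration only.

```python
from collections import Counter

# Values treated as "missing" in this sketch; adjust to your own data.
MISSING = {None, "", "N/A"}

def profile(rows, id_field):
    """Count exact duplicate rows, incomplete rows, and repeated IDs."""
    # A row is an exact duplicate if every field matches an earlier row.
    row_counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicate_rows = sum(n - 1 for n in row_counts.values() if n > 1)
    # A row is incomplete if any of its fields holds a "missing" value.
    incomplete_rows = sum(
        1 for r in rows if any(v in MISSING for v in r.values())
    )
    # IDs seen on more than one row cannot be used as a unique key as-is.
    id_counts = Counter(r[id_field] for r in rows)
    repeated_ids = sorted(i for i, n in id_counts.items() if n > 1)
    return {
        "duplicate_rows": duplicate_rows,
        "incomplete_rows": incomplete_rows,
        "repeated_ids": repeated_ids,
    }

# Made-up sample rows.
rows = [
    {"client_id": "C1", "item": "trimmer", "zip": "40202"},
    {"client_id": "C1", "item": "trimmer", "zip": "40202"},  # exact duplicate
    {"client_id": "C2", "item": "N/A", "zip": "40203"},      # incomplete
    {"client_id": "C3", "item": "gloves", "zip": ""},        # incomplete
    {"client_id": "C2", "item": "gloves", "zip": "40203"},   # same ID again
]

report = profile(rows, "client_id")
print(report)
```

Note that a repeated ID is not automatically an error: C1 is a true duplicate, while C2 is a second purchase by the same client. That is exactly why a non-unique ID is a headache when merging into a master database.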

There is more to data than meets the eye. Everyone wants it, but if you want it just for the sake of having data, make sure it’s not just noise. What do I mean by noise? Data experts usually take different stances on this one; I’m the “make a mental note but remove for the sake of immediate insight” kind of person (null data does not make a pretty spreadsheet). I take special note at the end of the evaluation or data analysis but don’t freak out trying to figure out why I have 87 records that indicate the person was over 90 years old or made 123 dollars a year. Mis-entries, errors, fat fingers… there is no time for them now, but I will contact IT to correct the records later (this part is very important as well; if not corrected, that is 87 wasted records, and they keep coming up with each analysis).
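
One way to "make a note but remove for now" is to split the records into a clean set for the analysis and a flagged set to hand to IT afterwards, so nothing is silently thrown away. The sketch below assumes made-up records with id, age and income fields, and the thresholds (over 90 years old, under 1,000 dollars a year) are illustrative picks, not rules.

```python
def split_noise(records, max_age=90, min_income=1000):
    """Separate plausible records from ones to send back for correction."""
    clean, flagged = [], []
    for rec in records:
        reasons = []
        if rec["age"] > max_age:
            reasons.append(f"age over {max_age}")
        if rec["income"] < min_income:
            reasons.append(f"income under {min_income}")
        if reasons:
            # Keep the ID and the reason so IT can fix the source record.
            flagged.append((rec["id"], reasons))
        else:
            clean.append(rec)
    return clean, flagged

# Made-up sample records.
records = [
    {"id": 1, "age": 42, "income": 58000},
    {"id": 2, "age": 97, "income": 31000},  # likely a mis-keyed age
    {"id": 3, "age": 35, "income": 123},    # likely a mis-keyed income
]

clean, flagged = split_noise(records)
for rec_id, reasons in flagged:
    print(rec_id, ", ".join(reasons))
```

The analysis runs on clean, and the flagged list becomes the follow-up for IT.
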
Unstructured data: what do I mean by unstructured? All data has some type of structure… yes, but take Twitter and Facebook data. It doesn’t fit into a tabular form or model, but if you manipulate it (using whatever method you choose) you can still infer insight. It’s messy, though, and sometimes it is a lot of useless information, e.g. Joe ate a sandwich and boy was it good, giggle. There is lots to think about: tools for collection of data, tools for extracting and updating data, tools for converting unstructured data into usable information, and talent to glean insight out of data. Storage used to be a big deal, and now a terabyte is 50 dollars, but a data warehouse or data mart will require multiple servers or a mainframe; now there’s some money. But this is enough for you to think about for now. Do you still want to build that database or start a data warehouse? If so, please don’t shrug it off as being a piece of cake. To gain insight, the correct steps are to think first, collect second. Happy Mining!

Data_Nerd