A general outline for the steps in a data project

There is one set of steps for conducting a survey and another for a data project.  We’ll start with the steps for conducting a survey:

Steps for a survey

  1. formulation of the statement of objectives
  2. selection of a survey frame
  3. determination of the sample design
  4. questionnaire design
  5. data collection
  6. data capture and coding
  7. editing and imputation
  8. estimation
  9. data analysis
  10. data dissemination
  11. documentation

What is interesting about this list is that data analysis is number 9.  The two steps after it are to disseminate the data and then to document the findings from the analysis.  Steps 1-8 all deal with formulating the problem, determining what information to collect and how to collect it, and cleaning and processing the collected data, all in preparation for the “magic” of data analysis.

This is something to keep in mind when getting your data to work for you.  Not every survey step above translates into a step for a data project, but let’s see which might.  Some general steps for a data project are below.

Steps for a data project

  1. The first step is to determine the objective of the project, as specifically as possible.  For most businesses the objective is to maximize profits and minimize costs, but try to be more specific.  Perhaps it’s to determine how many people are really buying online and whether that channel can be developed and used to its fullest potential.
  2. Then we need to determine what data we need.  If we want to know how many people are purchasing online, we need the purchase history, with purchases grouped into online and in-store.  There are probably no other kinds of purchases, so we now know what data we need.
  3. Now we need to figure out where to get this data.  The purchase history would have it, and store copies of receipts would indicate whether a purchase was made in-store or online.
  4. Would any data or information be missing?  Is it possible for a purchase to occur without how it was made being recorded, i.e. with no indication of whether it was online or in-store?  If so, what are the causes?  Human error, technological error, etc.  Can the missing information be supplied or guessed intelligently in some way?  Perhaps through other indicators on the receipt, such as time of day or location of purchase.  If the purchase was made at 1am and the store is closed at that time, it must have been an online purchase.
  5. Now that we have the data, let’s do something with it.  This is where the story starts to unfold, and it comes out through data analysis.  A good place to start is with descriptive statistics and data visualizations, e.g. graphs, tables, plots, diagrams and charts: anything that gives more meaning to the data.  As much or as little time can be spent on this as needed, depending on what the data shows.  New questions may arise that need to be explored through the data.  For example, what time are the online purchases being made?  How many of the online sales result in returns?  Once some of these questions are answered, the business owner can decide on next steps.
  6. Everything the data tells us is then summarized in a document or report for the business owner, researcher or data owner, along with any recommendations for further projects, research and data exploration.
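
Steps 2 to 4 above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the purchase records, the field names (`time`, `channel`) and the store hours (open 9am to 9pm) are all made up for the example and would need to be replaced with whatever the real purchase history contains.  Purchases whose channel was not recorded are labelled online only when the store was closed at the time, and are otherwise left as unknown rather than guessed.

```python
from collections import Counter
from datetime import datetime

# Hypothetical store hours (an assumption): open from 09:00, closed from 21:00.
STORE_OPEN_HOUR = 9
STORE_CLOSE_HOUR = 21

# Made-up purchase records; "channel" is None when the receipt did not record it.
purchases = [
    {"id": 1, "time": "2024-03-01 10:15", "channel": "online"},
    {"id": 2, "time": "2024-03-01 14:40", "channel": "in-store"},
    {"id": 3, "time": "2024-03-02 01:05", "channel": None},
    {"id": 4, "time": "2024-03-02 13:20", "channel": None},
]


def fill_channel(purchase):
    """Return the recorded channel, or infer 'online' if the store was closed."""
    if purchase["channel"] is not None:
        return purchase["channel"]
    hour = datetime.strptime(purchase["time"], "%Y-%m-%d %H:%M").hour
    if hour < STORE_OPEN_HOUR or hour >= STORE_CLOSE_HOUR:
        return "online"   # the store was closed, so it cannot be in-store
    return "unknown"      # not enough information to guess safely


# Group the purchases by channel after filling in what can be inferred.
channel_counts = Counter(fill_channel(p) for p in purchases)
print(channel_counts)
```

Keeping an explicit "unknown" group makes it easy to see how much data is still missing after the rule has been applied.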

Summary of steps for a data project:

  1. Determine the objective of the data project; be specific.
  2. Determine data needed.
  3. Determine where to get this data from.
  4. Determine if there is missing data.
  5. If data is missing, determine how to fill it in.
  6. Start telling a story with the data.
  7. Summarize the story we have found through the data and suggest other possible avenues of exploration and stories.
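
As a small illustration of step 6, the two example questions from earlier (what time are the online purchases being made, and how many result in returns) can be answered with simple counts.  The sketch below uses made-up online orders, each recorded as an hour of the day and a returned flag; the data and the variable names are assumptions for the example, not a real data set.

```python
from collections import Counter

# Made-up online orders: (hour of purchase, whether the order was later returned).
online_orders = [
    (1, False), (9, False), (12, True), (12, False),
    (20, False), (20, True), (20, False), (23, False),
]

# Descriptive statistic 1: number of online orders per hour of the day.
orders_by_hour = Counter(hour for hour, _ in online_orders)

# Descriptive statistic 2: share of online orders that were returned.
return_rate = sum(returned for _, returned in online_orders) / len(online_orders)

# A very simple text "bar chart" of orders by hour.
for hour in sorted(orders_by_hour):
    print(f"{hour:02d}:00  {'#' * orders_by_hour[hour]}")
print(f"Return rate: {return_rate:.0%}")
```

Even a rough summary like this can raise the next question to explore, such as why a particular hour is busier than the others.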

References:

Survey Methods and Practices, Statistics Canada, Ministry of Industry, 2010, URL: http://www.statcan.gc.ca/pub/12-587-x/12-587-x2003001-eng.pdf

Lani Haque

