Learn the what and why of automatic data collection

Before going into “Why?”, perhaps we should first answer “What?”: what exactly is automatic data collection? This raises the following questions:

Questions concerning automatic data collection

  1. Who would want automatic data collection?
  2. How would automatic data collection take place?
  3. When would automatic data collection occur?

Each of these questions deserves a post in its own right, so here we'll just briefly describe what automatic data collection is and then go into why it would be of value.

What is automatic data collection?

There is the term automatic identification and data capture (AIDC). This is the process of automatically identifying objects, collecting data about them and entering that data into some kind of storage, such as a database or table. If we consider only information collected from the internet, say, then what we're really looking at is scraping the internet for information.

What kind of information, you may ask? Perhaps you're a small shop with an e-commerce portion. Your customers can purchase products online and leave feedback in the form of comments, such as feedback on service or ratings of products. In any case, you want an idea of how your customers rate your products so you know what you should offer next year. Sales will show this, but it's nice to have ratings too, since not all sales translate into high ratings. Instead of going through all the ratings for all your products online and counting, say, how many stars each product got, you can use automatic data collection: a little program can be written to scrape this information off your own site and dump it into a file for you to work with and explore later.

Another example is surveying your customers and potential customers to find out what they want to buy. Traditional surveys don't work as well anymore. No one has time to fill out forms or answer telephone questions, but everyone seems to have time to spend on social media. Perhaps there is a particular Twitter account where people who love handbags gather to tweet about the latest handbags, the worst handbags, the best handbags, the best handbag designers... I think you get the point. The content from this account may shed some light on which handbags to offer next season. In this case, the data would be the tweets.

Again, sitting there writing down all the tweets isn't realistic, so we create a little program that scrapes the tweets from that particular Twitter feed and stores them in a table or database of some kind so that we can do some work and analysis on them. These are two simple examples of data that can't realistically be collected by hand, but that can be collected automatically and then stored for further analysis and work.
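As a minimal sketch of the kind of "little program" described above, the Python below pulls star ratings out of review markup and stores them in a SQLite table for later analysis. It assumes the page has already been downloaded and that each review is marked up as a hypothetical `<div class="review" data-product="..." data-stars="...">`; a real shop page would need its own parsing rules, and social media platforms such as Twitter usually have to be accessed through their own API and terms of use, which are not shown here. Only the standard library (`html.parser`, `sqlite3`) is used, and the sample page is made-up data.

```python
import sqlite3
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collects (product, stars) pairs from review elements.

    Assumes a hypothetical markup such as
    <div class="review" data-product="Tote bag" data-stars="4">...</div>.
    """

    def __init__(self):
        super().__init__()
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attrs.get("class") or "").split()
        product = attrs.get("data-product")
        stars = attrs.get("data-stars")
        # Keep only review elements that carry both a product and a numeric rating.
        if "review" in classes and product and stars and stars.isdigit():
            self.reviews.append((product, int(stars)))


def collect_reviews(html):
    """Return every (product, stars) pair found in the HTML text."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews


def store_and_summarize(reviews, db_path=":memory:"):
    """Store reviews in SQLite; return (product, review count, average stars)."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS reviews (product TEXT, stars INTEGER)")
        conn.executemany("INSERT INTO reviews VALUES (?, ?)", reviews)
        conn.commit()
        rows = conn.execute(
            "SELECT product, COUNT(*), AVG(stars) FROM reviews "
            "GROUP BY product ORDER BY product"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    # Made-up page standing in for a downloaded review page.
    page = """
    <div class="review" data-product="Tote bag" data-stars="5">Love it</div>
    <div class="review" data-product="Clutch" data-stars="2">Too small</div>
    <div class="review" data-product="Tote bag" data-stars="3">OK</div>
    """
    for product, count, avg in store_and_summarize(collect_reviews(page)):
        print(f"{product}: {count} review(s), average {avg:.1f} stars")
```

Swapping `":memory:"` for a file path such as `reviews.db` keeps the collected data between runs, so the same script can simply be rerun whenever fresh ratings are wanted.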

Why automatic data collection?

The examples above of what automatic data collection is also give some indication of why we would want it. There are many more reasons why automatic data collection may be needed. First off, data is everywhere, so we might as well start using it, and in order to use it, we need to collect it.

Reasons for automatic data collection

  1. financial resources may be scarce
  2. time is becoming scarcer for everyone, making it hard to collect data by hand
  3. society is becoming very much a “now” society, so up-to-date information is necessary
  4. work is increasingly process-oriented, from the beginning (data collection) to the end (publication), which requires a way to reproduce this process efficiently

Reasons against old-school data collection

  1. collecting research data by hand is tiring and not reproducible
  2. it is cumbersome
  3. it is prone to errors and carries a high risk of death by boredom; for humans, boredom is a large cause of errors

Pros of automatic data collection

  1. more reliable
  2. can be reproduced
  3. time-efficient, since no micromanaging, breaks, etc. are required

Whether business, researcher, family or individual, we all have data around us in our lives. In some cases it's clearer that this data can provide insight into what is going on in a particular situation or environment, as in a business setting or a research project. In these cases, the data has to be gathered before we can put it to work for us. Time, resources and finances may not be available to gather this data by hand, so investing in some sort of automated means of gathering it, whether for a one-time project or continuously, may be the answer. Once the data has been gathered, now what? To be continued!

References:

  1. Automatic identification and data capture, Wikipedia, URL: https://en.wikipedia.org/wiki/Automatic_identification_and_data_capture
  2. Piasecki, D., Automated Data Collection (ADC) Basics, URL: http://www.inventoryops.com/ADC.htm
  3. Munzert, S., Rubba, C., Meißner, P. and Nyhuis, D. (2015), Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining, John Wiley & Sons, Ltd.
Lani Haque

I enjoy learning and sharing that knowledge. Sharing has taken many forms over the years: as a teaching assistant, university lecturer, Pilates instructor and math tutor, and simply with friends and family. Throughout, summarizing what I have learnt in words has always been there, and it continues through blog posts, articles, videos and the ever-growing forms of content out there!
