NETC08 session 1008

From Extension Collaborative Wiki


Session Name: Extracting Information from Everywhere
Session Number: 1008
Location: Tanglewood
Day and Time: Monday, 2:15 - 3:00 pm
Format: Presentation
Topic: Web-related technologies
Level: Advanced


Session Abstract:

This discussion will focus on extracting data from sources (web sites, email messages, documents, etc.) that were not designed to have data extracted from them. Once seemingly static data has been extracted from these sources, you can share it with other applications, which can then manipulate it, make it interactive, and ultimately make it more useful.
This is the computer equivalent of reading about an event in the newspaper and telling your neighbor about it; she then attends the event and enjoys herself immensely. You extracted the information and shared it with your neighbor, who then did something useful with it, all without her ever having to read the paper herself!
For example, web scraping techniques use common scripting tools and programming languages to pull images, weather information, tables, and other content from almost any web page. A web page may display data that you find useful without providing it in a standard, machine-readable format such as XML. By scraping, or parsing, the raw HTML of the page, you can extract that data and use it however you like: for example, storing it in a database every day and reporting on it at the end of each month.
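The scrape-then-store workflow described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not the presenter's code: the sample page `PAGE`, the `weather` table, its columns, and the helper names `TableScraper`, `scrape_table`, and `store_rows` are all hypothetical. It parses the raw HTML with `html.parser.HTMLParser`, collects the text of each table row, and inserts the rows into a SQLite database, as a daily job might.

```python
import sqlite3
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built (None outside <tr>)
        self._cell = None  # text fragments of the open cell (None outside a cell)

    def _flush_cell(self):
        # Close the current cell. HTML allows </td> to be omitted, so this also
        # runs when a new cell starts or the row ends.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _end_row(self):
        # Close the current row (</tr> may also be omitted in real pages).
        if self._row is not None:
            self._flush_cell()
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of cell text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def store_rows(conn, day, rows):
    """Save scraped rows (header row first) into a hypothetical weather table."""
    _header, *data = rows
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather (day TEXT, city TEXT, high INTEGER, low INTEGER)"
    )
    conn.executemany(
        "INSERT INTO weather VALUES (?, ?, ?, ?)",
        [(day, city, int(high), int(low)) for city, high, low in data],
    )
    conn.commit()


# Hypothetical page standing in for a site that publishes no XML feed.
PAGE = """
<html><body>
<h1>Today's Weather</h1>
<table id="forecast">
  <tr><th>City</th><th>High (F)</th><th>Low (F)</th></tr>
  <tr><td>Raleigh</td><td>78</td><td>55</td></tr>
  <tr><td>Asheville</td><td>70</td><td>48</td></tr>
</table>
</body></html>
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_rows(conn, "2000-01-01", scrape_table(PAGE))  # hypothetical date
    print(conn.execute("SELECT city, high FROM weather ORDER BY high DESC").fetchall())
    # [('Raleigh', 78), ('Asheville', 70)]
```

Because a scraper depends on the page's markup rather than a published format, it can break whenever the site's layout changes; keeping the parsing in one small function makes that easier to fix.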
The session will also cover Apple Data Detectors, a technology that first appeared over 10 years ago from Apple's Advanced Technology Group but quickly disappeared because it was ahead of its time. Data Detectors is now built into Mac OS X 10.5 Leopard and lets you take the semi-structured data that appears in email messages, documents, web pages, etc. and turn it into something useful, such as a calendar entry in iCal.
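To illustrate the general idea of turning semi-structured text into a calendar entry, here is a small Python sketch. It is not Apple's Data Detectors API and does not reproduce how Apple's implementation works; it is a regex-based stand-in that recognizes a single, assumed date format ("April 3, 2008 at 12:30 pm") and writes the result as an iCalendar (RFC 5545) event, the format iCal can import. The function names, the one-hour default duration, and the `PRODID`/`UID` values are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# One illustrative pattern only, e.g. "April 3, 2008 at 12:30 pm".
# A real data detector recognizes far more formats than this.
DATE_TIME = re.compile(
    r"\b(?P<month>" + "|".join(MONTHS) + r")\s+(?P<day>\d{1,2}),\s*(?P<year>\d{4})"
    r"\s+at\s+(?P<hour>\d{1,2}):(?P<minute>\d{2})\s*(?P<ampm>[ap])\.?m\.?",
    re.IGNORECASE,
)


def detect_datetime(text):
    """Return the first date/time found in text as a naive datetime, or None."""
    m = DATE_TIME.search(text)
    if m is None:
        return None
    hour = int(m["hour"]) % 12          # 12 am -> 0, 12 pm -> 0 before adding 12
    if m["ampm"].lower() == "p":
        hour += 12
    month = MONTHS.index(m["month"].lower()) + 1
    return datetime(int(m["year"]), month, int(m["day"]), hour, int(m["minute"]))


def escape_text(value):
    # RFC 5545 TEXT values escape backslash, semicolon, comma and newline.
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def to_icalendar(summary, start, duration=timedelta(hours=1)):
    """Build a minimal VCALENDAR containing one event (floating local time)."""
    fmt = "%Y%m%dT%H%M%S"
    stamp = datetime.now(timezone.utc).strftime(fmt) + "Z"
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//Data Detector Sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{start.strftime(fmt)}-sketch@example.invalid",
        f"DTSTAMP:{stamp}",
        f"DTSTART:{start.strftime(fmt)}",
        f"DTEND:{(start + duration).strftime(fmt)}",
        f"SUMMARY:{escape_text(summary)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"  # iCalendar lines end with CRLF


if __name__ == "__main__":
    email = "Hi all, lunch with the county agents is on April 3, 2008 at 12:30 pm."
    start = detect_datetime(email)
    print(start)  # 2008-04-03 12:30:00
    print(to_icalendar("Lunch with county agents", start))
```

Saving the returned string to a file with an `.ics` extension gives a file that calendar applications such as iCal can typically import.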


Led By: Joe Zobkiw, Business and Technology Applications Specialist, NC State University Extension Information Technology

