Wednesday, December 16, 2015

Spark use cases

Sign up for a free IBM Bluemix cloud account and use the Spark service to try these out. These are excellent use cases for beginners. After that, you can move on to developing more complex algorithms, running them on larger data sets, and leveraging the true distributed computing power of Spark.
1. Use the train wreck datasets at http://www.trainwreckdb.com/ to let users query accidents by keywords such as "bicycle" or "pedestrian".
2. Use NHTSA data to let users query fatalities by keyword: http://www.nhtsa.gov/FARS
3. Use the virtual sensor app http://virtualsensors.mybluemix.net/ to generate GPS location and wind speed data and predict fire spread.
4. Use Spark Streaming to learn network conditions in real time; use the http://virtualsensors.mybluemix.net/ app to generate network data.
5. Use Spark Streaming to get temperature and humidity data (generated using http://virtualsensors.mybluemix.net/) and do real-time optimization/alerting.
6. Use air quality data http://www3.epa.gov/airdata/ad_data.html and send alerts based on my ZIP code.
7. Personalization of news briefs: use Twitter trends to personalize what news I might like to read.
9. Flu detection by analyzing social signals like tweets.
10. Auto-adjustment of game level complexity for player retention. Hint: you can simulate game player data using the virtual sensor app and use Spark to lower or raise the game level.
11. Employee data analysis for employee retention (you can use Glassdoor data for this prototype).
12. Fraud detection in user reviews (use this data: https://snap.stanford.edu/data/web-Amazon.html).
13. Detect security breaches or attacks in real time (simulate a network breach using blacklisted IPs).
15. Use FAA bird strike data and Spark to let users query the most dangerous airports: http://wildlife.faa.gov/database.aspx
16. Use home energy usage datasets http://redd.csail.mit.edu/ as a training set to find out how your own home energy usage measures up. Hint: the user enters their own energy usage and the app provides a comparative analysis.
17. Use baby names datasets to come up with some cool analytic Spark queries: https://www.ssa.gov/oact/babynames/limits.html (Hint: the user finds out how boy or girl names have been trending since 2005.)
18. Use NIST vulnerability datasets to predict possible DDoS attacks: https://nvd.nist.gov/download.cfm
19. Use water and sanitation data http://www.data.unicef.org/water-sanitation/sanitation.html and Spark to produce analytics such as which countries are worst and which are improving.
20. Use child mortality data http://www.unicef.org/statistics/index_countrystats.html to figure out the most important causes of death for children under 5.
21. Use the 2012 presidential donation datasets to find out who donated the most to presidential candidates and whether there is any correlation: http://www.fec.gov/finance/2012matching/2012matching.shtml
22. Use music reviews data http://jmcauley.ucsd.edu/data/amazon/ (contact the owner for the data link) and use Spark to find the top ten songs or something similar.
23. Use daily air quality data for a given geography (such as San Jose Sunnyvale) and use Spark to detect anomalies in the data and answer some important health and safety questions: http://www3.epa.gov/airdata/ad_data_daily.html
24. Use Medicare outpatient payment datasets https://data.cms.gov/Medicare/Outpatient-Prospective-Payment-System-OPPS-Provide/ks44-5ax3 and use Spark to find interesting answers, such as which city had the maximum payments claimed or which provider had repeated claims.
25. Use popular baby names datasets https://www.ssa.gov/oact/babynames/rankchange.html and Spark to predict the most popular male and female names for 2016.
26. Use interesting genome and protein datasets from http://www.ncbi.nlm.nih.gov/home/download.shtml and use Spark to calculate interesting facts. (Hint: use ClinVar datasets to find all gene types related to the condition "Breast-Ovarian cancer".)
27. Use housing affordability datasets with Spark to come up with good analytics, such as which city and ZIP code have good overall job opportunity and housing affordability for 30-40 year olds: http://catalog.data.gov/dataset/housing-affordability-data-system-hads
28. Use farmers market location datasets and Spark to generate some interesting analytics: http://catalog.data.gov/dataset/farmers-markets-geographic-data

29. Use government real estate asset datasets http://catalog.data.gov/dataset/real-estate-across-the-united-states-rexus-inventory-building and Spark to come up with some cool analytics, such as how much money the government is spending on maintaining useless assets.
30. Use death cause datasets http://catalog.data.gov/dataset/leading-causes-of-death-by-zip-code-1999-2013 and Spark to answer health-related questions.
32. Use air quality data http://www3.epa.gov/airdata/ad_data.html and identify the contributing factors in a particular geographic location. Feel free to use a map app.
33. Use NASA datasets (for example, https://data.nasa.gov/view/scmi-np9r) and use Spark to come up with some interesting answers.
34. Use a NY open dataset of your choice from https://nycopendata.socrata.com and produce some good analytics.
35. Use Austin restaurant inspection report data and Spark to answer critical consumer questions, such as which cuisine or which area's restaurants had more violations in the past year: https://data.austintexas.gov/dataset/Restaurant-Inspection-Scores/ecmv-9xxi
36. Use the OpenSpending API and Spark Streaming http://community.openspending.org/help/conventions/ to find out which governments are spending taxpayers' money efficiently.
38. Use transportation datasets and Spark to answer some cool analytic questions (be creative): https://aws.amazon.com/datasets/transportation-databases/?tag=datasets%23keywords%23economics
40. Use the lobbying database from http://www.senate.gov/legislative/Public_Disclosure/database_download.htm (hint: get quarterly data, convert it from XML to JSON or whatever the Spark context can read) and then design an analytic program to find out which state or which industry is lobbying heavily.
41. Use SAT score datasets and Spark to come up with some cool analytics: http://www.cde.ca.gov/ds/sp/ai/
42. Use the train wreck datasets http://www.trainwreckdb.com/ with Spark to figure out the 10 most dangerous places for accidents and why.
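
As a starting point for use cases 1 and 42, here is a minimal sketch in plain Python (no Spark needed to run it) of the two operations involved: filtering accidents by keyword and counting accidents per place. The field names `place` and `description` and the sample rows are made up for illustration; the actual layout of the train wreck data is not described here. In PySpark, the same predicate could be passed to `RDD.filter`, and the counting step corresponds to a `reduceByKey` over `(place, 1)` pairs.

```python
from collections import Counter

# Hypothetical sample records; the real dataset's columns are an assumption.
ACCIDENTS = [
    {"place": "Springfield", "description": "Train struck a bicycle at a crossing"},
    {"place": "Shelbyville", "description": "Pedestrian hit near the station"},
    {"place": "Springfield", "description": "Pedestrian trespassing on tracks"},
    {"place": "Ogdenville", "description": "Derailment, no injuries"},
]


def matches_keyword(record, keyword):
    """Case-insensitive substring match on the free-text description."""
    return keyword.lower() in record["description"].lower()


def query_accidents(records, keyword):
    """Return every record whose description mentions the keyword."""
    return [r for r in records if matches_keyword(r, keyword)]


def top_places(records, n=10):
    """Return the n places with the most accidents as (place, count) pairs."""
    return Counter(r["place"] for r in records).most_common(n)


if __name__ == "__main__":
    for row in query_accidents(ACCIDENTS, "pedestrian"):
        print(row["place"], "-", row["description"])
    print(top_places(ACCIDENTS, n=3))
```

Counting accidents only ranks the places; answering the "why" in use case 42 would need additional columns (cause, crossing type, time of day) that this sketch does not assume.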

