Sign up for a free IBM Bluemix cloud account and use the Spark service to try these out. These are excellent use cases for beginners. After that, you can move on to developing more complex algorithms, running them on larger data sets, and leveraging the true distributed computing power of Spark.
1. Use the train wreck datasets at http://www.trainwreckdb.com/ to let users query accidents by keywords such as bicycle or pedestrian
2. Use NHTSA data to let users query fatalities by keyword http://www.nhtsa.gov/FARS
3. Use the virtual sensor app http://virtualsensors.mybluemix.net/ to generate GPS location and wind speed data to predict fire spread
4. Use Spark Streaming to learn network conditions in real time; use the http://virtualsensors.mybluemix.net/ app to generate network data
5. Use Spark Streaming to get temperature and humidity data (generated using http://virtualsensors.mybluemix.net/ ) and do real-time optimization/alerting
6. Use air quality data http://www3.epa.gov/airdata/ad_data.html and send alerts based on my zip code
7. Personalization of news briefs: use Twitter trends to personalize which news I might like to read
8. Earthquake detection using USGS data http://earthquake.usgs.gov/earthquakes/search/
9. Flu detection by analyzing social signals such as tweets
10. Auto-adjustment of game level complexity for player retention. Hint: you can simulate game player data using the virtual sensor app and use Spark to lower or raise the game level
11. Employee data analysis for employee retention (you can use Glassdoor data for this prototype)
12. Fraud detection in user reviews (use this data: https://snap.stanford.edu/data/web-Amazon.html )
13. Detect security breaches or attacks in real time (simulate a network breach using blacklisted IPs)
14. Analyze Medicare payment data to detect fraud (use the data: https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Physician-and-Other-Supplier2013.html )
15. Use FAA bird strike data and Spark to let users query the most dangerous airports http://wildlife.faa.gov/database.aspx
16. Use home energy usage datasets http://redd.csail.mit.edu/ as a training set to find out how your own home energy usage measures up. Hint: the user enters their own energy usage and the app provides a comparative analysis.
17. Use baby names datasets to come up with some cool analytic Spark queries: https://www.ssa.gov/oact/babynames/limits.html (Hint: the user finds out how boy or girl names have been trending since 2005)
18. Use NIST vulnerability datasets to predict possible DDoS attacks https://nvd.nist.gov/download.cfm
19. Use water and sanitation data http://www.data.unicef.org/water-sanitation/sanitation.html and Spark to produce analytics such as which countries are worst and which are improving
20. Use child mortality data http://www.unicef.org/statistics/index_countrystats.html to figure out the most important causes of death for children under 5 years old
21. Use 2012 presidential donation datasets to find out who donated the most to presidential candidates and whether there is any correlation http://www.fec.gov/finance/2012matching/2012matching.shtml
22. Use music reviews data http://jmcauley.ucsd.edu/data/amazon/ (contact the owner for the data link) and use Spark to find the top ten songs or something similar
23. Use daily air quality data for a given geography (such as San Jose Sunnyvale) and use Spark to detect anomalies in the data and answer some important health and safety questions http://www3.epa.gov/airdata/ad_data_daily.html
24. Use Medicare outpatient payment datasets https://data.cms.gov/Medicare/Outpatient-Prospective-Payment-System-OPPS-Provide/ks44-5ax3 and use Spark to find interesting answers, such as which city had the highest payments claimed or which provider had repeated claims
25. Use popular baby names datasets https://www.ssa.gov/oact/babynames/rankchange.html and Spark to predict the most popular male and female names for 2016
26. Use interesting genome and protein datasets from http://www.ncbi.nlm.nih.gov/home/download.shtml and use Spark to calculate interesting facts (Hint: use ClinVar datasets to find all gene types related to the condition "Breast-Ovarian cancer")
27. Use housing affordability datasets with Spark to come up with good analytics, such as which city and zip code have good overall job opportunities and housing affordability for 30-40 year olds http://catalog.data.gov/dataset/housing-affordability-data-system-hads
28. Use farmers market location datasets and Spark to generate some interesting analytics http://catalog.data.gov/dataset/farmers-markets-geographic-data
29. Use government real estate asset datasets http://catalog.data.gov/dataset/real-estate-across-the-united-states-rexus-inventory-building and Spark to come up with some cool analytics, such as how much money the government is spending on maintaining useless assets
30. Use cause-of-death datasets http://catalog.data.gov/dataset/leading-causes-of-death-by-zip-code-1999-2013 and Spark to answer health-related questions
31. Find fraud in Section 8 housing aid from the government: http://catalog.data.gov/dataset/fair-market-rents-for-the-section-8-housing-assistance-payments-program
32. Use air quality data http://www3.epa.gov/airdata/ad_data.html and identify the contributing factors in a particular geographic location. Feel free to use a map app
33. Use NASA datasets (e.g. https://data.nasa.gov/view/scmi-np9r ) and use Spark to come up with some interesting answers
34. Use New York open data of your choice https://nycopendata.socrata.com and produce some good analytics
35. Use Austin restaurant inspection report data and Spark to answer critical consumer questions, such as which cuisine or which area's restaurants had more violations in the past year https://data.austintexas.gov/dataset/Restaurant-Inspection-Scores/ecmv-9xxi
36. Use the OpenSpending API and Spark Streaming http://community.openspending.org/help/conventions/ to find out which governments are spending taxpayers' money efficiently
37. Use Wikipedia page traffic data https://aws.amazon.com/datasets/wikipedia-page-traffic-statistic-v3/?_encoding=UTF8&jiveRedirect=1 and Spark to do some cool analytics
38. Use transportation datasets and Spark to answer some cool analytic questions (be creative) https://aws.amazon.com/datasets/transportation-databases/?tag=datasets%23keywords%23economics
39. Use Human Microbiome datasets and Spark for analysis https://aws.amazon.com/datasets/human-microbiome-project/?tag=datasets%23keywords%23biology
40. Use the lobbying database from http://www.senate.gov/legislative/Public_Disclosure/database_download.htm (hint: get the quarterly data and convert it from XML to JSON or whatever format the Spark context can read), then design an analytic program to find out which state or which industry is lobbying heavily
41. Use SAT score datasets and Spark to come up with some cool analytics http://www.cde.ca.gov/ds/sp/ai/
42. Use the train wreck datasets http://www.trainwreckdb.com/ with Spark to figure out the 10 most dangerous places for accidents and why
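Before moving any of these ideas onto a Spark cluster, it can help to prototype the query logic locally on a handful of rows. The sketch below does that for ideas 1 and 42 (keyword search over accidents, then the places with the most accidents) in plain Python. The sample rows and the field names `place` and `description` are made up for illustration; the real trainwreckdb.com schema may look quite different, so treat this as a starting point rather than a finished implementation.

```python
from collections import Counter

# Hypothetical sample rows; the real dataset's columns are an assumption here.
ACCIDENTS = [
    {"place": "Springfield", "description": "Train struck a pedestrian at a crossing"},
    {"place": "Riverton", "description": "Bicycle rider hit near station"},
    {"place": "Springfield", "description": "Derailment after track defect"},
    {"place": "Springfield", "description": "Pedestrian trespassing on track"},
    {"place": "Riverton", "description": "Vehicle stalled on crossing"},
]


def query_by_keyword(records, keyword):
    """Return the records whose description contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in records if needle in r["description"].lower()]


def top_places(records, n=10):
    """Return the n places with the most records as (place, count) pairs."""
    return Counter(r["place"] for r in records).most_common(n)


if __name__ == "__main__":
    for row in query_by_keyword(ACCIDENTS, "pedestrian"):
        print(row["place"], "-", row["description"])
    print(top_places(ACCIDENTS, n=2))
```

Once the logic looks right, the same two steps would roughly correspond to a filter on the description column followed by a group-by-and-count on the place column in Spark, which is where the larger data sets come in.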