Monday, September 26, 2016

Solving today's challenging problems using breakthrough technology

It's time for the team-based project in my Fall class at SJSU again. It requires a great deal of passion and team effort to do something that has not been done before. I always encourage my students to think big and solve a problem that has not been solved before. And trust me, there are plenty of problems that may have solutions, but those solutions either do not address the problem uniquely or cannot be afforded by the masses. Anyway, here again I am posting some ideas that you can research and brainstorm to come up with a problem statement.

1. Barcode of Life (DNA Barcoding)

DNA barcoding is a molecular-based identification system that aims to identify biological specimens and assign them to a given species. DNA-based identification for multicellular life is an emerging concept, and scientists from 25 countries have come together to create a database of DNA barcodes (BOLD) to solve multi-faceted problems.
Here are some potential use cases:
a) Timber and wildlife trafficking
b) Traceability of the food pipeline for food safety
c) and more

2. Blockchain:

Blockchain is a technology for a new generation of transactional applications that establishes trust, accountability and transparency while streamlining business processes. It is a design pattern made famous by bitcoin, but its uses go far beyond. It has the potential to vastly reduce the cost and complexity of cross-enterprise business processes. The application of this emerging technology is showing great promise across a broad range of business applications.
Blockchain technology example



Here are some potential use cases:
a) Tracking the origin and movement of a high-value item across a supply chain. Think of money transfer as a simple example. Counterfeit drugs or high-priced fashion items are other examples.
b) A mechanism for collective record keeping and notarizing any type of data, financial or otherwise. This is not necessarily about a physical asset.
c) Explore non-financial use cases of blockchain
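
To make the record-keeping idea concrete, here is a minimal sketch of a hash-linked ledger in Python. It only illustrates how each record is chained to the previous one so that tampering becomes detectable. It is not a real blockchain (there is no network, consensus, or signing), and the item and holder names are made up for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash the block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a record that is linked to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else GENESIS_HASH
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
    return True


# Hypothetical supply-chain history for one high-value item.
ledger = []
append_block(ledger, {"item": "watch-001", "holder": "manufacturer"})
append_block(ledger, {"item": "watch-001", "holder": "distributor"})
append_block(ledger, {"item": "watch-001", "holder": "retailer"})
print(is_valid(ledger))  # True: untouched chain

# Rewriting history changes the block's contents but not its stored hash.
ledger[1]["data"]["holder"] = "unknown reseller"
print(is_valid(ledger))  # False: tampering is detected
```

A project built on this idea would still need to decide who is allowed to append records and how the participants agree on the chain, which is where real blockchain platforms come in.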

3. Hyperspectral Image Analysis:

Hyperspectral imagery consists of much narrower bands (10-20 nm) than multispectral imagery. A hyperspectral image could have hundreds or thousands of bands, captured using an imaging spectrometer.

Here is an excellent read: http://gisgeography.com/multispectral-vs-hyperspectral-imagery-explained/

This is an emerging technology being used in several domains, including:
a) Smart farm irrigation techniques http://gamaya.com/
b) Surveillance and reconnaissance
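
To get a feel for what "narrow bands" means in practice, here is a small Python sketch. The 400-2500 nm sensor range, the 665 nm and 805 nm wavelengths, and the pixel values are all assumptions chosen for illustration, not properties of any particular sensor. The sketch counts the bands, looks up the band closest to a wavelength, and computes a simple normalized difference between two bands, the kind of index often used for vegetation.

```python
def band_centers(start_nm, end_nm, width_nm):
    """Center wavelength of each contiguous band covering [start_nm, end_nm)."""
    count = (end_nm - start_nm) // width_nm
    return [start_nm + width_nm * i + width_nm / 2 for i in range(count)]


def nearest_band(centers, wavelength_nm):
    """Index of the band whose center is closest to the requested wavelength."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - wavelength_nm))


def normalized_difference(spectrum, centers, a_nm, b_nm):
    """(a - b) / (a + b) for the bands nearest to wavelengths a_nm and b_nm."""
    a = spectrum[nearest_band(centers, a_nm)]
    b = spectrum[nearest_band(centers, b_nm)]
    return (a - b) / (a + b)


# Assumed sensor: 400-2500 nm sampled in 10 nm bands.
centers = band_centers(400, 2500, 10)
print(len(centers))  # 210 bands

# Made-up reflectance for one pixel: dark in the visible, bright in the near infrared.
pixel = [0.05 if c < 700 else 0.5 for c in centers]

# Near-infrared (805 nm) vs. red (665 nm); values near 1 suggest healthy vegetation.
print(normalized_difference(pixel, centers, 805, 665))
```

With real data, each pixel of the image carries one such spectrum, so a farm-monitoring project would apply the same calculation across the whole image.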

4. Conversational Interfaces:

Voice interfaces have come a long way, and thanks to advances in machine learning, voice control has become much more practical. This eliminates the need for hundreds of different interfaces, which would be hard to develop and hard to keep up to date for users.

5. Application of crowdfunding to a brand new space

6. IoT and Analytics for a Better Life

IoT is inside our homes and offices already, from Nest to 24x7 monitoring of indoor conditions. Here is one company that sells a device to make you aware of indoor air quality. How about developing an IoT solution that can monitor the overall indoor "mood" and suggest that people do something better for their health and happiness? This could be monitored by cameras and by activity in the house, such as movement tracking, smiles on faces, crying kids vs. happy kids, yelling, and of course indoor air and noise quality, temperature, etc.

Augmented Tools for construction. Read this paper: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-90.pdf

This idea can be applied to healthcare, interior decoration, farming and more.. 






Wednesday, January 27, 2016

What can $5 do for you?


  1. You can hire an expert to do wonderful things for you for $5 at Fiverr
  2. You can buy a new computer capable of running Linux, called the Pi Zero, for $5
  3. You can buy an Arduino microcontroller board on eBay for $5
  4. You can subscribe to a full year (12 issues) of Wired magazine for $5
  5. You can get 4 gallons of gas in your car for $5 at the cheapest gas prices in the country
  6. You can watch a movie on Tuesday for $5
  7. Your $5 donation to a local food bank can buy up to 50 lbs of food
  8. You can get a haircut if you live in Philadelphia
  9. Hey, save your $5 bills whenever you get them... very soon you will have $100
  10. Share your $5 idea please!

Sunday, January 3, 2016

Creating win-win situation

Let me start by sharing my new year resolution because it's a tradition :)
All passionate people like me fight at work and at home, and when we fight, someone loses. The loser may not be either of the two people fighting but the cause itself. The project suffers, your colleague loses morale, and the business does not achieve results. So I am going to find ways to create situations where everybody wins. I am working on the "how" part and will share when I have concrete ideas.

For now, some reflections on the past year. Professionally, 2015 was exciting, fast, and fruitful. At work, I led and drove many major initiatives for transforming our data and analytics business into the cloud. dashDB has been a huge success. Seeing it grow, both in the value it provides to data analysts and data scientists and in its customer base, always reminds me of day 0, when a few of us created this cloud offering focusing completely on users. R analytics has always been valuable for both quick productive use and long-term machine learning models, and being part of the team that brought this is very satisfying.
Bluemix itself has grown its user base beyond 1 million, and I am extremely proud of being part of the ecosystem. We fueled the growth of Bluemix by providing what I call the "engines of PaaS". These engines include relational and NoSQL databases as a service, modern analytic compute engines like Spark (our leaders call it an analytic operating system), and big data analytic services such as EHaaS.
In order to bring these services to Bluemix, there were tons of internal projects that I designed and created from scratch. These projects include developing a continuous integration and deployment tool for our cloud delivery, a reliability and resiliency framework for change management, and tools for managing outages and root cause analysis. I started to foster a culture within the team of thinking about resiliency as a big picture and focused on all 4 aspects of it shown below.

2015 was a great year for my own learning and technical vitality. I learnt the very important characteristics of cloud services, the expectations of our customers, and the pain points in operating services at scale. It was a year to lay the foundation for bringing services into the cloud quickly and improving them rapidly while reducing the cost of operation. A few skills that I acquired or enhanced stand out, such as the ELK stack for log analysis, of course my favorite R and RStudio for statistical analysis and machine learning, Docker for containerization, Ansible for change management, Jupyter notebooks for data analysis, and Spark for analytic computing.
2015 was a great year for my teaching profession at San Jose State. I was able to update the syllabus for my students so that they get exposed to the skills and technology required by innovative companies in the Bay Area. Spark, Docker, and reactive programming using Typesafe technology were great additions. Students greatly benefited from this class being part of IBM university relations for Bluemix. They used Bluemix to understand cloud technology and to implement the good ideas they had. I am very confident that the experience they acquired in the class will give them the tools and confidence to succeed. The projects they did to analyze data using Spark (take a look at some profiled here) will help them at their next job. In addition to teaching this class, I supervised more than 20 students in 2015 for their Masters projects.
2015 was also a great year for my speaking and presenting experiences. I started the year presenting at IBM Interconnect (our premier cloud conference) and continued by presenting at more than 10 conferences. Silicon Valley Code Camp was one of my favorites, as it attracts students and developers from all across the Bay Area, and the two-day event is free to attend. Presenting at the O'Reilly Velocity conference was a privilege, and I learnt a lot from listening to some of the best speakers in the industry. Sharing my own ideas and work on predictive monitoring and analytics was a very enriching experience. In addition to many hackathons and coding camps, the year ended well with presenting at the Insight conference. This is where we learn in great detail what our enterprise customers want from us. The very nature of the hybrid cloud topology of their evolving businesses teaches us a lot.
On the career front, I am very thankful for the opportunities that were provided to me. 2015 was also rewarding, as a new title and new responsibilities allowed me to understand myself more. Leading a big team, aligning the business goals of the organization to people's day jobs, and fostering a culture where everyone is intrinsically motivated and passionate is fun but challenging. I believe in a high-performance culture and self-driven teams, and I was fortunate to be surrounded by highly smart people. I am also fortunate to have leaders and mentors at work who are as passionate as I am and who make our day jobs fun and challenging.
On the personal front, 2015 was an interesting year as well. My mom retired after teaching at the high school in my hometown in India for the last 34 years. We were all relieved that she got a break, but I could tell she was sad to leave the community that she had served for so long. I was able to spend as much time with my kids as I wanted in spite of the chaotic nature of work. I will never forget the poster on the door of my great mentor Bala Iyer's IBM office: "Good fathers and good men". I consider myself successful in seeding the engineering (making) concept in my daughter's brain. She was able to take a burning problem and come up with a technological solution in the form of her school science project, and she reached good heights with it. She surprised me when she kept winning at the school, county, and state levels and went up to the national semifinal. We were able to take a couple of major vacations, in Europe and the Northeast USA, which were very memorable and relaxing.
Last but not least, I ended the year reading a book, Without Their Permission by Reddit co-founder Alexis Ohanian. I love this guy not just because he was able to make millions of dollars at a very young age, but because he explained in clear terms how the internet allows all of us to contribute to society and bring about the changes that we all want to see. While relaxing at home at the end of the year, I was fortunate to meet one of my hometown friends and mentors after 17 years (even though he lives in Indiana, we somehow missed seeing each other all these years). He had a huge impact on my early career, and I have always admired him for his attitude and passion towards life.
Everything happens at the right time, and we must be optimistic. I would like to end this post by wishing all my readers, friends, and mentors a very resilient new year!


Wednesday, December 16, 2015

Spark use cases

Sign up for the free IBM Bluemix cloud and use the Spark service to try these out. These are excellent use cases for beginners. After that, you can move on to developing complex algorithms and running them on larger data sets to leverage the true distributed computing power of Spark.
1. Use train wreck datasets http://www.trainwreckdb.com/ to allow users to query accidents based on keywords such as "bicycle" or "pedestrian"
2. Use NHTSA data http://www.nhtsa.gov/FARS to allow users to query fatalities based on a keyword
3. Use the virtual sensor app http://virtualsensors.mybluemix.net/ to generate GPS location and wind speed data to predict fire spread
4. Use Spark Streaming to learn network conditions in real time; use the http://virtualsensors.mybluemix.net/ app to generate network data
5. Use Spark Streaming to get temperature and humidity data (generated using http://virtualsensors.mybluemix.net/) and do real-time optimization/alerting
6. Use air quality data http://www3.epa.gov/airdata/ad_data.html and send alerts based on my zip code
7. Personalization of news briefs: use Twitter trends to personalize what news I might like to read
9. Flu detection by analyzing social signals like tweets
10. Auto-adjustment of game level complexity for player retention. Hint: you can simulate game player data using the virtual sensor app and use Spark to lower or raise the game level
11. Employee data analysis for employee retention (you can use Glassdoor data for this prototype)
12. Fraud detection in user reviews (use this data: https://snap.stanford.edu/data/web-Amazon.html)
13. Detect security breaches or attacks in real time (simulate a network breach using blacklisted IPs)
15. Use FAA bird strike data http://wildlife.faa.gov/database.aspx and Spark to allow users to query the most dangerous airports
16. Use home energy usage datasets http://redd.csail.mit.edu/ as a training set to find out how your own home energy usage compares. Hint: the user enters their own energy usage and the app provides a comparative analysis.
17. Use baby names datasets https://www.ssa.gov/oact/babynames/limits.html to come up with some cool analytic Spark queries (Hint: the user finds out how boy or girl names have been trending since 2005)
18. Use NIST vulnerability datasets https://nvd.nist.gov/download.cfm to predict possible DDoS attacks
19. Use water and sanitation data http://www.data.unicef.org/water-sanitation/sanitation.html and Spark to produce analytics such as which countries are worst and which are improving
20. Use child mortality data http://www.unicef.org/statistics/index_countrystats.html to figure out the most important causes of death for children under 5 years old
21. Use 2012 presidential donation datasets http://www.fec.gov/finance/2012matching/2012matching.shtml to find out who donated the most to presidential candidates and whether there is any correlation
22. Use music reviews data http://jmcauley.ucsd.edu/data/amazon/ (contact the owner for the data link) and use Spark to find the top ten songs or something similar
23. Use daily air quality data http://www3.epa.gov/airdata/ad_data_daily.html for a given geography (such as San Jose or Sunnyvale) and use Spark to detect anomalies in the data to answer some important health and safety questions
24. Use Medicare outpatient payment datasets https://data.cms.gov/Medicare/Outpatient-Prospective-Payment-System-OPPS-Provide/ks44-5ax3 and use Spark to find interesting answers, such as which city had the maximum payments claimed, which provider had repeated claims, etc.
25. Use popular baby names datasets https://www.ssa.gov/oact/babynames/rankchange.html and Spark to predict the most popular male and female names for 2016
26. Use interesting genome and protein data sets from http://www.ncbi.nlm.nih.gov/home/download.shtml and use Spark to calculate interesting facts (Hint: use ClinVar datasets to find all gene types related to the condition "Breast-Ovarian cancer")
27. Use housing affordability datasets http://catalog.data.gov/dataset/housing-affordability-data-system-hads with Spark to come up with good analytics, such as which city and zip code has good overall job opportunity and housing affordability for 30-40 year olds
28. Use farmers market location datasets http://catalog.data.gov/dataset/farmers-markets-geographic-data and Spark to generate some interesting analytics

29. Use government real estate asset datasets http://catalog.data.gov/dataset/real-estate-across-the-united-states-rexus-inventory-building and Spark to come up with some cool analytics, such as how much money the government is spending on maintaining useless assets
30. Use death cause datasets http://catalog.data.gov/dataset/leading-causes-of-death-by-zip-code-1999-2013 and Spark to answer health-related questions
32. Use air quality data http://www3.epa.gov/airdata/ad_data.html and bring out the factors that are contributing in a particular geolocation. Feel free to use a map app
33. Use NASA datasets (for example, https://data.nasa.gov/view/scmi-np9r) and use Spark to come up with some interesting answers
34. Use NY open data of your choice https://nycopendata.socrata.com and produce some good analytics
35. Use Austin restaurant inspection report data https://data.austintexas.gov/dataset/Restaurant-Inspection-Scores/ecmv-9xxi and Spark to answer critical consumer questions, such as which cuisine or which area's restaurants had more violations in the past year
36. Use the openspending API http://community.openspending.org/help/conventions/ and Spark Streaming to find out which government is spending taxpayers' money efficiently
38. Use transportation datasets https://aws.amazon.com/datasets/transportation-databases/?tag=datasets%23keywords%23economics and Spark to answer some cool analytic questions (be creative)
40. Use the lobbying database from http://www.senate.gov/legislative/Public_Disclosure/database_download.htm (Hint: get quarterly data and convert it from XML to JSON or whatever the Spark context can read) and then design an analytic program to find out which state or which industry is lobbying heavily
41. Use SAT score datasets http://www.cde.ca.gov/ds/sp/ai/ and Spark to come up with some cool analytics
42. Use train wreck datasets http://www.trainwreckdb.com/ with Spark to figure out the 10 most dangerous places for accidents and why
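
Before reaching for Spark, it can help to prototype the query logic on a handful of records. The sketch below does this in plain Python for use cases 1 and 42: a keyword query and a "most dangerous places" ranking. The records and field names (`location`, `description`) are made up for illustration; the real dataset's schema will differ. In Spark, the same filter and count would run as distributed operations over the full dataset.

```python
from collections import Counter

# Made-up sample records; the real dataset's fields are an assumption here.
accidents = [
    {"location": "San Jose, CA", "description": "Train struck a pedestrian at a crossing"},
    {"location": "San Jose, CA", "description": "Bicycle hit at an unguarded crossing"},
    {"location": "Austin, TX", "description": "Freight train derailed near the yard"},
    {"location": "San Jose, CA", "description": "Vehicle stalled on the tracks"},
    {"location": "Austin, TX", "description": "Pedestrian injured near the station"},
]


def query_by_keyword(records, keyword):
    """Return the records whose description mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in records if needle in r["description"].lower()]


def top_locations(records, n=10):
    """Rank locations by number of accidents, most frequent first."""
    return Counter(r["location"] for r in records).most_common(n)


for record in query_by_keyword(accidents, "pedestrian"):
    print(record["location"], "-", record["description"])

print(top_locations(accidents, n=2))
```

Once the logic works on a small sample like this, the next step is to load the real data into Spark and express the same filter and grouping there.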

Monday, October 12, 2015

Where do I get public data sets and APIs for my project?

  1. http://www.data.gov/
  2. http://www.socrata.com/resources/
  3. https://data.sfgov.org/
  4. http://www.unicef.org/statistics/
  5. http://www.who.int/gho/en/
  6. http://www.census.gov/
  7. http://www.programmableweb.com/
  8. http://www.infochimps.com/
  9. http://www.google.com/publicdata/directory
  10. http://www.weatherbase.com/
  11. http://espn.go.com/apis/devcenter/
  12. http://statistics.ucla.edu/
  13. http://developer.nytimes.com/docs
  14. http://have-you-been-here.appointment.at/
  15. http://www.trainwreckdb.com/
  16. https://data.gov.uk/
  17. http://www.freebase.com/
  18. https://okfn.org/
  19. http://sunlightfoundation.com/api/
  20. http://dev.markitondemand.com/MODApis/
  21. http://www.wunderground.com/weather/api
  22. https://developers.google.com/maps/documentation/geolocation/intro
  23. http://www.datasciencetoolkit.org/
  24. http://openweathermap.org/api
  25. https://www.publicapis.com/
  26. https://data.nasa.gov/developer
  27. https://aws.amazon.com/public-data-sets/
  28. https://www.ncdc.noaa.gov/snow-and-ice/daily-snow/
  29. https://www.grid.ac/downloads
  30. http://nyctaxi.herokuapp.com/
  31. http://shootingtracker.com/wiki/Main_Page
  32. http://download.geonames.org/export/zip/
  33. http://2015.padjo.org/briefs/tracking-police-involved-homicides/#datasets
  34. http://www.slavevoyages.org/tast/assessment/estimates.faces
  35. http://www.nhtsa.gov/FARS
  36. http://classwork.compjour.org/2015/jeffbarrera/bikecrashmapper/
  37. http://that1archive.neocities.org/archives-datasets.html
  38. http://www3.epa.gov/airdata/ad_data.html
  39. http://www.dc.state.fl.us/pub/obis_request.html
  40. http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time
  41. Road fatality: http://www-odi.nhtsa.dot.gov/downloads/
  42. Every campaign donation to a US federal candidate: http://www.fec.gov/disclosure.shtml
  43. Dallas Police Incident Report: https://www.dallasopendata.com/Police/Dallas-Police-Public-Data-RMS-Incidents/tbnj-w5hb
  44. LAPD Crime report: https://data.lacity.org/A-Safe-City/LAPD-Crime-and-Collision-Raw-Data-2014/eta5-h8qx?
  45. Chicago crime report: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
  46. Medicare payments to every doctor: https://www.cms.gov/research-statistics-data-and-systems/statistics-trends-and-reports/medicare-provider-charge-data/physician-and-other-supplier.html
  47. Congressional lobbyists: http://www.senate.gov/legislative/Public_Disclosure/database_download.htm
  48. Everyone who has visited the White House since 2009: https://www.whitehouse.gov/briefing-room/disclosures/visitor-records
  49. NYPD - stop, question and frisk data: http://www.nyc.gov/html/nypd/html/analysis_and_planning/stop_question_and_frisk_report.shtml
  50. Workplace fatalities: http://ogesdw.dol.gov/views/data_catalogs.php
  51. CA SAT, ACT, and AP scores: http://www.cde.ca.gov/ds/sp/ai/
  52. CA school test scores: http://star.cde.ca.gov/starresearchfiles.asp
  53. CA school demographics: http://www.cde.ca.gov/ds/sd/
  54. Earthquake events: http://earthquake.usgs.gov/earthquakes/search/
  55. Payments made by companies to doctors: https://www.cms.gov/openpayments/
  56. college scorecard data: https://collegescorecard.ed.gov/data/
  57. Baby names: http://www.ssa.gov/oact/babynames/limits.html
  58. NIST vulnerability databases: https://nvd.nist.gov/download.cfm
  59. Bird strike database: http://wildlife.faa.gov/
  60. Energy disaggregation dataset: http://redd.csail.mit.edu/
  61. webclick data: http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset/
  62. Economic data: http://bea.gov