Wednesday, January 27, 2016

What can $5 do for you?

  1. You can hire an expert to do wonderful things for you for $5 at Fiverr
  2. You can buy a new computer capable of running Linux, called the Pi Zero, for $5
  3. You can buy an Arduino microcontroller board on eBay for $5
  4. You can subscribe to a full year (12 issues) of Wired magazine for $5
  5. You can put 4 gallons of gas in your car for $5 where gas is the cheapest in the country
  6. You can watch a movie on Tuesday for $5
  7. Your $5 donation to a local food bank can buy up to 50 lbs of food
  8. It can get you a haircut if you live in Philadelphia
  9. Hey, save your $5 bills whenever you get them; very soon you will have $100
  10. Share your $5 idea please!

Sunday, January 3, 2016

Creating win-win situations

Let me start by sharing my New Year's resolution, because it's a tradition :)
All passionate people like me fight at work and at home, and when we fight, someone loses. The loser may not be either of the two people fighting but the cause itself: the project suffers, your colleague loses morale, and the business does not achieve results. So I am going to find ways to create situations where everybody wins. I am working on the "how" part and will share when I have concrete ideas.

For now, some reflections on the past year. Professionally, 2015 was exciting, fast, and fruitful. At work, I led and drove many major initiatives for transforming our data and analytics business into a cloud business. dashDB has been a huge success. Seeing it grow, both in the value it provides to data analysts and data scientists and in its customer base, always reminds me of day 0, when a few of us created this cloud offering with a complete focus on users. R analytics has always been valuable both for quick, productive use and for long-term machine learning models, and being part of the team that brought this is very satisfying.
Bluemix itself has surpassed 1 million users, and I am extremely proud of being part of the ecosystem. We fueled the growth of Bluemix by providing what I call the "engines of PaaS". These engines include relational and NoSQL databases as a service; a modern analytic compute engine, Spark, which our leaders call an analytic operating system; and big data analytic services such as EHaaS.
To bring these services into Bluemix, there were tons of internal projects that I designed and created from scratch. These projects include a continuous integration and deployment tool for our cloud delivery, a reliability and resiliency framework for change management, and tools for managing outages and root cause analysis. I started to foster a culture within the team of thinking about resiliency as a big picture, and focused on all 4 aspects of it shown below.

2015 was a great year for my own learning and technical vitality. I learnt very important characteristics of cloud services, the expectations of our customers, and the pain points of operating services at scale. It was a year to lay the foundation for bringing services into the cloud quickly and improving them rapidly while reducing the cost of operation. A few skills that I acquired or enhanced stand out: the ELK stack for log analysis, my favorites R and RStudio for statistical analysis and machine learning, Docker for containerization, Ansible for change management, Jupyter notebooks for data analysis, and Spark for analytic computing.
2015 was a great year for my teaching at San Jose State. I was able to update the syllabus so that my students get exposed to the skills and technology required by innovative companies in the Bay Area. Spark, Docker, and reactive programming using Typesafe technology were great additions. Students benefited greatly from this class being part of IBM university relations for Bluemix. They used Bluemix to understand cloud technology and to implement the good ideas they had. I am very confident that the experience they acquired in the class will give them the tools and confidence to succeed. The projects they did to analyze data using Spark (take a look at some profiled here) will help them at their next job. In addition to teaching this class, I supervised more than 20 students in 2015 on their Master's projects.
2015 was also a great year for my speaking and presenting. I started the year by presenting at IBM Interconnect (our premier cloud conference) and went on to present at more than 10 conferences. Silicon Valley Code Camp was one of my favorites, as it attracts students and developers from all across the Bay Area and the two-day event is free to attend. Presenting at the O'Reilly Velocity conference was a privilege, and I learnt a lot from listening to some of the best speakers in the industry. Sharing my own ideas and work on predictive monitoring and analytics was a very enriching experience. In addition to many hackathons and coding camps, the year ended well with presenting at the Insight conference. This is where we learn in great detail what our enterprise customers want from us. The hybrid cloud topology of their evolving businesses teaches us a lot.
On the career front, I am very thankful for the opportunities that were provided to me. 2015 was also rewarding, as a new title and new responsibilities allowed me to understand myself better. Leading a big team, aligning the business goals of the organization with people's day jobs, and fostering a culture where everyone is intrinsically motivated and passionate is fun but challenging. I believe in a high-performance culture and self-driven teams, and I was fortunate to be surrounded by very smart people. I am also fortunate to have leaders and mentors at work who are as passionate as I am and who make our day jobs fun and challenging.
On the personal front, 2015 was an interesting year as well. My mom retired after teaching at a high school in my home town in India for the last 34 years. We were all relieved that she got a break, but I could tell she was sad to leave the community that she had served for so long. I was able to spend as much time with my kids as I wanted in spite of the chaotic nature of work. I will never forget the poster on the door of my great mentor Bala Iyer's IBM office: "Good fathers and good men". I consider myself successful in seeding the engineering (making) mindset in my daughter's mind. She was able to take a burning problem and come up with a technological solution in the form of her school science project, and she did very well with it. She surprised me when she kept winning at the school, county, and state levels and went up to the national semifinal. We were able to take a couple of major vacations, in Europe and the northeastern USA, which were very memorable and relaxing.
Last but not least, I ended the year by reading a book, Without Their Permission by Reddit founder Alexis Ohanian. I love this guy not just because he was able to make millions of dollars at a very young age, but because he explained in clear terms how the internet allows all of us to contribute to society and bring about the changes that we all want to see. While relaxing at home at the end of the year, I was fortunate to meet one of my home town friends and mentors after 17 years (even though he lives in Indiana, we somehow missed seeing each other all these years). He had a huge impact on my early career, and I have always admired him for his attitude and passion towards life.
Everything happens at the right time, and we must be optimistic. I would like to end this post by wishing all my readers, friends, and mentors a very resilient new year!

Wednesday, December 16, 2015

Spark use cases

Sign up for the free IBM Bluemix cloud and use the Spark service to try these out. These are excellent use cases for beginners. After that, you can move on to developing complex algorithms, running them on larger data sets, and leveraging the true distributed computing power of Spark.
1. Use train wreck datasets to let users query accidents by keywords such as bicycle or pedestrian
2. Use NHTSA data to let users query fatalities by keyword
3. Use a virtual sensor app to generate GPS location and wind speed data and predict fire spread
4. Use Spark Streaming to learn network conditions in real time, using an app to generate network data
5. Use Spark Streaming to get temperature and humidity data (generated by an app) and do real-time optimization/alerting
6. Use air quality data and send alerts based on my zip code
7. Personalize news briefs: use Twitter trends to pick the news I might like to read
9. Flu detection by analyzing social signals such as tweets
10. Automatic adjustment of game level complexity for player retention. Hint: you can simulate game player data using the virtual sensor app and use Spark to lower or raise the game level
11. Employee data analysis for employee retention (you can use Glassdoor data for this prototype)
12. Fraud detection in user reviews
13. Detect security breaches or attacks in real time (simulate a network breach using blacklisted IPs)
15. Use FAA bird strike data and Spark to let users query the most dangerous airports
16. Use home energy usage datasets as a training set to find out how your own home energy usage compares. Hint: the user enters their own energy usage and the app provides a comparative analysis.
17. Use baby names datasets to come up with some cool analytic Spark queries (Hint: the user finds out how boy or girl names have been trending since 2005)
18. Use NIST vulnerability datasets to predict a possible DDoS attack
19. Use water and sanitation data and Spark to produce analytics such as which countries are doing worst and which are improving
20. Use child mortality data to figure out the most important causes of death for children under 5
21. Use 2012 presidential donation datasets to find out who donated the most to presidential candidates and whether there is any correlation
22. Use music reviews data (contact the owner for the data link) and use Spark to find the top ten songs or something similar
23. Use daily air quality data for a given geography (such as San Jose Sunnyvale) and use Spark to find anomalies in the data and answer some important health and safety questions
24. Use Medicare outpatient payment datasets and use Spark to find interesting answers, such as which city had the most payments claimed and which provider had repeated claims
25. Use popular baby names datasets and Spark to predict the most popular male and female names for 2016
26. Use interesting genome and protein datasets and use Spark to calculate interesting facts (Hint: use ClinVar datasets to find all gene types related to the condition "Breast-Ovarian cancer")
27. Use housing affordability datasets with Spark to come up with good analytics, such as which city and zip code has good overall job opportunities and housing affordability for 30-40 year olds
28. Use farmers market location datasets and Spark to generate some interesting analytics
29. Use government real estate asset datasets and Spark to come up with some cool analytics, such as how much money the government is spending on maintaining useless assets
30. Use cause-of-death datasets and Spark to answer health-related questions
32. Use air quality data and identify the contributing factors in a particular geographic location. Feel free to use a map app
33. Use NASA datasets and use Spark to come up with some interesting answers
34. Use NY open data of your choice and produce some good analytics
35. Use Austin restaurant inspection report data and Spark to answer critical consumer questions, such as which cuisine or which area's restaurants had more violations in the past year
36. Use the OpenSpending API and Spark Streaming to find out which governments are spending taxpayers' money efficiently
38. Use transportation datasets and Spark to answer some cool analytic questions (be creative)
40. Use a lobbying database (hint: get quarterly data and convert it from XML to JSON or whatever the Spark context can read) and then design an analytic program to find out which state or which industry is lobbying heavily
41. Use SAT score datasets and Spark to come up with some cool analytics
42. Use train wreck datasets with Spark to figure out the 10 most dangerous places for accidents and why
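
Before wiring any of these up to Spark, it can help to prototype the logic on a handful of rows. Below is a minimal sketch of the keyword query (use case 1) and the "most dangerous places" count (use case 42) in plain Python, so it runs without a cluster. The sample records, the field names `place` and `description`, and both function names are made up for illustration; they do not come from any of the datasets above. The same two steps would roughly correspond to a `filter` followed by a count per key (for example `reduceByKey`) on a Spark RDD.

```python
from collections import Counter

# Hypothetical accident records; a real project would load one of the datasets listed above.
SAMPLE_ACCIDENTS = [
    {"place": "Crossing A", "description": "Train struck a bicycle at the crossing"},
    {"place": "Crossing B", "description": "Pedestrian on the tracks"},
    {"place": "Crossing A", "description": "Pedestrian hit near the platform"},
    {"place": "Crossing C", "description": "Vehicle stalled on the tracks"},
]


def query_by_keyword(records, keyword):
    """Return the records whose description mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in records if needle in r["description"].lower()]


def top_places(records, n=10):
    """Count accidents per place and return the n most frequent (place, count) pairs."""
    return Counter(r["place"] for r in records).most_common(n)


if __name__ == "__main__":
    for record in query_by_keyword(SAMPLE_ACCIDENTS, "pedestrian"):
        print(record["place"], "-", record["description"])
    print(top_places(SAMPLE_ACCIDENTS, n=2))
```

Once the logic gives sensible answers on sample rows, the list comprehension and the `Counter` are the parts to swap for their distributed equivalents.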

Monday, October 12, 2015

Where do I get public data sets and APIs for my project?

  41. Road fatality:
  42. Every campaign donation to a US federal candidate:
  43. Dallas Police Incident Report:
  44. LAPD Crime report:
  45. Chicago crime report:
  46. Medicare payments to every doctor:
  47. Congressional lobbyists:
  48. Everyone who has visited the White House since 2009:
  49. NYPD - stop, question and frisk data:
  50. Workplace fatalities:
  51. CA SAT, ACT, and AP scores:
  52. CA school test scores:
  53. CA school demographics:
  54. Earthquake events:
  55. Payments made by companies to doctors:
  56. College Scorecard data:
  57. Baby names:
  58. NIST vulnerability databases:
  59. Bird strike database:
  60. Energy disaggregation dataset:
  61. Web click data:
  62. Economic data: