Posts by Tag

python

Build an API App backed by FastAPI and Vue.js

10 minute read

A bare API is never going to look attractive. In this post, I document my approach to developing a web page on top of an existing API using the FastAPI + Vue.js technology stack.

Released a DataFrame summarytool for Jupyter Notebook

less than 1 minute read

Want to include a data summary as a quick reference in your Jupyter notebooks? I used to rely on the summarytools package in R for this, and I miss it when I’m working on python projects. So I developed a similar python function with some additional widgets. Please check out this post if you are interested.

Web Scraping of JavaScript website

2 minute read

In this post, I use selenium to demonstrate how to scrape a JavaScript-enabled page. If you have some experience using python for web scraping, you have probably already heard of beautifulsoup and urllib. By using the following code, we will be able to see the HTML and then use HTML tags to extract the desired elements. However, if the web page is embedded with JavaScript, you will notice that some of the HTML elements can’t be seen from beautiful soup, because they are rendered by the JavaS...
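The limitation described in this excerpt can be illustrated without a browser. The sketch below is not the post's selenium code; it uses only the standard library's html.parser on a hypothetical page source (the markup, the `items` id, and the `TagCounter` helper are all assumptions made up for illustration) to show that elements created by a script are simply absent from the HTML a plain HTTP client receives.

```python
from html.parser import HTMLParser

# Hypothetical page source as a plain HTTP client would receive it:
# the list stays empty until a browser actually runs the script.
STATIC_HTML = """
<html><body>
  <h1>Products</h1>
  <ul id="items"></ul>
  <script>
    const ul = document.getElementById("items");
    ["apple", "pear"].forEach(name => {
      const li = document.createElement("li");
      li.textContent = name;
      ul.appendChild(li);
    });
  </script>
</body></html>
"""


class TagCounter(HTMLParser):
    """Counts the opening tags found in raw HTML (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Parse the HTML text and return a {tag: occurrences} dictionary."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


counts = count_tags(STATIC_HTML)
print("h1 tags:", counts.get("h1", 0))  # present in the static source
print("li tags:", counts.get("li", 0))  # only exist after the script runs
```

A browser-driven tool such as selenium sidesteps this because it executes the script first and then exposes the resulting DOM.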

Implement DeepFM model in Keras

8 minute read

Introduction: The wide and deep architecture has proven to be one of the deep learning applications that combine memorization and generalization in areas such as search and recommendation. Google released its wide & deep learning model in 2016. The wide part helps to memorize past behaviour for specific choices; the deep part embeds features into a low dimension and helps to discover new user-product combinations. Later, on top of wide & deep learning, DeepFM was developed, combining a DNN model and Factorization machi...

Digit Recognition with Tensor Flow

7 minute read

This time I am going to continue with the kaggle 101 level competition, the digit recogniser, using the deep learning tool Tensor Flow. In the previous post, I used PCA and Pooling methods to reduce the dimensions of the dataset, and trained a linear SVM. Due to the limited efficiency of the R SVM package, I only sampled 500 records and performed a 10-fold cross validation. The resulting accuracy was about 82.7%. 1. This time with tensorflow we can address the problem differently: Deep Lea...

Revisit Titanic Data using Apache Spark

5 minute read

This post mainly demonstrates the pyspark API (Spark 1.6.1) using the Titanic dataset, which can be found here (train.csv, test.csv). Another post analysing the same dataset using R can be found here. Content: Data Loading and Parsing; Data Manipulation; Feature Engineering; Apply Spark ml/mllib models. 1. Data loading & parsing. Data loading: sc is the SparkContext launched together with pyspark. Using sc.textFile, we can read a csv file as text in RDD data format and data is sep...

Job Hunting Like A Data Analyst (Part III)

6 minute read

Continuing from the previous post, Explore the Job Market, this week I am going to develop a simple recommender system to find a suitable job. Recommender: Let’s cover some background on recommendation systems. A typical example of recommendation could be the products recommended in the sidebar at Amazon or the people you may know on Facebook. Usually we can categorise recommenders into two types: 1. Content Based Recommendation: Content-based could mean user-based or product-based and the choice is de...

Job Hunting Like A Data Analyst (Part II)

4 minute read

Continuing from the previous post, I’ve added some additional lines of code to fetch the job description of each job post. This takes a bit longer, about 1.5 hours for me, because I set a delay of ~10 seconds between each request. This week I will continue with an overview of the Data Analyst job market and develop a simple recommender based on skill and experience requirements. 0. Tools: python 2.7; python package: pandas; python package: re. 1. Job Market Overv...

R

Introduction of renv package

2 minute read

R users have been complaining about package version control for a long time. We admire python users, who can use simple commands to save and restore packages with the correct versions. The good news is that RStudio recently introduced the renv package to manage local dependencies and environments, filling the gap between R and python. renv resembles the conda / virtualenv concept in python.

Write Your Own R Packages

2 minute read

In this post, I write my own util package to wrap all my UDFs (user-defined functions) with neat documentation.

Not so basic Keras tutorial for R

3 minute read

The basic tutorial of Keras for R is provided by keras here, and it is simple and fast to get started with. But very soon, I realized this basic tutorial wouldn’t meet my needs any more when I wanted to train on a larger dataset. So in this tutorial I’m going to discuss keras generators, callbacks and tensorboard. Keras Installation: If you haven’t got keras in R, just follow the steps below: devtools::install_github("rstudio/keras"); library(keras); install_keras() MNIST handwriting recogniti...

Shiny + shinydashboard + googleVis = Powerful Interactive Visualization

4 minute read

If you are a data scientist who has spent several weeks developing a fantastic model, you’d like to have an equally awesome way to visualize and demo your results. For R users, ggplots are a good option, but no longer sufficient. R-shiny + shinydashboard + googleVis could be a wonderful combination for a quick demo application. For the purpose of illustration, I just downloaded a random sample data file, test.csv, from one of kaggle’s latest competitions: https://www.kaggle.com/c/new-york-city-taxi-fare-pre...

Implementation of Model Based Recommendation System in R

1 minute read

The most straightforward recommendation systems are either user-based CF (collaborative filtering) or item-based CF, which are categorized as memory-based methods. User-based CF recommends products based on the behaviour of similar users, and item-based CF recommends products similar to those the user has purchased. No matter which method is used, the user-user or item-item similarity matrix, which could be sizable, has to be computed. In contrast, a model based app...
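To make the memory-based side of this comparison concrete, here is a minimal sketch of building an item-item similarity matrix. It is written in plain python rather than the post's R, cosine similarity is an assumed choice of metric, and the `ratings` data and function names are hypothetical. Note that the result has one row and one column per item, which is why the matrix becomes sizable as the catalogue grows.

```python
import math

# Hypothetical purchase matrix: one row per user, one column per item
# (1 = purchased, 0 = not purchased).
ratings = {
    "u1": [1, 1, 0],
    "u2": [1, 1, 0],
    "u3": [0, 0, 1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an item nobody bought is similar to nothing
    return dot / (norm_a * norm_b)


def item_similarity(ratings):
    """Return the n_items x n_items cosine similarity matrix."""
    rows = list(ratings.values())
    columns = list(zip(*rows))  # transpose: one vector per item
    n = len(columns)
    return [[cosine(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]


sim = item_similarity(ratings)
for row in sim:
    print([round(value, 2) for value in row])
```

In this toy data the first two items are always bought together, so they come out as near-identical, while the third item shares no buyers with them and scores zero.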

keras

Implement DeepFM model in Keras

8 minute read

Introduction: The wide and deep architecture has proven to be one of the deep learning applications that combine memorization and generalization in areas such as search and recommendation. Google released its wide & deep learning model in 2016. The wide part helps to memorize past behaviour for specific choices; the deep part embeds features into a low dimension and helps to discover new user-product combinations. Later, on top of wide & deep learning, DeepFM was developed, combining a DNN model and Factorization machi...

Not so basic Keras tutorial for R

3 minute read

The basic tutorial of Keras for R is provided by keras here, and it is simple and fast to get started with. But very soon, I realized this basic tutorial wouldn’t meet my needs any more when I wanted to train on a larger dataset. So in this tutorial I’m going to discuss keras generators, callbacks and tensorboard. Keras Installation: If you haven’t got keras in R, just follow the steps below: devtools::install_github("rstudio/keras"); library(keras); install_keras() MNIST handwriting recogniti...

javascript

Deploy deep learning models in browser using Tensorflow.js

5 minute read

A brief guide on how to deploy a deep learning model in the browser using tensorflow.js. In this post, a mobileNet model was trained to predict BMI, Age and Gender. The model takes input (either from a webcam or uploaded files) and makes predictions in the browser. This deployment has an obvious advantage of reduced upload traffic compared to a RESTful API approach.

titanic

Revisit Titanic Data using Apache Spark

5 minute read

This post mainly demonstrates the pyspark API (Spark 1.6.1) using the Titanic dataset, which can be found here (train.csv, test.csv). Another post analysing the same dataset using R can be found here. Content: Data Loading and Parsing; Data Manipulation; Feature Engineering; Apply Spark ml/mllib models. 1. Data loading & parsing. Data loading: sc is the SparkContext launched together with pyspark. Using sc.textFile, we can read a csv file as text in RDD data format and data is sep...

Tree based models in R on Titanic Data

5 minute read

This is the first time I blog about my journey of learning data science, which starts from the first kaggle competition I attempted - the Titanic. In this competition, we are asked to predict the survival of passengers onboard, given some information such as age, gender, ticket fare… [Photo caption: Translated letter reveals first hand account of the “unforgettable scenes where horror mixed with sublime heroism” as the Titanic sank. Photo: Getty Images] How bad is this tragedy? Let’s take some exploratory d...

tensorflow

Deploy deep learning models in browser using Tensorflow.js

5 minute read

A brief guide on how to deploy a deep learning model in the browser using tensorflow.js. In this post, a mobileNet model was trained to predict BMI, Age and Gender. The model takes input (either from a webcam or uploaded files) and makes predictions in the browser. This deployment has an obvious advantage of reduced upload traffic compared to a RESTful API approach.

Digit Recognition with Tensor Flow

7 minute read

This time I am going to continue with the kaggle 101 level competition, the digit recogniser, using the deep learning tool Tensor Flow. In the previous post, I used PCA and Pooling methods to reduce the dimensions of the dataset, and trained a linear SVM. Due to the limited efficiency of the R SVM package, I only sampled 500 records and performed a 10-fold cross validation. The resulting accuracy was about 82.7%. 1. This time with tensorflow we can address the problem differently: Deep Lea...

shiny

Shiny + shinydashboard + googleVis = Powerful Interactive Visualization

4 minute read

If you are a data scientist who has spent several weeks developing a fantastic model, you’d like to have an equally awesome way to visualize and demo your results. For R users, ggplots are a good option, but no longer sufficient. R-shiny + shinydashboard + googleVis could be a wonderful combination for a quick demo application. For the purpose of illustration, I just downloaded a random sample data file, test.csv, from one of kaggle’s latest competitions: https://www.kaggle.com/c/new-york-city-taxi-fare-pre...

random forest

Tree based models in R on Titanic Data

5 minute read

This is the first time I blog about my journey of learning data science, which starts from the first kaggle competition I attempted - the Titanic. In this competition, we are asked to predict the survival of passengers onboard, given some information such as age, gender, ticket fare… [Photo caption: Translated letter reveals first hand account of the “unforgettable scenes where horror mixed with sublime heroism” as the Titanic sank. Photo: Getty Images] How bad is this tragedy? Let’s take some exploratory d...

decision tree

Tree based models in R on Titanic Data

5 minute read

This is the first time I blog about my journey of learning data science, which starts from the first kaggle competition I attempted - the Titanic. In this competition, we are asked to predict the survival of passengers onboard, given some information such as age, gender, ticket fare… [Photo caption: Translated letter reveals first hand account of the “unforgettable scenes where horror mixed with sublime heroism” as the Titanic sank. Photo: Getty Images] How bad is this tragedy? Let’s take some exploratory d...

SVM

Recognize the Digits

2 minute read

This time I am going to demonstrate the kaggle 101 level competition - digit recogniser. In this competition, we are asked to train a model to recognize the digit from the pixel data. The data set is available here. Description of the data: label: the integers from 0 to 9; features: pixel001-pixel784, which are rolled out from a 28x28 digit image; pixel data ranges from 0 to 255, indicating the brightness of the pixel in grey scale. Visualize the digit: Let’s randomly look at 100 dig...
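The data layout described above (784 pixel columns rolled out from a 28x28 image, values 0 to 255) can be sketched as follows. This is plain python rather than the post's own code, it assumes the pixels are stored row by row, and the sample row, the `to_image` and `render` helpers, and the threshold of 128 are all hypothetical choices for illustration.

```python
def to_image(pixels, width=28):
    """Fold a flat pixel list back into rows of `width` values (row-major)."""
    if len(pixels) % width != 0:
        raise ValueError("pixel count must be a multiple of the image width")
    return [pixels[r * width:(r + 1) * width]
            for r in range(len(pixels) // width)]


def render(image, threshold=128):
    """Draw the image as text: '#' for bright pixels, '.' for dark ones."""
    return "\n".join(
        "".join("#" if value >= threshold else "." for value in row)
        for row in image
    )


# Hypothetical record: 784 grey-scale values forming a vertical bar
# in column 14, standing in for one row of the competition data.
flat_pixels = [255 if i % 28 == 14 else 0 for i in range(784)]

image = to_image(flat_pixels)
print(render(image))
```

Printing the folded image as text is a quick sanity check that the assumed row order is right before plotting many digits at once.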

PCA

Recognize the Digits

2 minute read

This time I am going to demonstrate the kaggle 101 level competition - digit recogniser. In this competition, we are asked to train a model to recognize the digit from the pixel data. The data set is available here. Description of the data: label: the integers from 0 to 9; features: pixel001-pixel784, which are rolled out from a 28x28 digit image; pixel data ranges from 0 to 255, indicating the brightness of the pixel in grey scale. Visualize the digit: Let’s randomly look at 100 dig...

tableau

Tableau Intersection Filter Tutorial

less than 1 minute read

If you have used Tableau before, you will know that the filters in Tableau are union/or selections. Let’s take the table below for example. If you create a filter and select products a & b, Tableau will show clients A, B, C and E instead of A and C. That is because the filter shows the list of clients who purchased product a or b, instead of product a and b. The idea: Firstly, create a variable to count the selection of products. Then create another variable to count the selection...
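The difference between the two filter behaviours can be sketched outside Tableau. The purchase table below is hypothetical, chosen only so that it reproduces the clients named in the excerpt, and the counting logic is an assumed reading of the truncated idea: count how many of the selected products each client bought and keep the clients whose count equals the number of products selected.

```python
# Hypothetical purchase table: client -> set of products purchased.
purchases = {
    "A": {"a", "b"},
    "B": {"a"},
    "C": {"a", "b", "c"},
    "D": {"c"},
    "E": {"b"},
}


def union_filter(purchases, selected):
    """Default behaviour: clients who bought ANY of the selected products."""
    return sorted(client for client, bought in purchases.items()
                  if bought & selected)


def intersection_filter(purchases, selected):
    """Counting approach: keep clients whose number of matching products
    equals the number of products selected, i.e. who bought ALL of them."""
    return sorted(client for client, bought in purchases.items()
                  if len(bought & selected) == len(selected))


selected = {"a", "b"}
print(union_filter(purchases, selected))         # clients with a or b
print(intersection_filter(purchases, selected))  # clients with a and b
```

Comparing two counts keeps the logic independent of how many products are selected, so the same rule works for one product or ten.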

spark

Revisit Titanic Data using Apache Spark

5 minute read

This post mainly demonstrates the pyspark API (Spark 1.6.1) using the Titanic dataset, which can be found here (train.csv, test.csv). Another post analysing the same dataset using R can be found here. Content: Data Loading and Parsing; Data Manipulation; Feature Engineering; Apply Spark ml/mllib models. 1. Data loading & parsing. Data loading: sc is the SparkContext launched together with pyspark. Using sc.textFile, we can read a csv file as text in RDD data format and data is sep...

css

Modified readthedown RMarkdown template for stylish analytical documents

2 minute read

This is a modified readthedown rmarkdown template, greatly inspired by and based on the juba/rmdformats package. readthedown offers a Sphinx-like style, which is commonly used in the documentation of various python packages. I personally like the readthedown style very much and hence dug a little into the source code to figure out ways to make further customization easier.

rmarkdown

Modified readthedown RMarkdown template for stylish analytical documents

2 minute read

This is a modified readthedown rmarkdown template, greatly inspired by and based on the juba/rmdformats package. readthedown offers a Sphinx-like style, which is commonly used in the documentation of various python packages. I personally like the readthedown style very much and hence dug a little into the source code to figure out ways to make further customization easier.
