A page with my data science projects.
I’m a physics teacher and a data scientist who loves open-source programs and tools. I have a master’s degree in natural science, during which I developed my research using Python for data analysis and data processing.
Since then, I have studied and, more recently, worked with Python, data science, and machine learning algorithms focused on business solutions. More details about my projects and each solution are given in the Data Science Projects and Data Engineering Projects sections.
To support Airbnb bookings, this data science project aims to create a machine learning model that predicts the first booking of a new user. Unfortunately, the dataset is highly imbalanced, which makes prediction difficult; the best result was an accuracy of 17.48% +/- 0.4%. New approaches guided by the business will therefore be necessary to improve the results.
To help the sales team, this data science project ranks a customer list to improve cross-selling. The model orders the list so that almost all interested customers (98.31% +/- 0.16%) fall within the first 50% of the list, halving the cost of calls. For example, if each call costs R$ 15.00 and the list has 20,000 customers, calling everyone costs R$ 300,000.00. Using the model, it is possible to spend only R$ 150,000.00.
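The call-cost arithmetic above can be checked with a short Python sketch. It assumes the figure of 20,000 refers to the number of customers on the list and that each customer is called once; the function and variable names are illustrative, not taken from the project.

```python
# Cost of calling customers, using the figures quoted in the text.
COST_PER_CALL = 15.00    # R$ per call
LIST_SIZE = 20_000       # assumption: number of customers on the list
FRACTION_CALLED = 0.50   # share of the ranked list the sales team calls


def calling_cost(n_customers: int, cost_per_call: float) -> float:
    """Total cost of calling n_customers once each."""
    return n_customers * cost_per_call


# Cost without the model: every customer is called.
full_cost = calling_cost(LIST_SIZE, COST_PER_CALL)

# Cost with the model: only the top half of the ranked list is called.
ranked_cost = calling_cost(int(LIST_SIZE * FRACTION_CALLED), COST_PER_CALL)

savings = full_cost - ranked_cost

print(f"Calling the whole list: R$ {full_cost:,.2f}")
print(f"Calling the top 50%:    R$ {ranked_cost:,.2f}")
print(f"Savings:                R$ {savings:,.2f}")
```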
Financial transaction fraud is one of the biggest problems faced by financial institutions. This project uses data science and machine learning to detect and prevent fraudulent transactions. The model achieved a precision of 96.3% +/- 0.7% and a recall of 76.3% +/- 3.5%. The profit expected by the company is R$ 57,251,574.44.
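For readers unfamiliar with the two metrics, here is a minimal sketch of how precision and recall are computed from true and predicted labels. The labels below are toy values made up for illustration; they are not the project's data and do not reproduce its results.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # frauds caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # frauds missed
    # Precision: share of flagged transactions that are really fraud.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of real frauds that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (hypothetical): 1 = fraud, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A high precision with a lower recall, as reported for this model, means that flagged transactions are almost always fraudulent, while some fraudulent transactions still go undetected.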
When a client churns, the company loses money. In this project, I created a data-driven solution to predict churn and help prevent it. On unseen data, the machine learning model detected 76.5% of the clients who would churn, which represents a recovery of R$ 2,878,197.97 for the company.
Cardio Catch Disease is a company specialized in detecting heart disease in its early stages. For every 5% of prediction accuracy above 50%, the value charged per client increases by 50%. In this data science project, I created a model with a recognition rate of 71.8% +/- 0.5%; the estimated profit from using this model is about R$ 11,285,500.00.
Devising a new investment strategy for each store can be difficult. To help stakeholders decide on individual investments for every store in the chain, this data science project created a machine learning model that predicts sales up to six weeks in advance, enabling them to calculate the profit per store and the amount of money available to invest.
The Bookclub doesn’t collect the data from its website, even though the data are updated with each purchase, sale, or exchange that takes place there. This project therefore aims to extract, transform, and load (ETL) data from the website books.toscrape into a SQLite database. The ETL is scheduled with Airflow running in Docker.
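The transform and load steps of such a pipeline might look like the sketch below. It is not the project's code: the extract step is replaced by hard-coded, hypothetical records so the example runs offline, the table layout and the names `RAW_BOOKS`, `transform`, and `load` are assumptions, and the Airflow scheduling is omitted.

```python
import sqlite3

# Hypothetical raw records standing in for scraped pages; a real
# pipeline would extract these from the website.
RAW_BOOKS = [
    {"title": " Example Book A ", "price": "£10.50", "availability": "In stock"},
    {"title": "Example Book B", "price": "£23.00", "availability": "Out of stock"},
]


def transform(record):
    """Clean one raw record: trim the title, parse the price, flag stock."""
    title = record["title"].strip()
    price = float(record["price"].lstrip("£"))
    in_stock = 1 if record["availability"].strip().lower() == "in stock" else 0
    return (title, price, in_stock)


def load(rows, conn):
    """Write cleaned rows into an assumed `books` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "title TEXT PRIMARY KEY, price REAL, in_stock INTEGER)"
    )
    # INSERT OR REPLACE lets a scheduled rerun update rows that already exist.
    conn.executemany("INSERT OR REPLACE INTO books VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory database keeps the sketch self-contained; a file path
# would be used for a persistent SQLite database.
conn = sqlite3.connect(":memory:")
load([transform(r) for r in RAW_BOOKS], conn)

stored = conn.execute(
    "SELECT title, price, in_stock FROM books ORDER BY title"
).fetchall()
print(stored)
```

Keying the table on the title is a simplification chosen for this sketch; a production table would more likely use a unique product identifier.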