Buy Smartphone with Data Science

The main objective of this post is to share my experience with a business case: how I bought a new smartphone using data science and strategic sourcing techniques.

In January 2021 I needed to buy a new smartphone, so I thought to myself: why not use data science to build a machine learning model to help me?

This project is divided into 5 parts:

  1. Getting the data by web scraping;
  2. Cleaning the data;
  3. Starting an exploratory analysis of the data;
  4. Setting the “weights” and creating a model to help filter some devices;
  5. Conclusion: choosing “the best” device.

1. Getting the data by web scraping

Reading the main URL to fetch the HTML page and start the web scraping process:

library(magrittr)  # provides the %>% pipe

# Relative links of every device on the spec-sheet index page
links <- rvest::read_html('https://www.tudocelular.com/celulares/fichas-tecnicas.html') %>%
  rvest::html_elements(xpath = '//*[@id="cellphones_list"]/article') %>%
  rvest::html_nodes('a') %>%
  rvest::html_attr('href')

# Build absolute URLs, drop duplicates, keep only spec-sheet ("ficha") pages
aparelhos <- tibble::tibble(url = unique(paste0('https://www.tudocelular.com', links))) %>%
  dplyr::filter(stringr::str_detect(string = url, pattern = 'ficha'))
dplyr::slice_head(aparelhos, n = 5)
## # A tibble: 5 × 1
##   url                                                                           
##   <chr>                                                                         
## 1 https://www.tudocelular.com/vivo/fichas-tecnicas/n8042/vivo-V25e.html         
## 2 https://www.tudocelular.com/vivo/fichas-tecnicas/n8106/vivo-Y16.html          
## 3 https://www.tudocelular.com/vivo/fichas-tecnicas/n8100/vivo-iQOO-Z6x.html     
## 4 https://www.tudocelular.com/vivo/fichas-tecnicas/n8091/vivo-iQOO-Z6-China.html
## 5 https://www.tudocelular.com/Redmi/fichas-tecnicas/n8097/Redmi-Note-11-SE-ndia…
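
To see what the selectors in the pipeline above return without requesting the real site, here is a minimal sketch that runs the same steps on a small inline page. The markup and the news link are hypothetical and simplified; only the `cellphones_list` id, the `article`/`a` structure and the vivo V25e path come from the code and output above.

```r
library(magrittr)  # provides the %>% pipe

# Hypothetical, simplified markup imitating the listing page (assumption)
pagina <- rvest::minimal_html('
  <div id="cellphones_list">
    <article><a href="/vivo/fichas-tecnicas/n8042/vivo-V25e.html">vivo V25e</a></article>
    <article>
      <a href="/vivo/fichas-tecnicas/n8042/vivo-V25e.html">duplicated link</a>
      <a href="/vivo/noticias/exemplo.html">news link</a>
    </article>
  </div>')

# Same selection steps as in the post: articles -> anchors -> href attribute
links <- pagina %>%
  rvest::html_elements(xpath = '//*[@id="cellphones_list"]/article') %>%
  rvest::html_elements('a') %>%
  rvest::html_attr('href')

# Build absolute URLs, drop duplicates, keep only spec-sheet ("ficha") pages
urls <- unique(paste0('https://www.tudocelular.com', links))
urls <- urls[grepl('ficha', urls)]
print(urls)
```

The duplicated link is removed by `unique()` and the news link by the `'ficha'` filter, which is why both steps appear in the original pipeline.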

Developing a function to scrape the data for each device:

fun_to_get_info <- function(phone) {
  # Download and parse the device page
  url_link <- rvest::read_html(as.character(phone))

  # Attribute values: items without text carry their value in the class of an <i> icon
  itens <- url_link %>%
    rvest::html_elements(xpath = '//*[@id="phone_columns"]') %>%
    rvest::html_nodes('li')
  texto <- itens %>% rvest::html_text() %>% readr::parse_character()
  icone <- itens %>% rvest::html_node('i') %>% rvest::html_attr('class') %>% readr::parse_character()

  telefone <- tibble::tibble(
    # Attribute names
    nomes = url_link %>%
      rvest::html_elements(xpath = '//*[@id="controles_titles"]') %>%
      rvest::html_nodes('li') %>%
      rvest::html_text() %>%
      readr::parse_character(),
    # Use the text when present, otherwise fall back to the icon class
    atributos = ifelse(!is.na(texto), texto, icone)
  ) %>%
    dplyr::mutate('Nome do Aparelho' = url_link %>%
                    rvest::html_elements(xpath = '//*[@id="fwide_column"]/h2') %>%
                    rvest::html_text())
  return(telefone)
}
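
A single failed request would stop the whole download, so it may help to wrap the function defensively. The sketch below is one way to do that, not part of the original project: `fake_get_info()` is a hypothetical stand-in for `fun_to_get_info()` so the example runs offline, and the `pause` argument is an assumed courtesy delay between requests.

```r
# Hypothetical stand-in for fun_to_get_info(): fails for one "URL" (assumption)
fake_get_info <- function(phone) {
  if (phone == "bad") stop("page not found")
  data.frame(url = phone, nomes = "RAM", atributos = "8 GB")
}

# possibly() returns NULL instead of raising an error when the call fails
safe_get_info <- purrr::possibly(fake_get_info, otherwise = NULL)

get_politely <- function(phone, pause = 1) {
  Sys.sleep(pause)      # wait between requests to avoid overloading the site
  safe_get_info(phone)
}

# bind_rows() ignores the NULL left by the failed page
resultado <- dplyr::bind_rows(purrr::map(c("a", "bad", "b"), get_politely, pause = 0))
print(resultado)
```

With the real scraper, the same idea would be `purrr::possibly(fun_to_get_info, otherwise = NULL)` applied over `aparelhos$url`.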

Applying the function to download the data and presenting it in a table:

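# Scrape every device page and stack the results in long format
htop <- purrr::map_dfr(.x = aparelhos$url, .f = fun_to_get_info) %>%
  # One column per attribute; repeated attribute names are kept together as list entries
  tidyr::pivot_wider(names_from = 'nomes', values_from = atributos, values_fn = list) %>%
  janitor::clean_names() %>%
  tibble::as_tibble()

knitr::kable(x = dplyr::slice_head(htop, n = 5),
             caption = "The top Devices")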
|nome_do_aparelho         |sistema_operacional       |disponibilidade |dimensoes                |peso         |hardware |tela     |camera   |desempenho |
|:------------------------|:-------------------------|:---------------|:------------------------|:------------|:--------|:--------|:--------|:----------|
|vivo V25e                |Android 12 Funtouch 12    |2022/3          |159.2 x 74.2 x 7.79 mm   |183 gramas   |6.6 / 10 |8.5 / 10 |7.7 / 10 |5 / 10     |
|vivo Y16                 |Android 12 Funtouch 12    |2022/3          |163.95 x 75.55 x 8.19 mm |183 gramas   |5.3 / 10 |5.3 / 10 |5.7 / 10 |4.7 / 10   |
|vivo iQOO Z6x            |Android 11 OriginOS Ocean |2022/3          |163.87 x 75.33 x 9.27 mm |204 gramas   |7.8 / 10 |8.4 / 10 |7.6 / 10 |5 / 10     |
|vivo iQOO Z6 (China)     |Android 12 OriginOS Ocean |2022/3          |164.17 x 75.8 x 8.59 mm  |194.6 gramas |7.3 / 10 |8.4 / 10 |8.3 / 10 |6.9 / 10   |
|Redmi Note 11 SE (Índia) |Android 11 MIUI 12.5      |2022/3          |160.46 x 74.5 x 8.29 mm  |178.8 gramas |7.8 / 10 |8.5 / 10 |8.1 / 10 |4.8 / 10   |

(Only the leading columns are shown. The full table continues with sim_card, dual_sim, gsm, processador, chipset, ram, ampere and many other attributes, with NULL wherever a device has no value.)

The top Devices
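
The NULL cells and the combined values such as "AMOLED, LiPo" in the table come from `values_fn = list`. A minimal sketch of that behavior on toy data follows; the devices "A" and "B" and the column names `Tipo` and `RAM` are illustrative assumptions, not the scraped data.

```r
# Toy long-format data: device A has the attribute name "Tipo" twice,
# device B has no "RAM" entry at all
longo <- tibble::tibble(
  aparelho  = c("A", "A", "A", "B"),
  nomes     = c("Tipo", "Tipo", "RAM", "Tipo"),
  atributos = c("AMOLED", "LiPo", "8 GB", "IPS LCD")
)

# values_fn = list keeps duplicates together; missing combinations become NULL
largo <- tidyr::pivot_wider(longo, names_from = nomes,
                            values_from = atributos, values_fn = list)

# Collapse every list cell into a single string; NULL cells become NA
colapsar <- function(col) {
  purrr::map_chr(col, function(v) {
    if (is.null(v)) NA_character_ else paste(v, collapse = ", ")
  })
}
largo_txt <- dplyr::mutate(largo, dplyr::across(c(Tipo, RAM), colapsar))
print(largo_txt)
```

Collapsing the list columns this way is only one option; splitting repeated names (for example screen type versus battery type) into separate columns before pivoting would avoid the mixed cells altogether.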

2. Cleaning the data

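# Drop columns in which every entry is NULL
teste <- htop %>% purrr::discard(~ all(purrr::map_lgl(.x, is.null)))

# Keep devices with no NULL entry from nome_do_aparelho to estabilizacao_de_video
teste <- htop %>%
  dplyr::filter(!dplyr::if_any(nome_do_aparelho:estabilizacao_de_video, ~ purrr::map_lgl(.x, is.null)))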
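
Most scraped attributes arrive as text with units, such as "183 gramas" or "6.6 / 10", so a further cleaning step could turn them into numbers. The sketch below assumes `readr::parse_number()` is acceptable for this, since it keeps only the first number found in each string; the vectors are example values copied from the table above, and the variable names are illustrative.

```r
# Example strings taken from the scraped table
peso    <- c("183 gramas", "204 gramas", "194.6 gramas")
nota    <- c("6.6 / 10", "7.8 / 10", "7.3 / 10")
bateria <- c("4500 mAh", "6000 mAh", "4500 mAh")

# parse_number() extracts the first number and drops the surrounding text
peso_g      <- readr::parse_number(peso)     # weight in grams
nota_num    <- readr::parse_number(nota)     # score, without the "/ 10" part
bateria_mah <- readr::parse_number(bateria)  # battery capacity in mAh

print(data.frame(peso_g, nota_num, bateria_mah))
```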

3. Starting an exploratory analysis of the data

4. Setting the “weights” and creating a model to help filter some devices
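
As one possible reading of the “weights” idea, here is a minimal sketch of a weighted score. It assumes the four site scores from the table of the first five devices (hardware, tela, camera, desempenho) are the criteria. The weights are hypothetical values chosen only for illustration and are not taken from this project.

```r
# Site scores (out of 10) copied from the table of the first five devices
notas <- data.frame(
  aparelho   = c("vivo V25e", "vivo Y16", "vivo iQOO Z6x",
                 "vivo iQOO Z6 (China)", "Redmi Note 11 SE (Índia)"),
  hardware   = c(6.6, 5.3, 7.8, 7.3, 7.8),
  tela       = c(8.5, 5.3, 8.4, 8.4, 8.5),
  camera     = c(7.7, 5.7, 7.6, 8.3, 8.1),
  desempenho = c(5.0, 4.7, 5.0, 6.9, 4.8)
)

# Hypothetical weights (assumption): one per criterion, summing to 1
pesos <- c(hardware = 0.2, tela = 0.2, camera = 0.3, desempenho = 0.3)
stopifnot(isTRUE(all.equal(sum(pesos), 1)))

# Weighted score = sum over criteria of weight * score
notas$score <- as.vector(as.matrix(notas[, names(pesos)]) %*% pesos)

# Rank devices from the highest to the lowest score
ranking <- notas[order(notas$score, decreasing = TRUE), c("aparelho", "score")]
print(ranking)
```

Changing the weights changes the ranking, so in a strategic sourcing setting they would normally be agreed on before looking at the candidates.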

5. Conclusion: choosing “the best” device

This post is licensed under CC BY 4.0 by the author.