Publication:
TBDClust: time-based density clustering to enable free browsing of sites in pay-per-use mobile Internet providers

Date

2017

Publisher

Elsevier
Open access
Article
Published version

Project identifier

MINECO//TEC2015-69417-C2-2-R/ES/recolecta

Abstract

The World Wide Web has evolved rapidly, incorporating new content types and becoming more dynamic. The contents of a website can be distributed across several servers and, as a consequence, web traffic has become increasingly complex. From a network traffic perspective, it can be difficult to ascertain which websites a user is visiting, let alone which part of the user's traffic each website is responsible for. In this paper we present a method for identifying the TCP connections involved in the same full webpage download without the need for deep packet inspection. This identification is needed, for example, to enable free browsing of specific websites over pay-per-use mobile Internet access. Such free browsing could apply not only to websites promoted by third parties but also to portals for governmental or medical emergency websites. The proposal is based on a modification of the DBSCAN clustering algorithm that works online and over one-dimensional sorted data. To validate our results we use both real traffic and packet captures from a controlled environment. The proposal achieves excellent results in consistency (99%) and completeness (92%), meaning that its error margin in identifying webpage downloads is minimal.
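
As a rough illustration of what online density clustering over one-dimensional sorted data can look like, the sketch below groups TCP connection start times as they arrive. It is not the TBDClust algorithm from the paper: it is a gap-based simplification that, for sorted one-dimensional input, gives the same groups as DBSCAN with a minimum of two points per cluster. The names `OnlineTimeClusterer`, `cluster_stream`, `eps` and `min_size`, and all numeric values, are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OnlineTimeClusterer:
    """Streaming gap-based clustering of sorted timestamps (illustrative only)."""

    eps: float          # hypothetical: largest gap (seconds) allowed inside a cluster
    min_size: int = 2   # hypothetical: smaller groups are discarded as noise
    _current: List[float] = field(default_factory=list)

    def add(self, t: float) -> Optional[List[float]]:
        """Feed one timestamp; return a finished cluster if a gap just closed one."""
        if self._current and t < self._current[-1]:
            raise ValueError("timestamps must arrive in non-decreasing order")
        finished = None
        # A gap larger than eps ends the open group before t starts a new one.
        if self._current and t - self._current[-1] > self.eps:
            finished = self._close()
        self._current.append(t)
        return finished

    def flush(self) -> Optional[List[float]]:
        """Close the group that is still open at the end of the stream."""
        return self._close() if self._current else None

    def _close(self) -> Optional[List[float]]:
        group, self._current = self._current, []
        # Groups below min_size are treated as noise and not reported.
        return group if len(group) >= self.min_size else None


def cluster_stream(timestamps, eps, min_size=2):
    """Run the online clusterer over an iterable of sorted timestamps."""
    clusterer = OnlineTimeClusterer(eps=eps, min_size=min_size)
    clusters = []
    for t in timestamps:
        done = clusterer.add(t)
        if done is not None:
            clusters.append(done)
    last = clusterer.flush()
    if last is not None:
        clusters.append(last)
    return clusters


# Made-up connection start times in seconds; 5.0 is isolated and dropped as noise.
groups = cluster_stream([0.0, 0.2, 0.5, 5.0, 9.0, 9.1], eps=1.0)
```

Because the input is already sorted, each new timestamp only has to be compared with the previous one, so the clusterer keeps constant state per open group and can emit a cluster as soon as a gap larger than `eps` is observed.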

Keywords

Clustering TCP connections, Time-based density clustering, DBSCAN, Mobile web browsing, Online monitoring, Real traffic dataset

Department

Automatika eta Konputazioa / Institute of Smart Cities - ISC / Automática y Computación

Rights

© 2017 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license

Documents in Academica-e are protected by copyright, with all rights reserved, unless otherwise indicated.