(IEEE, 2013) Torres García, Luis Miguel; Magaña Lizarrondo, Eduardo; Izal Azcárate, Mikel; Morató Osés, Daniel; Automática y Computación; Automatika eta Konputazioa
The complexity of web traffic has grown in recent
years as websites evolve and new services are provided over the
HTTP protocol. When accessing a website, multiple connections
to different servers are opened, and it is usually difficult to
distinguish which servers are related to which sites. However,
this information is useful from the perspective of security and
accounting and can also help to label web traffic and use it
as ground truth for traffic classification systems. In this paper
we present a method to discover server IP addresses related to
specific websites in a traffic trace. Our method uses NetFlow-type
records, which makes it scalable and impervious to encryption of
packet payloads. It is, moreover, popularity-aware in the sense
that it takes into consideration the differences in the number of
accesses to each site in order to provide a better identification
of servers. The method can be used to gather data from a group
of interesting websites or, by applying it to a representative set
of websites, it can label a sizeable number of connections in a
packet trace.
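
To make the idea of flow-based server discovery concrete, the following is a minimal sketch and not the algorithm of the paper, whose details are not given in this abstract. It assumes hypothetical flow records of the form (client, server IP, start time), a known set of main server IPs for one site, and an arbitrary 10-second window. Each other server is scored by the fraction of its flows that start shortly after the same client contacted the site. Dividing by the server's total number of flows is only a crude stand-in for the popularity awareness mentioned above: a server that is contacted by many clients regardless of the site gets a low score.

```python
from collections import defaultdict

def score_servers(flows, site_ips, window=10.0):
    """Score candidate servers for one website from flow records.

    flows    -- iterable of (client, server_ip, start_time) tuples
                (hypothetical record layout; only header-level fields are used)
    site_ips -- set of known main server IPs of the site (assumed input)
    window   -- seconds after a site access in which a flow counts as related
                (illustrative value, not taken from the paper)

    Returns {server_ip: fraction of that server's flows that started within
    `window` seconds after the same client accessed the site}.
    """
    last_visit = {}            # client -> time of its most recent site access
    total = defaultdict(int)   # server -> number of flows seen
    near = defaultdict(int)    # server -> flows seen shortly after a site access

    # Process flows in chronological order.
    for client, server, start in sorted(flows, key=lambda f: f[2]):
        if server in site_ips:
            last_visit[client] = start
            continue
        total[server] += 1
        seen = last_visit.get(client)
        if seen is not None and start - seen <= window:
            near[server] += 1

    return {server: near[server] / total[server] for server in total}


# Toy trace: 10.0.0.1 is the known site server.
flows = [
    ("c1", "10.0.0.1", 0.0),
    ("c1", "10.0.0.2", 1.0),    # right after c1 visited the site
    ("c2", "10.0.0.1", 5.0),
    ("c2", "10.0.0.2", 6.0),    # right after c2 visited the site
    ("c2", "10.0.0.9", 7.0),    # right after c2 visited the site
    ("c3", "10.0.0.9", 50.0),   # c3 never visited the site
    ("c1", "10.0.0.9", 100.0),  # long after c1's visit
]
scores = score_servers(flows, {"10.0.0.1"})
```

In this toy trace, 10.0.0.2 is only ever contacted right after a site access and scores 1.0, while 10.0.0.9 is also contacted in unrelated contexts and scores 1/3. Because only addresses and timestamps are used, the sketch works the same way on encrypted traffic.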