About the Active DNS dataset

The Data

The dataset comes in Avro format, which can be processed with tools such as avro-tools or, in Python, fastavro. We are working on a JSON version of the data for applications that do not rely on Java or Hadoop. Please contact us for more information.

This is an example of a record from the dataset.

{
    "date": "20161001",
    "qname": "gatech.edu.",
    "qtype": 1,
    "rdata": "130.207.160.173",
    "ttl": 300,
    "authority_ips": "128.61.244.253,168.24.2.35",
    "count": 80,
    "hours": 16710647,
    "source": "gt",
    "sensor": "active-dns"
}

You can download a small sample of the dataset: active_dns_sample_20161001.json.gz (418KB).
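
The sample can be read with the Python standard library alone. The following is a minimal sketch, not an official loader: it assumes the file holds one JSON object per line, which this page does not confirm, and read_records is a hypothetical helper name chosen for illustration.

```python
import gzip
import json


def read_records(path):
    """Yield one dict per non-empty line of a gzip-compressed JSON file."""
    # Assumption: the sample stores one JSON object per line (JSON Lines).
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

If the sample turns out to be a single JSON array instead, json.load on the opened file would be the simpler choice.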


Interpreting the Data Fields

The field names in the dataset are mostly self-explanatory and in line with RFC 1035. For reference, the list below defines each field.
  • date: The date of the current Resource Record (RR).
  • qname: The query name that our recursive resolvers answered; effectively the domain name.
  • qtype: The question type number. A comprehensive list of qtypes can be found on Wikipedia.
  • rdata: The data returned by the Authoritative Nameserver(s).
  • ttl: The Time To Live for the particular Resource Record (RR).
  • authority_ips: The IP addresses of the Authoritative Nameservers (ANS) that replied with this particular Resource Record (RR).
  • count: The number of times this Resource Record (RR) was encountered for the specific date.
  • hours: A 24-bit integer that encodes the hours of the day in which this Resource Record (RR) was encountered. See "The Hours Field" below.
  • source: The source for the particular Resource Record (RR).
  • sensor: The sensor that recorded this Resource Record (RR).
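
The field definitions above can be turned into a small normalization step. This is a sketch under stated assumptions rather than part of the dataset tooling: normalize_record and QTYPE_NAMES are hypothetical names, the type table covers only a few common record types, and the date and authority_ips formats are inferred from the single example record shown earlier.

```python
from datetime import datetime

# A few common record types from the DNS specifications; not an exhaustive list.
QTYPE_NAMES = {1: "A", 2: "NS", 5: "CNAME", 6: "SOA", 15: "MX", 16: "TXT", 28: "AAAA"}


def normalize_record(record):
    """Return a copy of a record with typed, easier-to-use field values."""
    normalized = dict(record)
    # Assumption: "date" is always formatted as YYYYMMDD, as in the example record.
    normalized["date"] = datetime.strptime(record["date"], "%Y%m%d").date()
    # Fall back to a generic "TYPE<number>" label for types not listed above.
    normalized["qtype_name"] = QTYPE_NAMES.get(record["qtype"], "TYPE%d" % record["qtype"])
    # Assumption: "authority_ips" is a comma-separated string, as in the example record.
    normalized["authority_ips"] = [ip for ip in record["authority_ips"].split(",") if ip]
    return normalized
```

Applied to the example record, this yields the date 2016-10-01, the type name "A", and a list of two authoritative nameserver addresses; the input record itself is left unchanged.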

The Hours Field

The hours field encodes, in a 24-bit integer, the hours of the day during which a particular domain name was queried. To decode it, you can use the following Python function, which returns the list of hours (0 to 23) that are set.

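def parse_hours(hours):
    # Reverse the binary string (least significant bit first) and drop the "0b" prefix.
    hours = str(bin(int(hours)))[::-1][:-2]
    # Left-pad with zeros to 24 characters.
    hours = '0' * (24 - len(hours)) + hours
    # Report hour i when the character at index i - 1 is "1" (index -1 is the last character).
    return [i for i in range(0, 24) if int(hours[i - 1])]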

For the hours value in the example record above, the output is:

In [2]: parse_hours(16710647)
Out[2]: [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23]

Contact Us

To request access to data, please contact access@activednsproject.org. For more information about the project, please contact:

E: info@activednsproject.org