At Verisign Labs, research is not just for the sake of exploration, but to develop technologies that will play a significant role in the evolution of the Internet. Our research spans a wide range of technical disciplines and our researchers collaborate closely with engineering, platform developers, data architects and operations experts. Verisign Labs initiatives are deeply embedded in Verisign's business areas. A selection of our ongoing research projects is listed here, along with the publications of individual Verisign Labs researchers.
URLs often use query strings (i.e., key-value pairs appended to the URL path) to pass session parameters and form data. While these arguments are often benign and opaque, they sometimes contain tracking mechanisms, user demographics, and other privacy-sensitive information. In isolation, such URLs are not particularly problematic, but our Web 2.0 information sharing culture means these URLs are increasingly being broadcast in public forums.
Our research has examined nearly 900 million user-submitted URLs to gauge the prevalence and severity of such privacy leaks. We found troves of sensitive data, including 1.7 million email addresses, over 10 million fields of personal information, and several cases where usernames and passwords were passed in unencrypted plain text. With this as motivation, we propose the development of a privacy-aware URL sanitization service. The goal of this service would be to transform input addresses by stripping non-essential key-value pairs and/or notifying users when sensitive data is critical to proper page rendering.
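The stripping step of such a sanitization service could look roughly like the following Python sketch. It is only an illustration: the `SENSITIVE_KEYS` set and the `sanitize_url` function are hypothetical names introduced here, and a real service would need a far richer policy for deciding which keys are non-essential.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical deny-list of query keys; not taken from the research itself.
SENSITIVE_KEYS = {"email", "user", "username", "password", "passwd",
                  "sessionid", "utm_source", "utm_medium", "utm_campaign"}

def sanitize_url(url, sensitive_keys=SENSITIVE_KEYS):
    """Return (clean_url, removed_keys) with flagged query keys stripped."""
    parts = urlsplit(url)
    kept, removed = [], []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in sensitive_keys:
            removed.append(key)          # report what was dropped
        else:
            kept.append((key, value))    # preserve order of remaining pairs
    clean = urlunsplit(parts._replace(query=urlencode(kept)))
    return clean, removed
```

Returning the removed keys alongside the cleaned URL leaves room for the notification case, where a caller may want to warn the user instead of silently dropping a pair the page depends on.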
In recent years, researchers have relied heavily on labels provided by antimalware companies to identify malware samples and families and to train and test their detection, classification and clustering algorithms. Furthermore, companies use those labels to guide their mitigation and disinfection efforts. Ironically, however, there is no prior systematic work validating the reliability of those labels (or even detections) or how they affect these applications. Equipped with many malware samples from several families that have been manually inspected and labeled, in this project we pose the following questions. How do different antivirus (AV) scans perform on them relative to the manual inspections? How correct are the labels given by those scans? How consistent are AV scans with one another? This research has yielded many negative (and perhaps scary) answers to these questions. We invite the community to challenge assumptions about relying on AV scans and labels as ground truth for malware analysis and classification.
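Two of these questions, correctness against manual labels and consistency between scanners, can be expressed as simple ratios. The Python sketch below is one possible formulation and not the project's actual methodology. The function names are hypothetical, labels are compared by exact string match, and a sample missing from a scanner's output is treated as undetected.

```python
from itertools import combinations

def correctness(scanner_labels, manual_labels):
    """Fraction of manually labeled samples that the scanner labels identically."""
    hits = sum(1 for sample, family in manual_labels.items()
               if scanner_labels.get(sample) == family)
    return hits / len(manual_labels)

def consistency(all_scans, samples):
    """Fraction of (scanner pair, sample) combinations with matching labels."""
    pairs = list(combinations(all_scans.values(), 2))
    agree = sum(1 for a, b in pairs for s in samples if a.get(s) == b.get(s))
    return agree / (len(pairs) * len(samples))
```

Exact matching is a deliberate simplification, since vendors often use different names for the same family; a fuller analysis would first normalize aliases.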
Verisign is collaborating with Georgia Tech to study how sudden changes in the domain name lookup patterns for various DNS mail exchange records may help identify spammers early in the lifecycle of a spam campaign.
During the course of execution, malicious software tends to communicate with multiple HTTP endpoints. This sequence of communications, the aggregate set of endpoints and even individual addresses can fingerprint malware. Such signature-based detection has proven effective, but its scope is limited by the need for manual verification. If all such flows were blacklisted or flagged without inspection, benign or mixed-use addresses would generate false alarms or disrupt legitimate traffic. To reduce human labor requirements and minimize reporting latency, we propose the autonomous classification of endpoints with metadata analysis. Beginning with a set of ~100k sandboxed malware executions, ~220k outbound HTTP domains/URLs were identified. Initial feature extraction has focused on endpoints in isolation, identifying shallow metadata and Whois properties (e.g., time of domain registration, TLD, etc.). Subsequent work intends to leverage short-term sequential and long-term historical patterns. Preliminary models trained on manually labeled instances suggest promising outcomes. Not only are binary assessments accurate, but it may be feasible to predict threat severity (e.g., adware vs. corporate espionage) from this concise data.
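As an illustration of what per-endpoint "shallow metadata" might look like, here is a minimal Python sketch. The feature set and the `endpoint_features` name are assumptions made for this example, not the project's actual features. The registration date is passed in directly, on the assumption that it was already obtained from a Whois lookup.

```python
from datetime import date
from urllib.parse import urlsplit

def endpoint_features(url, registered_on, observed_on):
    """Derive a few shallow features for one HTTP endpoint (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    return {
        "tld": labels[-1] if len(labels) > 1 else "",
        "num_labels": len(labels),                      # e.g. 3 for a.b.com
        "host_length": len(host),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "has_query": bool(parts.query),
        # Age of the domain when the malware contacted it, in days.
        "domain_age_days": (observed_on - registered_on).days,
    }
```

A dictionary of named features like this can be fed to most off-the-shelf classifiers after vectorization.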
Verisign Labs is collaborating with St. Andrews University by supporting an open source implementation of the DNS service that supports ILNP and by exploring the implications of a future routing system in which the DNS is involved not only in going from name to IP address, but in going from an identifying IP address to a routable (locating) IP address. ILNP has recently been published as Experimental RFCs 6740-6748, from the Internet Research Task Force (IRTF).
Verisign Labs is collaborating with UCLA to develop and explore the potential roles of DNS, as the largest operational collision-free Internet namespace, in the NDN future Internet architecture. In the NDN architecture, as in some other information-centric networking (ICN) designs, there is no Internet protocol (IP) and there are no IP addresses. Instead, all traffic is routed and transported using requests from data consumers called Interests and packets of content called Data. Instead of Internet addresses, there are only names of content. Cryptographic signatures on the content enable arbitrary nodes to provide a copy of the content if a matching Interest packet arrives. In light of this architecture, UCLA and Verisign Labs have designed and prototyped NDNS, a version of DNS using the existing namespace but following NDN design. We are deriving insights about the evolution of the Internet's DNS from these explorations and also helping to flesh out an innovative, successful future design that challenges present assumptions and offers novel capabilities and security properties. Due to the ubiquitous cryptographic signatures, trust management is vital to NDN. In our collaboration, we are exploring the new requirements that NDN places on trust management. Besides seeking an effective new trust management architecture for NDN, this research allows us to seek a new viewpoint on the roles of DNS in trust management, current and potential.
As a new networking paradigm, information-centric networking (ICN) has great potential to revolutionize the way content is delivered to users by bringing it closer to them in caches deployed in the network. In ICN architectures such as named data networking (NDN), content is fetched by name from caches deployed in the network or from origin servers, which serve the content if it is not cached in the network. In such an ICN architecture, once a content data packet is fetched from an origin server, it is replicated and cached in all routers along the routing and forwarding path, from the router that connects the user who issued the interest to the one that connects to the origin server, thus allowing further interests to be fulfilled quickly. However, the way ICN caching works poses a great privacy risk: the time difference between responses to an interest for cached and uncached content can be used to infer whether or not a nearby user has previously requested the same content as that requested by an adversary. Furthermore, requesting content by name enables profiling of users by their access patterns. To this end, this project proceeds in two directions. In the first direction, we study the extent to which the timing attack is applicable in ICN architectures and provide several solutions that balance cost and benefit and raise the bar for an adversary attempting such attacks. In the second direction, we explore the use of assisting tools and network components to enable privacy by providing a weaker binding between content and request names than that provided by indicative, permanent names.
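The timing side channel can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not a measurement of any real NDN deployment: the `CachingRouter` class, the latency values and the decision threshold are all hypothetical.

```python
class CachingRouter:
    """Toy first-hop router that caches every content name it has fetched."""

    def __init__(self, hit_ms=5.0, miss_ms=80.0):
        self.cache = set()
        self.hit_ms = hit_ms      # assumed latency when served from cache
        self.miss_ms = miss_ms    # assumed latency when fetched from origin

    def fetch(self, name):
        """Return the simulated response time for an interest in `name`."""
        if name in self.cache:
            return self.hit_ms
        self.cache.add(name)      # content is cached on its way back
        return self.miss_ms

def probe_suggests_prior_request(router, name, threshold_ms=40.0):
    """Adversary's inference: a fast response implies someone nearby asked first."""
    return router.fetch(name) < threshold_ms
```

Note that the probe itself populates the cache, so in this model an adversary learns about a given name only once per cache lifetime.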
We are working with researchers at the University of Michigan to gain insight into the Internet’s ongoing transition from IPv4 to IPv6. IANA allocated the last /8s in 2011. The first RIRs exhausted their IPv4 space shortly thereafter. As a result, we have hypothesized that the scarcity of IPv4 addresses resulting from this so-called "IPv4 exhaustion" will have profound effects on several of the desirable properties of the Internet. These impacted properties include, but are not limited to: support for heterogeneity and openness, security, scalability, reliability, availability, concurrency and transparency. In an effort to understand the impact of scarcity on these desirable properties, we plan to study the techniques and methodologies by which addresses are allocated and how these resources are subsequently used. While no fully formed scarcity models for IPv4 addresses exist, we conjecture that several interesting phenomena warrant study: rate of transition to IPv6, increased use of NAT-ing, finer-grained routing, deallocation and block reclamation, and market-based address allocation. For the sake of tractability, this proposal focuses on measuring the transition from IPv4 to the IPv6 space. We are specifically concerned with questions that shed light on adoption rates and eventual usage patterns in IPv6. While interesting from a modeling and characterization perspective, we also believe this work has significant impact on operations, assisting in uncovering inconsistencies as we transition and supporting capacity planning and optimization.
Current commodity hardware designs feature numerous cores running at a decreased frequency. The existing ecosystems of tools to unlock the performance potential of the hardware have created gaps. Software environments which address the gaps and unlock the breadth and depth of the hardware need to be created. This project investigates the AMP design alternative as a means to provide the highest possible performance from the hardware.
DNS-based Authentication of Named Entities (DANE) is a new approach being standardized in the Internet Engineering Task Force (IETF) in which certification credentials are verified by DNSSEC-enabled zones, rather than through the certification authority (CA) model used today. Our DANE work is aimed at understanding the extent of the “attack surface” for certificates when presented via a Web browser, versus being published in DNS using the new DANE protocol.
This study examines the behavior of current DNS resolver implementations including various versions of BIND, Unbound, PowerDNS, djbdns and Microsoft Windows 2008. In particular, we studied how recursive name servers choose among multiple authoritative servers for a given zone and their retransmission algorithms when under duress (i.e., packet loss and delay). We also simulated different networking conditions to see how different latencies affect the resolver's server selection algorithm, and imposed simulated packet loss to understand the resolver's retransmit and back-off algorithms. These results may help make decisions about the right mix of anycast and unicast name servers.
The DNSSEC Debugger is a Web-based tool for ensuring that the "chain of trust" is intact for a particular DNSSEC-enabled domain name. The tool shows a step-by-step validation of a given domain name and highlights any problems found. To use the tool, begin by visiting http://dnssec-debugger.verisignlabs.com and entering a domain name to be tested. The tool begins with a query to a root nameserver. It then follows the referrals to the authoritative nameserver, validating DNSSEC keys and signatures as it goes. Each step in the process is given a good (green), warning (yellow) or error (red) status code. You can move your mouse over the warning and error icons to view a longer explanation. Press the plus (+) and minus (-) keys to increase or decrease the debugging level. At the highest debugging level you can see the full, raw DNS messages for almost all of the queries. Here's some sample output from the tool for the whitehouse.gov domain:
The present-day DNS API and stub resolvers emerged from Unix some decades ago; therefore much of the power of the modern DNS is difficult or impossible to access in present-day platforms. Beginning in early 2012, a community process took place to develop a modern, application-friendly, extensible DNS API, in response to this situation. The result is the getdns API. The first requirement of application developers, who were the primary designers, was native support for asynchronous events. Additional consideration was given to easy API extensibility for new Resource Records, first-class support for the uses of SRV, EDNS, OPT and support for DNSSEC validation by applications – support that does not require the application developers and users to be DNS experts. The resulting API design is fully modernized and extensible and it should prove able to unleash both present and emerging capabilities of the DNS. In this research project, Verisign Labs along with several development engineers is collaborating with NLNET Labs (Amsterdam) to develop and promote an open source reference implementation of the API and the "stub resolver," the DNS support for DNS clients on a wide range of platforms.
A significant portion of registered domains are configured primarily to redirect their traffic to other domains. Despite the widespread use of domain redirection on the Internet, however, there have been few studies of it. Moreover, the few previously published works on redirection have been very narrow in focus and studied very small, biased datasets. In this project, we perform the first large-scale comprehensive study of the use of domain redirection. First, we identify the main techniques for performing domain redirection. Using both DNS queries and Web crawling, we identify how prevalent the use of each technique is on the Internet. We also analyze a large sample of the detected domain redirects to uncover why domain redirects are used and look, in depth, at specific instances of each use. Finally, we look at the most prevalent forms of redirection and create the first Internet-scale snapshot of the domain redirect network.
Whois, a widely used service for querying databases that store information about domain names and IP addresses, is intended to enable the security and stability of the Internet by providing contact points to network operators and administrators. However, little is known about the real usage of Whois in today’s Internet. In this work, we provide the first longitudinal analysis of Whois queries for the .com and .net generic top-level domains. Our study characterizes billions of queries collected over four months, showing trends, volumes and evolution over time. Our measurements show several striking findings with many operational implications of great interest to the technical community. First, we find that more than two-thirds of the queries are not only for unregistered but also algorithmically generated domain names, coming from only six IP addresses allocated to entities located in mainland China. By further investigation of the IP addresses, we find that they are virtual addresses of shared hosting services, with many compromised applications behind them generating a large number of queries over time. Using the characteristics of those queries and IP addresses, we extract features and use them to characterize “benign” traffic in the Whois service (issued from unknown IP addresses, e.g., not associated with known registrars) and find that only a small fraction of the Whois traffic we studied is serving a meaningful purpose. We conclude by providing several recommendations for creating reputation scores for Whois query issuers in order to better utilize the infrastructure and minimize misuse.
Artem Dinaburg introduced the concept of "bitsquatting" (a neologism based on "typosquatting") in which domain names become changed due to errors in memory, storage, or data transmission.
In this paper, we examine DNS queries received at authoritative servers run by Verisign to look for evidence of bit-level errors.
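To make the idea of a bit-level error concrete, the following Python sketch enumerates the names reachable from a domain by flipping a single bit in one character. It is an illustration, not the method used in the paper. The function names are hypothetical, and the sketch keeps dots fixed and does not check that hyphens fall in legal positions.

```python
import string

# Characters allowed in a hostname label (letters are compared lowercase).
VALID = set(string.ascii_lowercase + string.digits + "-")

def bitsquat_variants(domain):
    """Return all names that differ from `domain` by one flipped bit."""
    name = domain.lower()
    variants = set()
    for i, ch in enumerate(name):
        if ch == ".":
            continue  # keep label boundaries fixed in this sketch
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            # Skip flips that only change letter case or leave the
            # hostname alphabet, since DNS names are case-insensitive.
            if flipped in VALID and flipped != ch:
                variants.add(name[:i] + flipped + name[i + 1:])
    return sorted(variants)

def is_single_bit_error(observed, intended):
    """True if `observed` could be `intended` with exactly one bit flipped."""
    return observed.lower() in bitsquat_variants(intended)
```

For example, flipping the lowest bit of the first `e` in `example.com` yields `dxample.com`, which is one of the candidates a query log could be searched for.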
Verisign is sponsoring research at the University of North Carolina on automated data analysis and automatic generation that uses distance metrics to represent the similarity between two values or objects.
In collaboration with researchers at Purdue University, we are investigating extensions to the current state-of-the-art intrusion detection systems by utilizing publicly available social behavior information of hackers.
The researchers will apply behavioral economic techniques to machine learning. Through the understanding of relevance, uniqueness and similarity in context in decisions about domain names, the research will help us build a cognitive map and quantitative representations of users' preferences and enhance our ability to analyze factors that influence consumer online purchasing behavior.
In collaboration with researchers at UCLA, we aim to understand the resiliency of DNS service as a whole by measuring the inter-dependency of different zones.
Such inter-dependency can be introduced by large numbers of authoritative DNS servers being placed at the same location (e.g. either in the same geographic area or in the same ISP network), or more commonly by the increased trend of DNS server outsourcing which has led to the concentration of DNS services of a large number of zones on a few DNS service providers. Consequently, a single failure can potentially bring down the DNS servers for a large number of domains.
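One simple way to quantify this kind of concentration is to ask, for each provider, which zones would lose all of their name servers if that provider failed. The Python sketch below does this for a toy mapping. The function name and both input dictionaries are hypothetical, and a real study would also need to account for shared locations and networks.

```python
from collections import defaultdict

def zones_lost_per_provider(zone_servers, server_provider):
    """Map each provider to the zones served exclusively by that provider."""
    impact = defaultdict(list)
    for zone, servers in zone_servers.items():
        providers = {server_provider[s] for s in servers}
        if len(providers) == 1:
            # Every name server of this zone depends on a single provider.
            impact[next(iter(providers))].append(zone)
    return dict(impact)
```

Zones with name servers spread across at least two providers do not appear in the result, since no single provider failure removes all of their servers.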
Is it possible to develop blacklisting techniques for domain names used for malicious activities based on DNS query patterns?
We examine domain names that are known to be used for phishing attacks, spam and malware-related activities to determine if they can be identified based on DNS query patterns. To date, we have found that malicious domain names tend to exhibit more variance in the networks that look up the domains and that these domains become popular faster after their initial registration time. We also noted that miscreant domains exhibit distinct clusters relating to the networks that look up these domains. The distinct spatial and temporal characteristics of these domains and their tendency to exhibit similar lookup behavior suggest that it may be possible to develop more effective and timely blacklisting techniques based on these differing lookup patterns.
A tool for visualizing the traffic patterns between DNS clients and servers, including sample data from the Root name servers.
The DNS client/server affinity visualization tool sheds new light on the complexities of DNS traffic. Within this OpenGL-based application, DNS clients are represented as dots of varying size and color. Servers are placed in three-dimensional space. Each time a client sends a DNS query to a particular server, it moves a little bit closer to that server. The size and color of a client is determined by its query rate.
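The movement rule can be sketched in a few lines of Python. This is not the tool's actual source. The step size (`fraction`), the coordinates and the function names are assumptions chosen for illustration, and the size and color handling is omitted.

```python
def step_toward(client, server, fraction=0.1):
    """Move a client position a fixed fraction of the way toward a server."""
    return tuple(c + fraction * (s - c) for c, s in zip(client, server))

def replay(queries, servers, clients, fraction=0.1):
    """Apply one movement step per (client_id, server_id) query, in order."""
    positions = dict(clients)
    for client_id, server_id in queries:
        positions[client_id] = step_toward(
            positions[client_id], servers[server_id], fraction)
    return positions

def distance(a, b):
    """Euclidean distance between two points in 3-D space."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Under this rule, a client that always queries the same server converges on it, while a client that alternates between two servers hovers between them.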
The visualization is useful for understanding how clients behave when choosing among multiple authoritative name servers, such as the 13 root name servers. Many clients do not exhibit strong affinity and will not wander close to any particular server. Some clients, on the other hand, are clearly seen favoring a particular server.
The tool is equally useful for visualizing the behavior of BGP routing within an anycast cluster. The sample data for A-root on Feb 9, 2010 shows how clients migrate from one anycast node to another as routes are withdrawn and replaced over time.
The source code for the visualization tool is located on the Verisign Labs Subversion server. This can be accessed via a Web browser or a Subversion client.
We are sponsoring research at Georgia Tech to identify new and advanced techniques to acquire and analyze actionable intelligence about malware.
This research targets the challenges that malware obfuscation tools and malware’s dependence on network access present to collecting useful information about malware. The researchers at Georgia Tech Information Security Center (GTISC) have developed a horizontally scalable, automated malware analysis system that leverages isolation, hardware virtualization and network analysis to better extract information about malware.
Verisign is supporting research at Georgia Tech to develop a large-scale Internet monitoring system.
It will provide a more sophisticated understanding of the role of the Internet’s infrastructure in facilitating botnet attacks such as spam, scam hosting and denial-of-service attacks. Bots have exploited various Internet protocols such as the Border Gateway Protocol (BGP) and the Domain Name System (DNS) to move from one portion of the Internet to another. This monitoring infrastructure will identify key components of this underlying infrastructure, specifically autonomous systems that facilitate BGP agility and name servers and registrars that facilitate DNS agility. As a result, this system may provide cutting-edge intelligence for reputation systems for both DNS hosting infrastructure and autonomous systems.
Verisign is supporting research at Purdue University to identify online human behavior trends related to online identity management within social groups across social networking sites.
These efforts are focused on identifying current online behavior trends that work around the limitations of existing technologies to predict future trends for social networking technologies.
The 45nm to 32nm Die Shrink of the Intel Xeon product line “Westmere” introduces AES-NI SIMD class instructions, which can be used to greatly accelerate the performance of cryptographic operations.
The AES-NI "combinatorial logic" replaces the software-based table lookup of the FIPS 197 AES symmetric encryption standard. This project builds upon the instructions AESENC, AESENCLAST, AESDEC, AESDECLAST, CLMUL, AESIMC and AESKEYGENASSIST to perform 10 (128-bit), 12 (192-bit) and 14 (256-bit) rounds. The project continues to verify the durability of side channel attack protection and the ability to use the building blocks to accelerate Elliptic, ECHO, SHAVITE-3, etc. Additional design points include, but are not limited to, using pipelined combinatorial logic operations for other applications, full-disk encryption, and interoperability with other projects such as OpenSSL. If AES-NI introduces durable cryptographic performance within the network stall cycle of the computer, how can and should this change the consumer Internet experience? Can these instructions replace expensive cryptographic co-processor cards? This research will be repeated upon the introduction of the Intel “Sandy Bridge” AVX CPU to evaluate the new hardware features implemented.
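The relationship between key size and round count follows directly from FIPS 197, and the way the encryption instructions map onto those rounds can be written out as a small Python sketch. The function names are hypothetical, and the code only lists the instruction sequence for one block; it does not perform any encryption.

```python
def aes_rounds(key_bits):
    """Number of AES rounds for a given key size, per FIPS 197."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key size must be 128, 192 or 256 bits")
    return key_bits // 32 + 6

def encrypt_instruction_sequence(key_bits):
    """Instruction sequence for encrypting one block with AES-NI."""
    rounds = aes_rounds(key_bits)
    # Initial round-key XOR, then all but the last round use AESENC,
    # and the final round (which omits MixColumns) uses AESENCLAST.
    return ["XOR round key 0"] + ["AESENC"] * (rounds - 1) + ["AESENCLAST"]
```

Decryption follows the same pattern with AESDEC and AESDECLAST.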
Significant advances in Graphics Processor Unit (GPU) technology may be leveraged by Verisign to enhance our services.
Although the newly introduced devices share the same name as their legacy counterparts, the number of threads and interconnected hardware structures has vastly improved along with the introduction of integer capability. What are the integer and floating point characteristics of the new units? Can they be introduced into highly available architectures? Can a client/server-like computing model be successfully re-implemented using a GPU on a server? What are the characteristics of programming in OpenCL versus CUDA?