Class URL represents a Uniform Resource Locator, a pointer to a "resource" on the World Wide Web. A resource can be something as simple as a file or a directory, or it can be a reference to a more complicated object, such as a query to a database or to a search engine. More information on the types of URLs and their formats can be found at:

Types of URL

In general, a URL can be broken into several parts. Consider the following example:

  • http://www.example.com/docs/resource1.html
    

The URL above indicates that the protocol to use is http (HyperText Transfer Protocol) and that the information resides on a host machine named www.example.com. The information on that host machine is named /docs/resource1.html. The exact meaning of this name on the host machine is both protocol dependent and host dependent. The information normally resides in a file, but it could be generated on the fly. This component of the URL is called the path component.

A URL can optionally specify a "port", which is the port number to which the TCP connection is made on the remote host machine. If the port is not specified, the default port for the protocol is used instead. For example, the default port for http is 80. An alternative port could be specified as:

  • http://www.example.com:1080/docs/resource1.html
    

The syntax of URL is defined by RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, amended by RFC 2732: Format for Literal IPv6 Addresses in URLs. The Literal IPv6 address format also supports scope_ids. The syntax and usage of scope_ids is described here.

A URL may have appended to it a "fragment", also known as a "ref" or a "reference". The fragment is indicated by the sharp sign character "#" followed by more characters. For example,

  • http://java.sun.com/index.html#chapter1
    

This fragment is not technically part of the URL. Rather, it indicates that after the specified resource is retrieved, the application is specifically interested in that part of the document that has the tag chapter1 attached to it. The meaning of a tag is resource specific.

An application can also specify a "relative URL", which contains only enough information to reach the resource relative to another URL. Relative URLs are frequently used within HTML pages. For example, if the contents of the URL:

  • http://java.sun.com/index.html
    

contained within it the relative URL:

  • FAQ.html
    

it would be a shorthand for:

  • http://java.sun.com/FAQ.html
    

The relative URL need not specify all the components of a URL. If the protocol, host name, or port number is missing, the value is inherited from the fully specified URL. The file component must be specified. The optional fragment is not inherited.

The URL class does not itself encode or decode any URL components according to the escaping mechanism defined in RFC2396. It is the responsibility of the caller to encode any fields, which need to be escaped prior to calling URL, and also to decode any escaped fields, that are returned from URL. Furthermore, because URL has no knowledge of URL escaping, it does not recognise equivalence between the encoded or decoded form of the same URL. For example, the two URLs: http://foo.com/hello world/ and http://foo.com/hello%20world would be considered not equal to each other.

Note, the java.net.URI class does perform escaping of its component fields in certain circumstances. The recommended way to manage the encoding and decoding of URLs is to use java.net.URI, and to convert between these two classes using #toURI() and URI#toURL().

The URLEncoder and URLDecoder classes can also be used, but only for HTML form encoding, which is not the same as the encoding scheme defined in RFC2396.

Author

James Gosling

Since

1.0

Constructors

Methods

  • Gets the contents of this URL. This method is a shorthand for: * openConnection().getContent()

    Returns any

    the contents of this URL.

    Exception

    IOException if an I/O exception occurs.

    See

    java.net.URLConnection#getContent()

  • Gets the default port number of the protocol associated with this URL. If the URL scheme or the URLStreamHandler for the URL do not define a default port number, then -1 is returned.

    Returns number

    the port number

    Since

    1.4

  • Gets the file name of this URL. The returned file portion will be the same as getPath(), plus the concatenation of the value of getQuery(), if any. If there is no query portion, this method and getPath() will return identical results.

    Returns string

    the file name of this URL, or an empty string if one does not exist

  • Gets the host name of this URL, if applicable. The format of the host conforms to RFC 2732, i.e. for a literal IPv6 address, this method will return the IPv6 address enclosed in square brackets ('[' and ']').

    Returns string

    the host name of this URL.

  • Gets the anchor (also known as the "reference") of this URL.

    Returns string

    the anchor (also known as the "reference") of this URL, or null if one does not exist