URI fragment: Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 07:42, 13 June 2011

In computer hypertext, a fragment identifier is a short string of characters that refers to a resource that is subordinate to another, primary resource. The primary resource is identified by a Uniform Resource Identifier (URI), and the fragment identifier points to the subordinate resource.

The fragment identifier, introduced by a hash mark (#), is the optional last part of a URL for a document and is typically meant to identify a portion of that document. The generic syntax is specified in RFC 3986. The hash mark separator is not considered part of the fragment identifier.

Basics

In a URI, a hash mark (#) introduces the optional fragment, which is always the last component. The generic RFC 3986 syntax for URIs also allows an optional query part introduced by a question mark (?). In a URI with both a query and a fragment, the fragment follows the query. Query parts depend on the URI scheme (for example, http: supports queries, unlike ftp:) and are evaluated by the server. Fragments depend on the document MIME type and are evaluated by the client (Web browser). Clients are not supposed to send URI fragments to servers when they retrieve a document, and without help from a local application (see below) fragments do not participate in redirections.
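The component order described above can be checked with Python's standard urllib.parse module. This is a minimal sketch; the URLs are made-up examples, and only the relative position of the query and the fragment matters.

```python
from urllib.parse import urlsplit

# Hypothetical URL with both a query and a fragment.
parts = urlsplit("http://www.example.org/search?q=uri#results")
print(parts.path)      # /search
print(parts.query)     # q=uri
print(parts.fragment)  # results

# A question mark that appears after the hash mark belongs to the
# fragment, because the fragment is always the last component.
tricky = urlsplit("http://www.example.org/page#frag?x=1")
print(tricky.query)     # (empty string)
print(tricky.fragment)  # frag?x=1
```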

A URI ending with # is permitted by the generic syntax and can be regarded as having an empty fragment. In MIME document types such as text/html or any XML type, empty identifiers that would match this syntactically legal construct are not permitted. Web browsers typically display the top of the document for an empty fragment.

The fragment identifier functions differently from the rest of the URI: its processing is exclusively client-side, with no participation from the server. (The server typically helps to determine the MIME type, and the MIME type in turn determines how fragments are processed.) When an agent (such as a Web browser) requests a resource from a Web server, the agent sends the URI to the server, but does not send the fragment. Instead, the agent waits for the server to send the resource, and then the agent processes the resource according to the document type and fragment value.[1]
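The client-side split can be sketched as follows. The helper name split_for_request is hypothetical, and the code only approximates what an HTTP client does: it separates the part that would be sent to the server from the fragment that the client keeps for itself.

```python
from urllib.parse import urldefrag, urlsplit


def split_for_request(url):
    """Return (request_target, fragment) for a URL.

    request_target is the path plus query that would go to the server;
    the fragment stays with the client. Illustrative helper only.
    """
    without_fragment, fragment = urldefrag(url)
    parts = urlsplit(without_fragment)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target, fragment


target, fragment = split_for_request("http://www.example.org/foo.html#bar")
print(target)    # /foo.html
print(fragment)  # bar
```

Once the response arrives, the client would interpret the retained fragment according to the MIME type of the document it received.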

Examples

  • In URIs for MIME text/html pages, such as http://www.example.org/foo.html#bar, the fragment refers to the element with id="bar".
    • Graphical Web browsers typically position pages so that the top of the element identified by the fragment id is aligned with the top of the viewport; thus fragment identifiers are often used in tables of contents and in permalinks.
    • The appearance of the identified element can be changed through the :target CSS pseudo-class; Wikipedia uses this to highlight the selected reference. Notably, CSS display: block can be used to show content only while it is the target, with display: none hiding it otherwise.
    • The deprecated name attribute (allowed only for some elements) had a similar purpose in now obsolete browsers. If both are present, name and id must be identical.
  • In all XML document types, including XHTML, fragments corresponding to an xml:id or similar id attribute follow the Name syntax and begin with a letter, underscore, or colon. Notably, they cannot begin with a digit or hyphen.[2]
    • xml:id is one of the few generic XML attributes (xml:lang is another) that can be used without explicitly declaring a namespace.[3] In XHTML, id has to be used, because XHTML was specified before xml:id existed.
  • In XML applications, fragment identifiers in a certain syntax can be XPointers; for example, the fragment identifier in the URI http://www.example.org/foo.xml#xpointer(//Rube) refers to all XML elements named "Rube" in the document identified by the URI http://www.example.org/foo.xml. An XPointer processor, given that URI, would obtain a representation of the document (such as by requesting it from the Internet) and would return a representation of the document's "Rube" elements.
  • In RDF vocabularies, such as RDFS, OWL, or SKOS, fragment identifiers are used to identify resources in the same XML Namespace, but do not necessarily correspond to a specific part of a document. For example, http://www.w3.org/2004/02/skos/core#broader identifies the concept "broader" in the SKOS Core vocabulary, but it does not refer to a specific part of the resource identified by http://www.w3.org/2004/02/skos/core, a complete RDF file in which the semantics of this concept are declared, along with other concepts in the same vocabulary.
  • In URIs for MIME text/plain documents, RFC 5147 specifies a fragment identifier for the character and line positions and ranges within the document using the keywords "char" and "line". Some popular browsers do not yet support RFC 5147.[4] The following example identifies lines 11 through 20 of a text document:
    • http://example.com/document.txt#line=10,20
  • In JavaScript, the fragment identifier of the current HTML or XHTML page can be accessed through the "hash" property, location.hash (JavaScript can also be used with other document types). With the rise of AJAX, some websites use fragment identifiers to emulate the back button behavior of browsers for page changes that do not require a reload, or to emulate subpages.
    • For example, Gmail uses a single URL for almost every interface – mail boxes, individual mails, search results, settings – the fragment is used to make these interfaces directly linkable.[5]
    • Adobe Flash websites can use the fragment part to inform the user about the state of the website or web application, and to facilitate deep linking, commonly with the help of the SWFAddress JavaScript library.
    • Other websites use the fragment part to pass some extra information to scripts running on them – for example, Google Video understands permalinks in the format of #01h25m30s to start playing at the specified position,[6] and YouTube uses similar code such as #t=3m25s.[7] A format of #t=10,20 for a section of media from 10 to 20 seconds is proposed in the Media Fragments URI 1.0 W3C Working Draft.[8]
  • In URIs for MIME application/pdf documents, Adobe PDF viewers recognize a number of fragment identifiers.[9] For instance, a URL ending in .pdf#page=35 will cause Adobe Reader to open the PDF and scroll to page 35. Several other parameters are possible, including #nameddest= (similar to HTML anchors), #search="word1 word2", #zoom=, etc. Multiple parameters can be combined with ampersands:
    • http://example.org/doc.pdf#view=fitb&nameddest=Chapter3
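Two of the formats listed above are simple enough to take apart in a few lines. The following is a minimal sketch, not a full implementation: the function names are hypothetical, line_range handles only the two-number form of the RFC 5147 "line" scheme (no "char" scheme, single positions, open ranges, or integrity checks), and pdf_open_parameters merely splits the fragment on ampersands without claiming to match what any PDF viewer does internally.

```python
from urllib.parse import urlsplit


def line_range(url):
    """Return (first_line, last_line), 1-based and inclusive, or None.

    RFC 5147 line positions count the gaps between lines, starting at 0
    before the first line, so "line=10,20" covers lines 11 through 20.
    Simplified: only the "line=<start>,<end>" form is recognized.
    """
    key, _, value = urlsplit(url).fragment.partition("=")
    if key != "line" or value.count(",") != 1:
        return None
    start, end = (int(number) for number in value.split(","))
    return start + 1, end


def pdf_open_parameters(url):
    """Split a fragment such as "view=fitb&nameddest=Chapter3" into a dict."""
    fragment = urlsplit(url).fragment
    pairs = (item.partition("=") for item in fragment.split("&") if item)
    return {key: value for key, _, value in pairs}


print(line_range("http://example.com/document.txt#line=10,20"))
# (11, 20)
print(pdf_open_parameters("http://example.org/doc.pdf#view=fitb&nameddest=Chapter3"))
# {'view': 'fitb', 'nameddest': 'Chapter3'}
```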

Proposals

Several proposals have been made for fragment identifiers for use with plain text documents (which cannot store anchor metadata), or to refer to locations within HTML documents in which the author has not used anchor tags:

  • As of 2011, the W3C Media Fragments URI 1.0 Working Draft is in its second Last Call.[8]
  • Erik Wilde and Marcel Baschnagel of the ETH Zurich propose extending RFC 5147 to also identify fragments in plain text documents using regular expressions, with the keyword "match".[10] They also describe a prototype implementation as an extension for the Firefox browser. For example, the following would match the text "RFC", in any combination of upper and lower case, anywhere in the document:
    • http://example.com/document.txt#match=[rR][fF][cC]
  • K. Yee of the Foresight Institute proposes "extended fragment identifiers" delimited with colons and a keyword to differentiate them from anchor identifiers. A text search fragment identifier with "fragment specification scheme" id "words" is the first proposal in this scheme.[11] The following example would search a document for the first occurrence of the string "some context for a search term" and then highlight the words "search term":
    • http://example.com/index.html#:words:some-context-for-a-(search-term)
  • The Python Package Index appends the MD5 hash of a file to the URL as a fragment identifier, so the integrity of the file can be checked automatically.[12]
    • http://pypi.python.org ... zodbbrowser-0.3.1.tar.gz#md5=38dc89f294b24691d3f0d893ed3c119c
  • A hash-bang[13] fragment is a fragment starting with an exclamation mark (!). An exclamation mark is illegal in HTML, XHTML, and XML identifiers, which keeps hash-bang fragments separate from element identifiers.
    • Mozilla Foundation employee Gervase Markham has proposed a fragment identifier for searching, of the form #!s!search terms. Adding a number n after the s (for example, #!s10!) indicates that the browser should search for the nth occurrence of the search term. A negative number (#!s-3!) starts searching backwards from the end of the document. A Greasemonkey script is available to add this functionality to compatible browsers.[14]
      • http://example.com/index.html#!s3!search terms
    • Google Webmaster Central has proposed using an initial exclamation point in fragment identifiers for stateful AJAX pages:[15]
      • http://example.com/page?query#!state
  • The LiveURLs project proposed a fragment identifier format for referring to a region of text within a page, of the form #FWS+C, where F is the length of the first word (up to five characters), W is the first word itself, S is the length of the selected text and C is a 32-bit CRC of the selected text.[16] They implemented a variant of this scheme as an extension for the Firefox browser,[17] using the form #LFWS+C, where L is the length of the fragment itself, in two hex digits. Linking to the word "Fragment" using the implemented variant would yield:
    • http://example.com/index.html#115Fragm8+-52f89c4c
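The md5 fragment convention from the list above lends itself to a short illustration. This is a sketch under stated assumptions: the helper names, the URL, and the payload are invented for the example, and the code does not reproduce how any particular packaging tool performs the check.

```python
import hashlib
from urllib.parse import urlsplit


def md5_from_fragment(url):
    """Return the hex digest carried in a "#md5=<hex>" fragment, or None."""
    key, _, value = urlsplit(url).fragment.partition("=")
    return value.lower() if key == "md5" and value else None


def matches_fragment_md5(url, data):
    """Check downloaded bytes against the digest in the URL fragment."""
    expected = md5_from_fragment(url)
    if expected is None:
        return False
    return hashlib.md5(data).hexdigest() == expected


# Hypothetical download: the bytes and the URL are made up for this sketch.
payload = b"example archive bytes"
url = "http://example.com/pkg-0.1.tar.gz#md5=" + hashlib.md5(payload).hexdigest()

print(matches_fragment_md5(url, payload))            # True
print(matches_fragment_md5(url, b"tampered bytes"))  # False
```

Because the fragment is never sent to the server, the digest travels only with the link itself, and the check is carried out entirely by the client that follows it.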

References

  1. ^ "Representation types and fragment identifier semantics". Architecture of the World Wide Web, Volume One. W3C. 2004.
  2. ^ "Validity constraint: ID". XML 1.0 (Fifth Edition). W3C. 2008.
  3. ^ "xml:id Version 1.0". W3C. 2005.
  4. ^ "Issue 77024". Chromium. 2011.
  5. ^ Link to Specific Content in Gmail, Google Blogoscoped, November 17, 2007
  6. ^ New Feature: Link within a Video, Official Google Video Blog, July 19, 2006
  8. ^ a b "Media Fragments URI 1.0". W3C working draft. 2011.
  9. ^ PDF Open Parameters - Specifying PDF Open Parameters in a URL
  10. ^ Fragment identifiers for plain text files, Erik Wilde and Marcel Baschnagel, Swiss Federal Institute of Technology (ETH Zürich), Proceedings of the sixteenth ACM conference on Hypertext and hypermedia doi:10.1145/1083356.1083398
  11. ^ Text-Search Fragment Identifiers, K. Yee, Network Working Group, Foresight Institute, March 1998
  12. ^ Pypi md5 check support - "Pypi has the habit to append an md5 fragment to its egg urls, we ll use it to check the already present distribution files in the cache."
  13. ^ "Hash URIs". W3C Blog. 2011-05-12.
  14. ^ Fragment Search, gerv.net
  15. ^ Proposal for making AJAX crawlable, Official Google Webmaster Central Blog, 2009-10-07
  16. ^ The technology behind LiveURLs, accessed 2011-03-13
  17. ^ "Web Marker" Firefox add-on, accessed 2011-03-13

External links

  • Architecture of the World Wide Web, Volume One 3.2.1. Representation types and fragment identifier semantics
  • RFC 2396 4.1. Fragment Identifier (obsolete)
  • RFC 3986 3.5. Fragment
  • W3C Media Fragments Working Group, establishing a URI syntax and semantics to address media fragments in audiovisual material (such as a region in an image or a sub-clip of a video)