Difference between revisions of "Diacritic"

Revision as of 12:31, March 25, 2017

A diacritic is a mark near or through a character that changes its phonetic value or significance. For example, diacritics appear above the letter "e" in the word "résumé," distinguishing the noun from the verb "resume." Diacritics are more common in various European languages than they are in English.

The following are common diacritics:

Áá — An acute accent is a symbol placed over a vowel in some languages, especially French and Italian.
Àà — A grave accent is placed over a vowel in some languages, especially French and Italian.
Ââ — A circumflex is placed over a vowel in some languages, especially French.
Ää — A dieresis or umlaut, represented by two dots above the vowel, is used in various Germanic languages.
Ññ — A tilde is used in Spanish and Portuguese.
Åå — The ring is used in Scandinavian writing.
Øø — The slant is used in Danish and Norwegian.
Çç — A cedilla is used in French.
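In Unicode, most of the marks listed above can be separated from their base letters by canonical decomposition. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is hypothetical and chosen only for illustration.

```python
import unicodedata

def describe(ch):
    """Return (base letter, [names of combining marks]) for one character."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    marks = [unicodedata.name(c) for c in decomposed[1:]]
    return base, marks

# Normalize to NFC first so that each letter is a single precomposed character.
for ch in unicodedata.normalize("NFC", "áàâäñåøç"):
    base, marks = describe(ch)
    print(ch, base, marks or "no canonical decomposition")
```

The letter ø is reported as having no canonical decomposition: Unicode treats the slant as part of the letter rather than as a separable combining mark.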

Sounds unique to Eastern European languages were once written with two-letter combinations called "digraphs." In De Orthographia Bohemica (1412), Jan Hus proposed the use of diacritics in place of digraphs. Eight-bit character encoding, introduced in the 1980s, allows dozens of characters with diacritics to be rendered on computers and transmitted electronically. Unicode, incorporated into Windows in 2000, has room for more than a million characters.

Correct usage

All the major style guides advise the writer to select a widely available reference work and to follow the spellings given in that work. Modern computer software allows dictionary and encyclopedia spellings to be reproduced exactly. Better-known names, for example "Istanbul" or "Zurich," are often spelled without diacritics in English even though diacritics are part of the local-language spelling. Lesser-known names are generally spelled in the manner of the original language. Diacritics are not normally used for sports figures or for Vietnamese names. These are just rules of thumb, and each case should be checked separately in an appropriate reference work.

Merriam-Webster^[1]	American Heritage^[2]	Oxford^[3]	Webster’s New World^[4]	Random House^[5]	Britannica^[6]	Columbia^[7]
Be·neš, Edvard	Be·neš, Eduard	Beneš, Edvard	Beneš, Edvard	Be·neš, Ed·u·ard	Edvard Beneš	Eduard Beneš
Koś·ciusz·ko, Tadeusz Andrzei Bonawentura	Kos·ci·uśz·ko or Kos·ci·us·ko, Thaddeus	Kosciusko, Thaddeus	Kosciusko, Thaddeus	Kos·ci·us·ko, Thaddeus	Tadeusz Kościuszko	Thaddeus Kosciusko
Mit·ter·rand, François (-Maurice)	Mit·ter·rand, François Maurice	Mitterrand, François	Mitterrand, François (Maurice)	Mit·ter·rand, Fran·çois (Mau·rice Ma·rie)	François Mitterrand	François Maurice Mitterrand
Tō·jō Hideki	To·jo, Hideki	Tojo, Hideki	Tojo, Hideki	To·jo, Hi·de·ki	Tōjō Hideki	Tōjō Hideki
Vö·rös·marty, Mihály^[8]	N/A	N/A	N/A	N/A	Mihály Vörösmarty	Mihály Vörösmarty
Wa·łe·sa [sic.], Lech	Wa·łę·sa, Lech	Wałęsa, Lech	Wałęsa, Lech	Wa·łę·sa, Lech	Lech Wałęsa	Lech Wałęsa

The U.S. Board on Geographic Names sets U.S. government usage in geography. The “conventional” name is the name BGN deems suitable for English language usage. The “approved” name is the official name in the local language.

Merriam-Webster^[1]	American Heritage^[2]	Oxford^[3]	Webster’s New World^[4]	Random House^[5]	Britannica^[6]	Columbia^[7]	BGN conventional^[9]	BGN approved^[9]
Is·tan·bul	Is·tan·bul	Istanbul	Istanbul	Is·tan·bul	Istanbul	Istanbul	N/A	İstanbul
Jy·vas·ky·la	N/A	Jyväskylä	N/A	Jy·väs·ky·lä	Jyväskylä	Jyväskylä	N/A	Jyväskylä
Lü·beck	Lü·beck	Lübeck	Lü·beck	Lü·beck	Lübeck	Lübeck	N/A	Lübeck
Plo·iesti or Plo·esti	Plo·ieş·ti or Plo·eş·ti	Ploieşti	Plo·ieş·ti or Plo·eş·ti	Plo·eş·ti	Ploieşti	Ploieşti	N/A	Ploiești
Zu·rich	Zu·rich	Zurich	Zu·rich	Zu·rich	Zürich	Zürich	N/A	Zürich
Vietnamese towns
Ho Chi Minh City	Ho Chi Minh City	Ho Chi Minh City	Ho Chi Minh City	Ho Chi Minh City	Ho Chi Minh City	Ho Chi Minh City	Ho Chi Minh City	Thành Phố Hồ Chí Minh
Ha·noi	Ha·noi	Hanoi	Hanoi	Ha·noi	Hanoi	Hanoi	N/A	Hà Nội
Hai·phong	Hai·phong	Haiphong	Haiphong	Hai·phong	Haiphong	Haiphong	N/A	Hải Phòng
Hue^[10]	Hue	Hué	Hue	Hué	Hue	Hue	N/A	Huế

Electronic encoding

Eight characters beyond the basic alphabet are included in International Morse Code: seven letters with diacritics (Ä, Á, Å, É, Ñ, Ö, and Ü) and Ch, a Czech digraph. This encoding method, which includes only capital letters, was developed by Friedrich Clemens Gerke in 1848 and was adopted as an international standard in 1865.

In the early 1900s, teletype displaced Morse code for most purposes. Teletype was encoded using Baudot, a five-bit code developed in 1870 that includes only capital letters and has no diacritics. Baudot, in turn, was displaced by ASCII, a seven-bit code developed in 1963 that includes both uppercase and lowercase letters. IBM introduced Extended ASCII, an eight-bit encoding standard, with the original PC in 1981. This set includes 37 characters with diacritics. Latin-1, a slightly revised version of the IBM character set, was adopted as an international standard in 1987.^[11]
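The code widths mentioned above determine how many distinct values each scheme can represent, and Python's bundled codecs can show the effect on an accented letter. This is a rough sketch; it assumes that Python's `cp437` codec is a fair stand-in for the character set shipped with the original IBM PC.

```python
# Number of distinct codes available at each width.
for name, bits in [("Baudot", 5), ("ASCII", 7), ("eight-bit sets", 8)]:
    print(f"{name}: {bits} bits -> {2 ** bits} codes")

letter = "\u00e9"  # é

# ASCII has no slot for an accented letter.
try:
    letter.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Both eight-bit sets store it in one byte, though at different positions.
cp437_byte = letter.encode("cp437")      # assumed stand-in for the IBM PC set
latin1_byte = letter.encode("latin-1")

print(ascii_ok, cp437_byte.hex(), latin1_byte.hex())
```

The differing byte values illustrate why text written under one eight-bit set could display incorrectly when read under another.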

Unicode, implemented by the Windows operating system since 2000, includes Latin-1 as well as a comprehensive collection of Nordic, Eastern European, and Asian characters. In the UTF-8 encoding, a Unicode character can be up to four bytes long. The standard has room for over 1.1 million code points, although only about 113,000 had been assigned as of Unicode 7.0 in 2014.^[12]
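The figures above can be checked with a short sketch. It assumes UTF-8, the Unicode encoding form in which a character occupies one to four bytes; the sample characters are arbitrary choices for illustration.

```python
# Sample characters chosen to need 1, 2, 3, and 4 bytes in UTF-8.
samples = ["e", "\u00e9", "\u1ebf", "\U0001F600"]  # e, é, ế, an emoji

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# Unicode code points run from U+0000 to U+10FFFF inclusive.
code_space = 0x10FFFF + 1
print(code_space)
```

The final value, 1,114,112, is the size of the whole code space, which is where the "over 1.1 million" figure comes from.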

Latin-1

This eight-bit character set covers Western European languages. It is a variation of IBM's "Extended ASCII" set. This set is often referred to as "ANSI." However, the standard approved by the American National Standards Institute is for an eight-bit character set, not this set specifically. The set includes the following diacritics:

The ligature: Ææ.
The acute accent: Áá, Éé, Íí, Óó, Úú, Ýý.
The grave accent: Àà, Èè, Ìì, Òò, Ùù.
The circumflex: Ââ, Êê, Îî, Ôô, Ûû.
The umlaut: Ää, Ëë, Ïï, Öö, Üü, ÿ.
The tilde: Ãã, Ññ, Õõ.
The ring: Åå.
The slant: Øø.
The cedilla: Çç.

In Unicode, the Latin-1 characters have codepoints from U+0000 to U+00FF.
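Because of this alignment, decoding a Latin-1 byte yields the Unicode code point with the same numeric value. A minimal sketch using Python's built-in `latin-1` codec:

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decode them as Latin-1 and collect the resulting code points.
text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in text]

# Each byte value maps to the identically numbered code point.
print(code_points == list(range(256)))

# Example: byte 0xE9 is é, which is U+00E9.
print(bytes([0xE9]).decode("latin-1") == "\u00e9")
```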

Latin-2

Latin-2 is an eight-bit character set intended for use with Eastern European languages. It includes the following diacritics:

The ogonek: Ąą, Ęę.
The acute: Áá, Ćć, Éé, Íí, Ĺĺ, Ńń, Óó, Ŕŕ, Śś, Úú, Ýý, Źź.
The circumflex: Ââ, Îî, Ôô.
The breve: Ăă.
The caron: Čč, Ďď, Ěě, Ňň, Řř, Šš, Ťť, Žž.
The vertical caron: Ľľ.
The umlaut: Ää, Ëë, Öö, Üü.
The cedilla: Çç, Şş, Ţţ.
The stroke: Đđ, Łł.
The double acute: Őő, Űű.
The ring: Ůů.
The dot: Żż.
The s sharp: ß.

Turkish can be encoded as Latin-5, while the Nordic languages may be encoded as Latin-6. Since the shift to Unicode, the various eight-bit character sets have become less relevant.
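One reason these sets were hard to mix is that the same byte value stands for different characters depending on which set is assumed. The sketch below uses Python's built-in codecs `iso8859_2` (Latin-2), `iso8859_9` (Latin-5), and `latin-1`; the helper `decode_as` and the sample bytes are hypothetical choices for illustration.

```python
def decode_as(byte_value, codec):
    """Decode a single byte value under the named eight-bit codec."""
    return bytes([byte_value]).decode(codec)

# Byte 0xA3: Ł in Latin-2, the pound sign in Latin-1.
print(decode_as(0xA3, "iso8859_2"), decode_as(0xA3, "latin-1"))

# Byte 0xDD: İ in Latin-5 (Turkish), Ý in Latin-1.
print(decode_as(0xDD, "iso8859_9"), decode_as(0xDD, "latin-1"))

# Unicode gives each character its own code point, so one string can hold both.
mixed = "\u0141\u00f3d\u017a, \u0130stanbul"  # Łódź, İstanbul
print(mixed.encode("utf-8").decode("utf-8") == mixed)
```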

References

[1] Merriam-Webster Dictionary
[2] American Heritage Dictionary of the English Language
[3] Oxford Dictionaries
[4] Webster’s New World College Dictionary
[5] Random House Dictionary
[6] Encyclopædia Britannica
[7] Columbia Encyclopedia
[8] This name is not given either online or in the Collegiate, but only in Merriam-Webster's Biographical Dictionary (1995).
[9] U.S. Board on Geographic Names
[10] Merriam-Webster's Geographical Dictionary (1997) gives "Hue or Hué." The variant has a French (not Vietnamese) diacritic over the e.
[11] Controls and Latin-1 Supplement, Unicode, Inc.
[12] Cunningham, Andrew, "Unicode 7.0 introduces 2,834 new characters, including 250 emoji", Ars Technica, June 17, 2014

External links

The following references may be consulted to determine proper spelling, including the correct use of diacritics:

American dictionaries

American Heritage Dictionary of the English Language. Recommended by The Chicago Manual of Style and the Modern Language Association.
Merriam-Webster Dictionary. Merriam-Webster spelling is the first choice of all the major American style guides, including CMOS, The Gregg Reference Manual, MLA, and the American Psychological Association.
Random House Dictionary. Recommended by CMOS and MLA.
Webster's New World College Dictionary. Recommended by the Associated Press.

British dictionaries

Collins Dictionary
Oxford Dictionaries. Recommended by New Hart's Rules, the British equivalent to CMOS.

Encyclopedias

Sports

ESPN.com. ESPN Sports Almanac was a standard sports reference until it was discontinued in 2009. Much of the information that was formerly used for the almanac is available at this site.

@@ Line 1: / Line 1: @@
-A '''diacritic''' is a mark near or through a character that changes its phonetic value or significance. For example, diacritics appear above the letter "e" in the word "résumé," distinguishing the noun from the verb "resume." Diacritics are more common in various European languages than they are in English
+A '''diacritic''' is a mark near or through a character that changes its phonetic value or significance. For example, diacritics appear above the letter "e" in the word "résumé," distinguishing the noun from the verb "resume." Diacritics are more common in various European languages than they are in English.
-The following are some common diacritics:
+The following are common diacritics:
 * '''Áá''' — An acute accent is a symbol placed over a vowel in some languages, especially [[French]] and [[Italian]].
@@ Line 12: / Line 12: @@
 * '''Çç''' — A cedilla is used in French.
-Sounds unique to Eastern European languages were once written with two letter combination called "digraphs." In ''De Ortographia Bohemica'' (1412), Jan Hus proposed the use of diacritics in place of digraphs. Eight-bit character encoding, introduced in the 1980s, allows dozens of characters with diacritics to be rendered on computer and transmitted electronically. Unicode, the current encoding standard, allows for an almost unlimited character set.
+Sounds unique to Eastern European languages were once written with two letter combination called "digraphs." In ''De Ortographia Bohemica'' (1412), Jan Hus proposed the use of diacritics in place of digraphs. Eight-bit character encoding, introduced in the 1980s, allows dozens of characters with diacritics to be rendered on computer and transmitted electronically. Unicode was incorporated into Windows in 2000. It allows for an almost unlimited character set.
 ==Correct usage==
-All the major style guides advise the writer to select a widely available reference work and to follow the spellings given in this work. Modern computer software allows dictionary and encyclopedia spellings to be reproduced exactly. Guidance that suggests dropping off technically difficult diacritics may be disregarded as outdated. Better known names, for example "Istanbul" or "Zurich," are often spelled without diacritics in English even though diacritics are part of the local language spelling. Lesser known names are generally spelled in the manner of the original language. Diacritics are not normally used for sports figures or for Vietnamese names. These are just rules of thumb, and each case should be checked separately in an appropriate reference work.
+All the major style guides advise the writer to select a widely available reference work and to follow the spellings given in this work. Modern computer software allows dictionary and encyclopedia spellings to be reproduced exactly. Better known names, for example "Istanbul" or "Zurich," are often spelled without diacritics in English even though diacritics are part of the local language spelling. Lesser known names are generally spelled in the manner of the original language. Diacritics are not normally used for sports figures or for Vietnamese names. These are just rules of thumb, and each case should be checked separately in an appropriate reference work.
 {| class="wikitable" style="font-size: 90%;"
@@ Line 189: / Line 189: @@
 Eight characters with diacritics are included in International Morse Code: Ä, Á, Å, Ch (a Czech digraph), É, Ñ, Ö, and Ü. This encoding method, which includes only capital letters, was developed by Friedrich Clemens Gerke in 1848 and was adopted as an international standard in 1865.
-Teleprinters used Baudot, a five-bit code developed in 1870 that includes only capital letters without diacritics. In the 1960s, Baudot was replaced by ASCII, a seven-bit code that includes both upper and lower cased letters. IBM introduced Extended ASCII, an eight-bit encoding standard, with the original PC in 1981. This set includes 37 characters with diacritics. Latin-1, a slightly revised version of the IBM character set, was adopted as an international standard in 1987.<ref>[http://www.unicode.org/charts/PDF/U0080.pdf Controls and Latin-1 Supplement], Unicode, Inc.</ref>
+In the early 1900s, teletype displaced Morse code for most purposes. Teletype was encoded using Baudot. Baudot is a five-bit code developed in 1870 that includes only capital letters and has no diacritics. Baudot, in turn, was displaced by ASCII, a seven-bit code developed in 1963 that includes both upper and lower cased letters. IBM introduced Extended ASCII, an eight-bit encoding standard, with the original PC in 1981. This set includes 37 characters with diacritics. Latin-1, a slightly revised version of the IBM character set, was adopted as an international standard in 1987.<ref>[http://www.unicode.org/charts/PDF/U0080.pdf Controls and Latin-1 Supplement], Unicode, Inc.</ref>
-Unicode, implemented by the Windows operating system since 2000, includes Latin-1 as well as a comprehensive collection of Nordic, Eastern European, and even Asian characters. Unicode characters can be up to four bytes long. This allows for over 1.1 million characters to be encoded, although only 113,000 codepoints have been assigned so far.<ref>Cunningham, Andrew, "[http://arstechnica.com/gadgets/2014/06/unicode-7-0-introduces-2834-new-characters-including-250-emoji/ Unicode 7.0 introduces 2,834 new characters, including 250 emoji]", Ars Technica, June 17, 2014</ref>
+Unicode, implemented by the Windows operating system since 2000, includes Latin-1 as well as a comprehensive collection of Nordic, Eastern European, and even Asian characters. Unicode characters can be up to four bytes long. This allows for over 1.1 million characters to be encoded, although only 113,000 codepoints have been assigned so far.<ref>Cunningham, Andrew, "[http://arstechnica.com/gadgets/2014/06/unicode-7-0-introduces-2834-new-characters-including-250-emoji/ Unicode 7.0 introduces 2,834 new characters, including 250 emoji]", ''Ars Technica'', June 17, 2014</ref>
 ===Latin-1===
-The following are Western European (Latin-1) diacritics:
+This eight-bit character set covers Western European languages. It is a variation of IBM's "Extended ASCII" set. This set is often referred to as "ANSI." However, the standard approved by the American National Standards Institute is for an eight-bit character set, not this set specifically. The set includes the following diacritics:
 * The ligature: '''Ææ'''
@@ Line 206: / Line 206: @@
 * The cedilla: '''Çç'''.
-The Latin-1 characters have codepoints from <tt>U+0000</tt> to <tt>U+00FF</tt>.
+In Unicode, the Latin-1 characters have codepoints from <tt>U+0000</tt> to <tt>U+00FF</tt>.
 ===Latin-2===
-Latin-2 diacritics are used with Eastern European languages:
+Latin-2 is an eight-bit character set intended for use with Eastern European languages. It includes the following diacritics:
 *The ogonek: '''Ąą Ęę Ţţ'''.
 *The acute: '''Áá, Ćć, Éé, Íí, Ĺĺ, Ńń, Óó, Ŕŕ, Śś, Úú, Ýý, Źź'''.
@@ Line 223: / Line 223: @@
 *The s sharp: '''ẞß'''.
-Turkish is encoded as Latin-5, while the Nordic languages are encoded as Latin-6.
+Turkish can be encoded as Latin-5, while the Nordic languages may be encoded as Latin-6. Since the shift to Unicode, the various eight-bit character sets have become less relevant.
 ==References==
@@ Line 246: / Line 246: @@
 *[http://espn.go.com/ ESPN.com]. ''ESPN Sports Almanac'' was a standard sports reference until it was discontinued in 2009. Much of the information that was formerly used for the almanac is available at this site.
-==Further Reading==
+==Further reading==
 *''Chicago Manual of Style''. This is the best-known style guide. It is produced by the University of Chicago Press.
 *''Merriam-Webster's Collegiate Dictionary''. This is the most widely used and authoritative of the Merriam-Webster dictionaries. The printed edition includes geography and biography sections not available in the free online version.
-*''Merriam-Webster's Geographical Dictionary'' (1997). Recommended by CMOS for the spelling of place names. It represents material culled from ''Britannica''. You can access the same information through ''Britannica''’s website, which is more up-to-date.
+*''Merriam-Webster's Geographical Dictionary'' (1997). Recommended by CMOS for the spelling of place names. It represents material culled from ''Britannica'', which is published by the same company. You can access the same information through ''Britannica''’s website, which is more up-to-date.
 *''Merriam-Webster's Biographical Dictionary'' (1995). Recommended by CMOS for the spelling of personal names. The comments above regarding ''Britannica'' and geographic names are even more applicable here since this book is no longer in print.
 *''National Geographic Atlas of the World''. Recommended by AP.
-[[Category:Grammar]][[Category:Linguistics]]
+[[Category:Grammar]]
+[[Category:Linguistics]]
+[[Category:Communication]]
