The Is* Unicode script ranges in .NET's RegEx
by Michael S. Kaplan, published on 2005/09/13 10:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/09/13/464416.aspx
You'd think they would have a help topic that would list these, wouldn't you? I mean, since topics like Character Classes document that they exist....
Ah well, I'll just do it now. Here are the ones as of Whidbey. Note that they are only up to Unicode 4.0, so blocks added in Unicode 4.1, such as the new Georgian Nuskhuri range (U+2d00 to U+2d2f), are not included. There might also be a few ranges that are a little smaller than they ought to be due to additions....
The final thing to keep in mind -- it uses full ranges, so if you have a Unicode code point that is not yet assigned in the middle of one of these blocks, it will be considered part of the block.
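To make the "full ranges" point concrete, here is a minimal sketch in Python rather than .NET. Python's built-in `re` module has no named blocks, so the explicit range U+0370 to U+03FF stands in for what .NET would write as `\p{IsGreek}`. The name `GREEK_BLOCK` is made up for the example, and U+0378 is assumed to still be unassigned in the Unicode data that ships with your Python.

```python
import re
import unicodedata

# Explicit range standing in for .NET's \p{IsGreek} (Greek and Coptic block).
GREEK_BLOCK = re.compile(r"[\u0370-\u03FF]")

assigned = "\u03B1"    # GREEK SMALL LETTER ALPHA
unassigned = "\u0378"  # a reserved code point inside the block

for ch in (assigned, unassigned):
    in_block = GREEK_BLOCK.fullmatch(ch) is not None
    category = unicodedata.category(ch)  # "Cn" means not assigned
    print(f"U+{ord(ch):04X} category={category} in_block={in_block}")
```

Both code points match the range, even though only one of them is an actual character.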
Without further ado, here are the names that you can use in .NET regular expressions:
- IsAlphabeticPresentationForms
- IsArabic
- IsArabicPresentationForms-A
- IsArabicPresentationForms-B
- IsArmenian
- IsArrows
- IsBasicLatin
- IsBengali
- IsBlockElements
- IsBopomofo
- IsBopomofoExtended
- IsBoxDrawing
- IsBraillePatterns
- IsBuhid
- IsCJKCompatibility
- IsCJKCompatibilityForms
- IsCJKCompatibilityIdeographs
- IsCJKRadicalsSupplement
- IsCJKSymbolsandPunctuation
- IsCJKUnifiedIdeographs
- IsCJKUnifiedIdeographsExtensionA
- IsCherokee
- IsCombiningDiacriticalMarks
- IsCombiningDiacriticalMarksforSymbols
- IsCombiningHalfMarks
- IsCombiningMarksforSymbols
- IsControlPictures
- IsCurrencySymbols
- IsCyrillic
- IsCyrillicSupplement
- IsDevanagari
- IsDingbats
- IsEnclosedAlphanumerics
- IsEnclosedCJKLettersandMonths
- IsEthiopic
- IsGeneralPunctuation
- IsGeometricShapes
- IsGeorgian
- IsGreek
- IsGreekExtended
- IsGreekandCoptic
- IsGujarati
- IsGurmukhi
- IsHalfwidthandFullwidthForms
- IsHangulCompatibilityJamo
- IsHangulJamo
- IsHangulSyllables
- IsHanunoo
- IsHebrew
- IsHighPrivateUseSurrogates
- IsHighSurrogates
- IsHiragana
- IsIPAExtensions
- IsIdeographicDescriptionCharacters
- IsKanbun
- IsKangxiRadicals
- IsKannada
- IsKatakana
- IsKatakanaPhoneticExtensions
- IsKhmer
- IsKhmerSymbols
- IsLao
- IsLatin-1Supplement
- IsLatinExtended-A
- IsLatinExtended-B
- IsLatinExtendedAdditional
- IsLetterlikeSymbols
- IsLimbu
- IsLowSurrogates
- IsMalayalam
- IsMathematicalOperators
- IsMiscellaneousMathematicalSymbols-A
- IsMiscellaneousMathematicalSymbols-B
- IsMiscellaneousSymbols
- IsMiscellaneousSymbolsandArrows
- IsMiscellaneousTechnical
- IsMongolian
- IsMyanmar
- IsNumberForms
- IsOgham
- IsOpticalCharacterRecognition
- IsOriya
- IsPhoneticExtensions
- IsPrivateUse
- IsPrivateUseArea
- IsRunic
- IsSinhala
- IsSmallFormVariants
- IsSpacingModifierLetters
- IsSpecials
- IsSuperscriptsandSubscripts
- IsSupplementalArrows-A
- IsSupplementalArrows-B
- IsSupplementalMathematicalOperators
- IsSyriac
- IsTagalog
- IsTagbanwa
- IsTaiLe
- IsTamil
- IsTelugu
- IsThaana
- IsThai
- IsTibetan
- IsUnifiedCanadianAboriginalSyllabics
- IsVariationSelectors
- IsYiRadicals
- IsYiSyllables
- IsYijingHexagramSymbols
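In a .NET pattern these names go inside `\p{...}`, for example `\p{IsCyrillic}+`, or `\P{...}` for the complement. As a rough illustration of what such a name stands for, the sketch below (again in Python, which does not understand these names natively) rewrites a few of them into explicit ranges. The `BLOCK_RANGES` table and the `expand_block_names` helper are hypothetical, cover only four blocks, and handle only the `\p{...}` form outside of a character class.

```python
import re

# Hypothetical lookup table: a handful of block names and their code point ranges.
BLOCK_RANGES = {
    "IsBasicLatin": (0x0000, 0x007F),
    "IsGreek": (0x0370, 0x03FF),
    "IsCyrillic": (0x0400, 0x04FF),
    "IsHebrew": (0x0590, 0x05FF),
}

def expand_block_names(pattern: str) -> str:
    """Replace each \\p{IsName} in pattern with an explicit [lo-hi] range."""
    def to_range(match: re.Match) -> str:
        lo, hi = BLOCK_RANGES[match.group(1)]  # KeyError for names not in the table
        return "[\\u%04X-\\u%04X]" % (lo, hi)
    return re.sub(r"\\p\{(Is[A-Za-z0-9-]+)\}", to_range, pattern)

cyrillic_run = re.compile(expand_block_names(r"\p{IsCyrillic}+"))
print(cyrillic_run.findall("abc Привет def"))
```

A real translation layer would need the complete table and would have to cope with names used inside `[...]`, which this sketch does not attempt.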
This post brought to you by every single character in Unicode 4.0