Passion * Technology * Ruthless Competence

Thursday, August 20, 2009

HawkCodeBox

Last month, I lamented the lack of extensibility of the WPF text box. While there are several vendors and at least one open source custom syntax highlighting text box, it still really bothers me how inextensible the basic WPF text box is. I just want to do a simple colorizing REPL – why is that so hard?

So instead of using any of those syntax highlighting text boxes, I decided to build my own using the approach Ken Johnson wrote about on Code Project. As I wrote before, it’s a hack – you set the text box’s foreground and background brushes to transparent so that you can override OnRender – but it works.

The big change I made from Ken’s code was to use DLR TokenCategorizer instead of regular expressions to tokenize the code. TokenCategorizer is a service provided by the DLR hosting API, which will tokenize a given script source for you. Here’s the code that colorizes the text in the text box.

var source = Engine.CreateScriptSourceFromString(this.Text);
var tokenizer = Engine.GetService<TokenCategorizer>();
tokenizer.Initialize(null, source, SourceLocation.MinValue);

var t = tokenizer.ReadToken();
while (t.Category != TokenCategory.EndOfStream)
{
    if (SyntaxMap.ContainsKey(t.Category))
    {
        ft.SetForegroundBrush(_syntaxMap[t.Category], 
             t.SourceSpan.Start.Index, t.SourceSpan.Length);
    }

    t = tokenizer.ReadToken();
}

As you can see, I ask the engine for a TokenCategorizer, initialize it with the text box’s current contents, then iterate thru the tokens, looking for ones in my SyntaxMap. If the token category is in the syntax map, we change the foreground brush for that span of formatted text (ft is a WPF FormattedText instance I created earlier in the method.

Of course, this approach isn’t very efficient – it re-colorizes the entire file on every change. It turns out that some DLR TokenCategorizer are restartable so you can cache the tokenizer state at any point and then return later with a new TokenCategorizer instance and pick up tokenizing where you left off. With this approach, you could say tokenize a line at a time, allowing you to only need to retokenize the line where the change occurred rather than the entire file. But only IronPython supports tokenizer restarting today, so I decided to take the easy way and simple re-colorize on every change.

I named the project HawkCodeBox and I’ve published the source up on GitHub. It’s fairly simple, but of course the goal wasn’t to build the be-all-end-all text editor – other people in the VS team already have that job.

Posted By Harry Pierson at 11:49 AM Pacific Daylight Time
Change Congress
Recent Bookmarks
Tags .NET Framework (2) __clrtype__ (9) ADO.NET (5) Agile (7) AJAX (3) Architecture (288) Guidance (6) Interop (2) Modelling (61) Patterns (7) Process (4) SOA (94) Web Services (5) ASP.NET (25) Async Messaging (2) Azure (1) Battlestar Galactica (3) BI (2) BizTalk (4) Blogging (117) dasBlog (11) Podcasting (4) BPM (1) C# (11) C++ (4) Capitals (5) CardSpace (3) CLR (2) CodePlex (1) College Football (10) Comedy Central (1) Community (81) Concurrency (6) Consumer Electronics (1) Database (13) Debugger (23) Dependency Injection (2) Development (122) C Plus Plus (1) Embedded (5) Lanugages (42) Media (2) P2P (11) Rotor (1) SharePoint (6) SOP (3) DIY (1) DLR (25) Domain Specific Languages (15) Durable Messaging (5) Dynamic Languages (12) Dynamic Silverlight (1) Education (3) Enterprise 2.0 (1) Entertainment (14) ETech (15) F# (51) Functional Programming (17) Game Development (2) Guidance Automation (3) Hardware (8) HawkCodeBox (1) HawkEye (3) Health (1) Hockey (31) Home Electronics (1) Home Network (5) Hosting API (1) Humor (5) IASA (1) Idempotence (3) infrastructure (5) Instrumentation (4) Integration (2) IronPython (112) IronRuby (16) Java (2) Job (3) Kodu (1) LangNET (2) Lightweight Debugger (5) LINQ (23) Live Framework (3) Live Mesh (2) Lost (1) Master Data Management (1) Media 2.0 (6) Microsoft (31) MIX06 (2) Mobile Phone (1) Monads (5) Morning Coffee (172) Object Oriented (4) Office (5) Open Source (8) Open Space (2) Operations (3) Other (135) Art (1) Books (1) Family (33) Games (18) General Geekery (27) Home Theater (1) Movies (23) Music (20) Politics (3) Society (1) Sports (37) Working at MSFT (19) Parallel Programming (3) Parsing Expression Grammar (16) patterns & practices (2) PDC08 (5) Politics (48) Polyglot (3) PowerPoint (2) PowerShell (39) Presentation (7) Projects (1) HawkWiki (1) Pygments (5) Python (6) Quote of the Day (4) Refactoring (1) Research (2) REST (18) Reuse (5) Robotics (2) Rock Band (4) Rome (5) Ruby (23) Ruby on Rails (1) Sci-Fi (2) Scripting (4) Security (3) Service Broker (14) SharePoint (2) Silverlight (20) Social Software (1) Software + Services (2) Software Design (2) Software Engineering (1) Software Factories (11) Software Industry (1) Space Elevator (1) Spark (1) SQL Server (2) Stephen Colbert (1) TechEd (7) TechEd06 (1) TechRec League (1) Television (6) Travel (7) Unified Client (1) Unit Testing (4) USC (1) UX (1) Virtual PC (2) Visual Basic (3) Visual Studio (20) Volta (2) Washington Capitals (37) WCF (31) Web 2.0 (67) Web Services (7) WF (21) Windows (3) Windows Live (29) Windows Live Writer (3) WPF (8) Xbox (1) Xbox 360 (54) XML (11) XNA (15) Zune (4)
Disclaimer: The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the authors discretion.