
Performance Improvements in ASP.NET Core 8

ASP.NET Core 8 and .NET 8 bring many exciting performance improvements. In this blog post, we will highlight some of the enhancements made in ASP.NET Core and show you how they can boost your web app’s speed and efficiency. This is a continuation of last year’s post on Performance improvements in ASP.NET Core 7. And, of course, it continues to be inspired by Performance Improvements in .NET 8. Many of those improvements either directly or indirectly improve the performance of ASP.NET Core as well.

Benchmarking Setup

We will use BenchmarkDotNet for many of the examples in this blog post. To set up a benchmarking project:

1. Create a new console app (`dotnet new console`)
2. Add a NuGet reference to BenchmarkDotNet (`dotnet add package BenchmarkDotNet`), version 0.13.8+
3. Change Program.cs to `var summary = BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run();`
4. Add the benchmarking code snippet below that you want to run
5. Run `dotnet run -c Release` and enter the number of the benchmark you want to run when prompted

Some of the benchmarks test internal types, and a self-contained benchmark cannot be written. In those cases we’ll either reference numbers that come from running the benchmarks in the repository (and link to the code in the repository), or we’ll provide a simplified example to showcase what the improvement is doing. There are also some cases where we will reference our end-to-end benchmarks, which are public at https://aka.ms/aspnet/benchmarks, although we only display the last few months of data so that the page loads in a reasonable amount of time.

Servers

We have three server implementations in ASP.NET Core: Kestrel, Http.Sys, and IIS. The latter two are only usable on Windows and share a lot of code. Server performance is extremely important because the server is what processes incoming requests and forwards them to your application code. The faster we can process a request, the faster you can start running application code.

Kestrel

Header parsing is one of the first parts of processing done by a server for every request, which means its performance is critical to getting requests to your application code as fast as possible. In Kestrel we read bytes off the connection into a System.IO.Pipelines.Pipe, which is essentially a list of byte[]s. When parsing headers we read from that list of byte[]s and have two different code paths: one for when the full header is inside a single byte[], and another for when a header is split across multiple byte[]s. dotnet/aspnetcore#45044 updated the second (slower) code path to avoid allocating a byte[] when parsing the header, and optimized our SequenceReader usage to mostly use the underlying ReadOnlySequence<byte>, which can be faster in some cases. This resulted in a ~18% performance improvement for multi-span headers and made that path allocation free, which helps reduce GC pressure.

The following microbenchmark uses internal types in Kestrel and isn’t easy to isolate as a minimal sample. For those interested, it is located with the Kestrel source code and was run before and after the change.

| Method | Mean | Op/s | Gen 0 | Allocated |
|---|---|---|---|---|
| MultispanUnicodeHeader – Before | 573.8 ns | 1,742,893.2 | – | 48 B |
| MultispanUnicodeHeader – After | 484.9 ns | 2,062,450.8 | – | – |

Below is an allocation profile from an end-to-end benchmark we run on our CI, showing the difference this change makes. We reduced the byte[] allocations of the scenario by 73%, from 7.8GB to 2GB (over the lifetime of the benchmark run).
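To make the two header-parsing paths concrete, here is a minimal sketch of the idea (illustrative only, not Kestrel's actual implementation): a fast path when the header lives in a single contiguous segment of the ReadOnlySequence<byte>, and a slower path that copies a multi-segment header into a temporary buffer instead of allocating a new byte[] each time.

```csharp
using System;
using System.Buffers;
using System.Text;

// Illustrative sketch of single-span vs. multi-span header parsing.
static class HeaderParser
{
    public static string ParseHeaderValue(in ReadOnlySequence<byte> buffer)
    {
        if (buffer.IsSingleSegment)
        {
            // Fast path: the whole header is in one contiguous span.
            return Encoding.ASCII.GetString(buffer.FirstSpan);
        }

        // Slow path: the header spans multiple segments. Copy it into a
        // stack buffer (for small headers) rather than allocating a byte[].
        Span<byte> temp = buffer.Length <= 256
            ? stackalloc byte[(int)buffer.Length]
            : new byte[buffer.Length]; // fallback for unusually large headers
        buffer.CopyTo(temp);
        return Encoding.ASCII.GetString(temp);
    }
}
```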
dotnet/aspnetcore#48368 replaced some internal custom vectorized code for ASCII comparison checks with the new Ascii class in .NET 8. This allowed us to remove ~400 lines of code and take advantage of improvements like AVX512 and ARM AdvSIMD that are implemented in the Ascii code and that we didn’t have in Kestrel’s implementation.

Http.Sys

Near the end of 7.0 we removed some extra thread pool dispatching in Kestrel that improved performance significantly. More details are in last year’s performance post. At the beginning of 8.0 we made similar changes to the Http.Sys server in dotnet/aspnetcore#44409. This improved our JSON end-to-end benchmark by 11%, from ~469k to ~522k RPS.

Another change affects large responses, especially over higher-latency connections. dotnet/aspnetcore#47776 adds an on-by-default option to enable kernel-mode response buffering. This allows application writes to be buffered in the OS layer regardless of whether the client connection has acked previous writes, and the OS can then optimize sending the data by parallelizing writes and/or sending larger chunks of data at a time. The benefits are clear when using connections with higher latency. To show a specific example, we hosted a server in Sweden and a client in West Coast USA to create some latency in the connection. The following server code was used:

```csharp
var builder = WebApplication.CreateBuilder(args);
builder.WebHost.UseHttpSys(options =>
{
    options.UrlPrefixes.Add("http://+:12345");
    options.Authentication.Schemes = AuthenticationSchemes.None;
    options.Authentication.AllowAnonymous = true;
    options.EnableKernelResponseBuffering = true; // <-- new setting in 8.0
});

var app = builder.Build();
app.UseRouting();

app.MapGet("/file", () =>
{
    return TypedResults.File(File.Open("pathToLargeFile", FileMode.Open, FileAccess.Read));
});

app.Run();
```

The latency was around 200ms (round-trip) between client and server, and the server was responding to client requests with a 212MB file. With HttpSysOptions.EnableKernelResponseBuffering set to false, the file download took ~11 minutes; with it set to true, the download took ~30 seconds. That’s a massive improvement, ~22x faster in this specific scenario! More details on how response buffering works can be found in this blog post.

dotnet/aspnetcore#44561 refactors the internals of response writing in Http.Sys to remove a bunch of GCHandle allocations, and conveniently removes a List<GCHandle> that was used to track handles for freeing. It does this by allocating and writing directly to NativeMemory when writing headers. By not pinning managed memory we reduce GC pressure and help reduce heap fragmentation. A downside is that we need to be extra careful to free the memory, because the allocations are no longer tracked by the GC. Running a simple web app and tracking GCHandle usage shows that in 7.0 a small response with 4 headers used 8 GCHandles per request, plus 2 more GCHandles per additional header. In 8.0 the same app used only 4 GCHandles per request, regardless of the number of headers.

dotnet/aspnetcore#45156 by @ladeak improved the implementation of HttpContext.Request.Headers.Keys and HttpContext.Request.Headers.Count in Http.Sys, which is also the same implementation used by IIS, so double win. Before, those properties had generic implementations that used IEnumerable and LINQ expressions. Now they count manually and minimize allocations, making accessing Count completely allocation free. A sketch of the pattern follows.
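A hedged before/after sketch of the Keys/Count idea (the KnownHeaders type here is hypothetical, not the actual Http.Sys internals): counting non-empty entries in a fixed set of known-header slots with LINQ versus a manual loop.

```csharp
using System.Linq;

// Hypothetical type for illustration: a fixed array of known-header slots.
internal sealed class KnownHeaders
{
    private readonly string?[] _values = new string?[32];

    // Before: LINQ allocates an enumerator and a delegate per call.
    public int CountWithLinq() => _values.Count(v => v is not null);

    // After: a manual loop over the backing array is allocation free.
    public int CountManually()
    {
        int count = 0;
        foreach (var v in _values)
        {
            if (v is not null)
            {
                count++;
            }
        }
        return count;
    }
}
```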
The benchmark for this change uses internal types, so I’ll link to the microbenchmark source instead of providing a standalone microbenchmark.

Before:

| Method | Mean | Op/s | Gen 0 | Allocated |
|---|---|---|---|---|
| CountSingleHeader | 381.3 ns | 2,622,896.1 | 0.0010 | 176 B |
| CountLargeHeaders | 3,293.4 ns | 303,639.9 | 0.0534 | 9,032 B |
| KeysSingleHeader | 483.5 ns | 2,068,299.5 | 0.0019 | 344 B |
| KeysLargeHeaders | 3,559.4 ns | 280,947.4 | 0.0572 | 9,648 B |

After:

| Method | Mean | Op/s | Gen 0 | Allocated |
|---|---|---|---|---|
| CountSingleHeader | 249.1 ns | 4,014,316.0 | – | – |
| CountLargeHeaders | 278.3 ns | 3,593,059.3 | – | – |
| KeysSingleHeader | 506.6 ns | 1,974,125.9 | – | 32 B |
| KeysLargeHeaders | 1,314.6 ns | 760,689.5 | 0.0172 | 2,776 B |

Native AOT

Native AOT was first introduced in .NET 7 and only worked with console applications and a limited number of libraries. In .NET 8 we’ve increased the number of libraries supported in Native AOT and added support for ASP.NET Core applications. AOT apps can have a smaller disk footprint, reduced startup times, and reduced memory demand. But before we talk about AOT more and show some numbers, we should talk about a prerequisite: trimming.

Starting in .NET 6, trimming applications became a fully supported feature. Enabling this feature with <PublishTrimmed>true</PublishTrimmed> in your .csproj makes the trimmer run during publish and remove code your application isn’t using. This can result in smaller deployed application sizes, useful in scenarios where you are running on memory-constrained devices. Trimming isn’t free though: libraries might need to annotate types and method calls to tell the trimmer about code being used that the trimmer can’t determine on its own; otherwise the trimmer might trim away code you’re relying on and your app won’t run as expected. The trimmer raises warnings when it sees code that might not be compatible with trimming.

Until .NET 8 the <TrimMode> property for publishing web apps was set to partial. This meant that only assemblies that explicitly stated they supported trimming would be trimmed. Now in 8.0, full is used for <TrimMode>, which means all assemblies used by the app will be trimmed. These settings are documented in the trimming options docs. In .NET 6 and .NET 7 a lot of libraries weren’t compatible with trimming yet, notably ASP.NET Core libraries. If you tried to publish a simple ASP.NET Core app in 7.0 you would get a bunch of trimmer warnings, because most of ASP.NET Core didn’t support trimming yet.

The following is an ASP.NET Core app to show trimming in net7.0 vs. net8.0. All the numbers are for a Windows publish.

```xml
<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFrameworks>net7.0;net8.0</TargetFrameworks>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>
</Project>
```

```csharp
// dotnet publish --self-contained --runtime win-x64 --framework net7.0 -p:PublishTrimmed=true -p:PublishSingleFile=true --configuration Release
var app = WebApplication.Create();
app.Run((c) => c.Response.WriteAsync("hello world"));
app.Run();
```

| TFM | Trimmed | Warnings | App Size | Publish duration |
|---|---|---|---|---|
| net7.0 | false | 0 | 88.4MB | 3.9 sec |
| net8.0 | false | 0 | 90.9MB | 3.9 sec |
| net7.0 | true | 16 | 28.9MB | 16.4 sec |
| net8.0 | true | 0 | 17.3MB | 10.8 sec |

In addition to no more warnings when publishing trimmed in net8.0, the app size is smaller because we’ve annotated more libraries, so the trimmer can find more code that isn’t being used by the app. Part of annotating the libraries involved analyzing what code is being kept by the trimmer and changing code to improve what can be trimmed.
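To give a feel for what "annotating" means in practice, here is a minimal sketch using the two BCL attributes libraries lean on most; the helper methods themselves are hypothetical:

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

public static class ReflectionHelpers
{
    // DynamicallyAccessedMembers tells the trimmer to keep the public
    // constructors of whatever Type flows into this parameter, so the
    // reflection below keeps working after trimming.
    public static object CreateInstance(
        [DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicConstructors)]
        Type type)
        => Activator.CreateInstance(type)!;

    // RequiresUnreferencedCode marks an API the trimmer cannot analyze;
    // callers get a warning instead of silently broken behavior.
    [RequiresUnreferencedCode("Loads a type by name; the trimmer cannot see which one.")]
    public static Type? LoadByName(string typeName) => Type.GetType(typeName);
}
```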
You can see numerous PRs to help this effort: dotnet/aspnetcore#47567, dotnet/aspnetcore#47454, dotnet/aspnetcore#46082, dotnet/aspnetcore#46015, dotnet/aspnetcore#45906, dotnet/aspnetcore#46020, and many more. The Publish duration field was calculated using Measure-Command in PowerShell (deleting /bin/ and /obj/ between every run). As you can see, enabling trimming can increase the publish time, because the trimmer has to analyze the whole program to see what it can remove, which isn’t a free operation.

We also introduced two smaller versions of WebApplication if you want even smaller apps, via CreateSlimBuilder and CreateEmptyBuilder. Changing the previous app to use CreateSlimBuilder:

```csharp
// dotnet publish --self-contained --runtime win-x64 --framework net8.0 -p:PublishTrimmed=true -p:PublishSingleFile=true --configuration Release
var builder = WebApplication.CreateSlimBuilder(args);
var app = builder.Build();
app.Run((c) => c.Response.WriteAsync("hello world"));
app.Run();
```

will result in an app size of 15.5MB. And then going one step further with CreateEmptyBuilder:

```csharp
// dotnet publish --self-contained --runtime win-x64 --framework net8.0 -p:PublishTrimmed=true -p:PublishSingleFile=true --configuration Release
var builder = WebApplication.CreateEmptyBuilder(new WebApplicationOptions() { Args = args });
var app = builder.Build();
app.Run((c) => c.Response.WriteAsync("hello world"));
app.Run();
```

will result in an app size of 13.7MB, although in this case the app won’t work because there is no server implementation registered. If we add Kestrel via builder.WebHost.UseKestrelCore(); the app size becomes 15MB.

| TFM | Builder | App Size |
|---|---|---|
| net8.0 | Create | 17.3MB |
| net8.0 | Slim | 15.5MB |
| net8.0 | Empty | 13.7MB |
| net8.0 | Empty+Server | 15.0MB |

Note that both these APIs are available starting in 8.0 and remove a lot of defaults, so they’re more pay-for-play.

Now that we’ve taken a small look at trimming and seen that 8.0 has more trim-compatible libraries, let’s take a look at Native AOT. Just like with trimming, if your app/library isn’t compatible with Native AOT you’ll get warnings when building for Native AOT, and there are additional limitations on what works in Native AOT. Using the same app as before, we’ll enable Native AOT by adding <PublishAot>true</PublishAot> to our csproj.

| TFM | AOT | App Size | Publish duration |
|---|---|---|---|
| net7.0 | false | 88.4MB | 3.9 sec |
| net8.0 | false | 90.9MB | 3.9 sec |
| net7.0 | true | 40MB | 71.7 sec |
| net8.0 | true | 12.6MB | 22.7 sec |

And just like with trimming, we can test the WebApplication APIs that have fewer defaults enabled.

| TFM | Builder | App Size |
|---|---|---|
| net8.0 | Create | 12.6MB |
| net8.0 | Slim | 8.8MB |
| net8.0 | Empty | 5.7MB |
| net8.0 | Empty+Server | 7.8MB |

That’s pretty cool! A small net8.0 app is 90.9MB, and when published as Native AOT it’s 12.6MB, or as low as 7.8MB (assuming we want a server, which we probably do). Now let’s take a look at some other performance characteristics of a Native AOT app: startup speed, memory usage, and RPS. In order to properly show E2E benchmark numbers we need a multi-machine setup, so that the server and client processes don’t steal CPU from each other and there are no random processes running like there would be on a local machine. I’ll be using our internal benchmarking infrastructure that makes use of the benchmarking tool crank and our aspnet-citrine-win and aspnet-citrine-lin machines for server and load respectively. The specs of both machines are described in our benchmarks readme. And finally, I’ll be using an application that uses Minimal APIs to return a JSON payload.
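Here is a hedged sketch of what such an app can look like (the route and Message type are illustrative; the actual benchmark app lives in the aspnet/Benchmarks repository): a slim-builder Minimal API returning JSON through the source-generated serializer context, which keeps it trim- and AOT-friendly.

```csharp
// Assumes the Microsoft.NET.Sdk.Web SDK with implicit usings enabled.
using System.Text.Json.Serialization;

var builder = WebApplication.CreateSlimBuilder(args);

// Source-generated JSON avoids reflection, keeping the app trim/AOT safe.
builder.Services.ConfigureHttpJsonOptions(options =>
{
    options.SerializerOptions.TypeInfoResolverChain.Insert(0, AppJsonContext.Default);
});

var app = builder.Build();

app.MapGet("/json", () => new Message("Hello, World!"));

app.Run();

public record Message(string message);

[JsonSerializable(typeof(Message))]
internal partial class AppJsonContext : JsonSerializerContext { }
```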
This app uses the Slim builder we showed earlier and sets <InvariantGlobalization>true</InvariantGlobalization> in the csproj. If we run the app without any extra settings:

```
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/goldilocks.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/steadystate.profile.yml --scenario basicminimalapivanilla --profile intel-win-app --profile intel-lin-load --application.framework net8.0 --application.options.collectCounters true
```

we get a ~293ms startup time, 444MB working set, and ~762k RPS. If we run the same app but publish it as Native AOT:

```
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/goldilocks.benchmarks.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/steadystate.profile.yml --scenario basicminimalapipublishaot --profile intel-win-app --profile intel-lin-load --application.framework net8.0 --application.options.collectCounters true
```

we get a ~67ms startup time, 56MB working set, and ~681k RPS. That’s ~77% faster startup, ~87% lower working set, and ~12% lower RPS. The startup speed is expected, because the app has already been optimized and there is no JIT running to start optimizing code. Also, in non-Native AOT apps, because startup methods are likely only called once, tiered compilation never recompiles them, so they won’t be as optimized as they could be; in Native AOT the startup path is fully optimized ahead of time.

The working set is a bit surprising. It is lower because Native AOT apps by default run with the new Dynamic Adaptation To Application Sizes (DATAS) GC. This GC setting tries to maintain a balance between throughput and overall memory usage, which we can see it doing with an ~87% lower working set at the cost of some RPS. You can read more about the new GC setting in Maoni0’s blog.

Let’s also compare the Native AOT vs. non-Native AOT apps with the Server GC. We’ll add --application.environmentVariables DOTNET_GCDynamicAdaptationMode=0 when running the Native AOT app. This time we get ~64ms startup time, 403MB working set, and ~730k RPS. The startup time is still extremely fast, because changing the GC doesn’t affect that. The working set is closer to the non-Native AOT app but smaller, due in part to not having the JIT compiler loaded and running. And the RPS is closer to the non-Native AOT app, because we’re using the Server GC, which optimizes throughput more than memory usage.

| AOT | GC | Startup | Working Set | RPS |
|---|---|---|---|---|
| false | Server | 293ms | 444MB | 762k |
| false | DATAS | 303ms | 77MB | 739k |
| true | Server | 64ms | 403MB | 730k |
| true | DATAS | 67ms | 56MB | 681k |

Non-Native AOT apps have the JIT optimizing code while it’s running, and starting in .NET 8 the JIT by default makes use of dynamic PGO. This is a really cool feature that Native AOT can’t benefit from, and it’s one reason non-Native AOT apps can have more throughput than Native AOT apps. You can read more about dynamic PGO in the .NET 8 performance blog. If you’re willing to trade some publish size for potentially more optimized code, you can pass /p:OptimizationPreference=Speed when building and publishing your Native AOT app. When we do this for our benchmark app (with Server GC), we get a publish size of 9.5MB instead of 8.9MB, and 745k RPS instead of 730k.
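Pulling the publish-time settings from this section together, a minimal project-file sketch; the property names are real MSBuild properties, while the combination shown is illustrative:

```xml
<!-- Illustrative sketch of the publish knobs discussed above. -->
<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <!-- Publish as Native AOT (implies trimming) -->
    <PublishAot>true</PublishAot>
    <!-- Drop ICU data for a smaller app, as the benchmark app does -->
    <InvariantGlobalization>true</InvariantGlobalization>
    <!-- Trade some binary size for more optimized codegen -->
    <OptimizationPreference>Speed</OptimizationPreference>
  </PropertyGroup>
</Project>
```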
The app we’ve been using makes use of Minimal APIs, which by default aren’t trim friendly. They do a lot of reflection and dynamic code generation that isn’t statically analyzable, so the trimmer isn’t able to safely trim the app. So why don’t we see warnings when we publish this app as Native AOT? Because we wrote a source generator called the Request Delegate Generator (RDG) that replaces your MapGet, MapPost, etc. methods with trim-friendly code. This source generator is automatically used for ASP.NET Core apps when trimming/AOT publishing. Which leads us into the next section, where we dive into RDG.

Request Delegate Generator

The Request Delegate Generator (RDG) is a source generator created to make Minimal APIs trimmer and Native AOT friendly. Without RDG, using Minimal APIs will result in many warnings, and your app likely won’t work as expected. Here is a quick example of an endpoint that results in an exception when using Native AOT without RDG, but works with RDG enabled (or when not using Native AOT):

```csharp
app.MapGet("/test", (Bindable b) => "Hello world!");

public class Bindable
{
    public static ValueTask<Bindable?> BindAsync(HttpContext context, ParameterInfo parameter)
    {
        return new ValueTask<Bindable?>(new Bindable());
    }
}
```

This app throws when you send a GET request to /test, because the Bindable.BindAsync method is referenced via reflection; the trimmer can’t statically figure out that the method is being used and removes it. Minimal APIs then sees the MapGet call as needing a request body, which isn’t allowed by default for GET calls.

Besides fixing warnings and making the app work as expected in Native AOT, we get improved first-response time and reduced publish size. Without RDG, the first time a request is made to the app is when all the expression trees are generated for all endpoints in the application. Because RDG generates the source for an endpoint at compile time, there is no expression tree generation needed; the code for a specific endpoint is already available and can execute immediately.

If we take the app used earlier for benchmarking AOT and look at time to first request, we get ~187ms when not running as AOT and without RDG. We then get ~130ms when we enable RDG. When publishing as AOT, the time to first request is ~60ms regardless of whether RDG is used. But this app only has 2 endpoints, so let’s add 1000 more endpoints and see the difference!

2 Routes

| AOT | RDG | First Request | Publish Size |
|---|---|---|---|
| false | false | 187ms | 97MB |
| false | true | 130ms | 97MB |
| true | false | 60ms | 11.15MB |
| true | true | 60ms | 8.89MB |

1002 Routes

| AOT | RDG | First Request | Publish Size |
|---|---|---|---|
| false | false | 1082ms | 97MB |
| false | true | 176ms | 97MB |
| true | false | 157ms | 11.15MB |
| true | true | 84ms | 8.89MB |

Runtime APIs

In this section we’ll look at changes that mainly involve updating to new APIs introduced in the .NET 8 Base Class Library (BCL).

SearchValues

dotnet/aspnetcore#45300 by @gfoidl, dotnet/aspnetcore#47459, dotnet/aspnetcore#49114, and dotnet/aspnetcore#49117 all make use of the new SearchValues type, which lets these code paths take advantage of optimized search implementations for the specific values being searched for. The SearchValues section of the .NET 8 performance blog explains more details about the different search algorithms used and why this type is so cool!

Spans

dotnet/aspnetcore#46098 makes use of the new MemoryExtensions.Split(ReadOnlySpan<char> source, Span<Range> destination, char separator) method. This allows certain cases of string.Split(...) to be replaced with a non-allocating version, as the sketch below shows.
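A minimal sketch of the pattern (the header string is made up for illustration):

```csharp
using System;

// string.Split allocates a string[] plus one string per item;
// MemoryExtensions.Split writes ranges into a caller-provided buffer.
ReadOnlySpan<char> header = "gzip, deflate, br";

Span<Range> ranges = stackalloc Range[4];
int count = header.Split(ranges, ',', StringSplitOptions.TrimEntries);

for (int i = 0; i < count; i++)
{
    ReadOnlySpan<char> token = header[ranges[i]]; // slices, no allocation
    Console.WriteLine(token.ToString()); // ToString here only for display
}
```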
This saves the string[] allocation as well as the individual string allocations for the items in the string[]. More details on this new API are in the span section of the .NET 8 performance post.

FrozenDictionary

Another new type introduced is FrozenDictionary. This allows constructing a dictionary optimized for read operations at the cost of slower construction. dotnet/aspnetcore#49714 switches a Dictionary in routing to use FrozenDictionary. This dictionary is used when routing an HTTP request to the appropriate endpoint, which happens for almost every request to an application. The following tables show the cost of creating a dictionary vs. a frozen dictionary, and then the cost of using a dictionary vs. a frozen dictionary, respectively. You can see that constructing a FrozenDictionary can be up to 13x slower, but the overall time is still in the microsecond range (1/1000th of a millisecond) and the FrozenDictionary is only constructed once for the app. What we all like to see is that the per-operation performance of FrozenDictionary is 2.5x-3.5x faster than Dictionary!

```csharp
using System;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;

[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
public class JumpTableMultipleEntryBenchmark
{
    private string[] _strings;
    private int[] _segments;
    private JumpTable _dictionary;
    private JumpTable _frozenDictionary;
    private List<(string text, int _)> _entries;

    [Params(1000)]
    public int NumRoutes;

    [GlobalSetup]
    public void Setup()
    {
        _strings = GetStrings(1000);
        _segments = new int[1000];

        for (var i = 0; i < _strings.Length; i++)
        {
            _segments[i] = _strings[i].Length;
        }

        var samples = new int[NumRoutes];
        for (var i = 0; i < samples.Length; i++)
        {
            samples[i] = i * (_strings.Length / NumRoutes);
        }

        _entries = new List<(string text, int _)>();
        for (var i = 0; i < samples.Length; i++)
        {
            _entries.Add((_strings[samples[i]], i));
        }

        _dictionary = new DictionaryJumpTable(0, -1, _entries.ToArray());
        _frozenDictionary = new FrozenDictionaryJumpTable(0, -1, _entries.ToArray());
    }

    [BenchmarkCategory("GetDestination"), Benchmark(Baseline = true, OperationsPerInvoke = 1000)]
    public int Dictionary()
    {
        var strings = _strings;
        var segments = _segments;

        var destination = 0;
        for (var i = 0; i < strings.Length; i++)
        {
            destination = _dictionary.GetDestination(strings[i], segments[i]);
        }

        return destination;
    }

    [BenchmarkCategory("GetDestination"), Benchmark(OperationsPerInvoke = 1000)]
    public int FrozenDictionary()
    {
        var strings = _strings;
        var segments = _segments;

        var destination = 0;
        for (var i = 0; i < strings.Length; i++)
        {
            destination = _frozenDictionary.GetDestination(strings[i], segments[i]);
        }

        return destination;
    }

    [BenchmarkCategory("Create"), Benchmark(Baseline = true)]
    public JumpTable CreateDictionaryJumpTable() => new DictionaryJumpTable(0, -1, _entries.ToArray());

    [BenchmarkCategory("Create"), Benchmark]
    public JumpTable CreateFrozenDictionaryJumpTable() => new FrozenDictionaryJumpTable(0, -1, _entries.ToArray());

    private static string[] GetStrings(int count)
    {
        var strings = new string[count];
        for (var i = 0; i < count; i++)
        {
            var guid = Guid.NewGuid().ToString();
            // Between 5 and 36 characters
            var text = guid.Substring(0, Math.Max(5, Math.Min(i, 36)));
            if (char.IsDigit(text[0]))
            {
                // Convert first character to a letter.
                text = ((char)(text[0] + ('G' - '0'))) + text.Substring(1);
            }

            if (i % 2 == 0)
            {
                // Lowercase half of them
                text = text.ToLowerInvariant();
            }

            strings[i] = text;
        }

        return strings;
    }
}

public abstract class JumpTable
{
    public abstract int GetDestination(string path, int segmentLength);
}

internal sealed class DictionaryJumpTable : JumpTable
{
    private readonly int _defaultDestination;
    private readonly int _exitDestination;
    private readonly Dictionary<string, int> _dictionary;

    public DictionaryJumpTable(
        int defaultDestination,
        int exitDestination,
        (string text, int destination)[] entries)
    {
        _defaultDestination = defaultDestination;
        _exitDestination = exitDestination;
        _dictionary = entries.ToDictionary(e => e.text, e => e.destination, StringComparer.OrdinalIgnoreCase);
    }

    public override int GetDestination(string path, int segmentLength)
    {
        if (segmentLength == 0)
        {
            return _exitDestination;
        }

        var text = path.Substring(0, segmentLength);
        if (_dictionary.TryGetValue(text, out var destination))
        {
            return destination;
        }

        return _defaultDestination;
    }
}

internal sealed class FrozenDictionaryJumpTable : JumpTable
{
    private readonly int _defaultDestination;
    private readonly int _exitDestination;
    private readonly FrozenDictionary<string, int> _dictionary;

    public FrozenDictionaryJumpTable(
        int defaultDestination,
        int exitDestination,
        (string text, int destination)[] entries)
    {
        _defaultDestination = defaultDestination;
        _exitDestination = exitDestination;
        _dictionary = entries.ToFrozenDictionary(e => e.text, e => e.destination, StringComparer.OrdinalIgnoreCase);
    }

    public override int GetDestination(string path, int segmentLength)
    {
        if (segmentLength == 0)
        {
            return _exitDestination;
        }

        var text = path.Substring(0, segmentLength);
        if (_dictionary.TryGetValue(text, out var destination))
        {
            return destination;
        }

        return _defaultDestination;
    }
}
```

| Method | NumRoutes | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|---|
| CreateDictionaryJumpTable | 25 | 735.797 ns | 8.5503 ns | 7.5797 ns | 1.00 | 0.00 |
| CreateFrozenDictionaryJumpTable | 25 | 4,677.927 ns | 80.4279 ns | 71.2972 ns | 6.36 | 0.11 |
| CreateDictionaryJumpTable | 50 | 1,433.309 ns | 19.4435 ns | 17.2362 ns | 1.00 | 0.00 |
| CreateFrozenDictionaryJumpTable | 50 | 10,065.905 ns | 188.7031 ns | 176.5130 ns | 7.03 | 0.12 |
| CreateDictionaryJumpTable | 100 | 2,712.224 ns | 46.0878 ns | 53.0747 ns | 1.00 | 0.00 |
| CreateFrozenDictionaryJumpTable | 100 | 28,397.809 ns | 358.2159 ns | 335.0754 ns | 10.46 | 0.20 |
| CreateDictionaryJumpTable | 1000 | 28,279.153 ns | 424.3761 ns | 354.3733 ns | 1.00 | 0.00 |
| CreateFrozenDictionaryJumpTable | 1000 | 313,515.684 ns | 6,148.5162 ns | 8,208.0925 ns | 11.26 | 0.33 |
| Dictionary | 25 | 21.428 ns | 0.1816 ns | 0.1516 ns | 1.00 | 0.00 |
| FrozenDictionary | 25 | 7.137 ns | 0.0588 ns | 0.0521 ns | 0.33 | 0.00 |
| Dictionary | 50 | 21.630 ns | 0.1978 ns | 0.1851 ns | 1.00 | 0.00 |
| FrozenDictionary | 50 | 7.476 ns | 0.0874 ns | 0.0818 ns | 0.35 | 0.00 |
| Dictionary | 100 | 23.508 ns | 0.3498 ns | 0.3272 ns | 1.00 | 0.00 |
| FrozenDictionary | 100 | 7.123 ns | 0.0840 ns | 0.0745 ns | 0.30 | 0.00 |
| Dictionary | 1000 | 23.761 ns | 0.2360 ns | 0.2207 ns | 1.00 | 0.00 |
| FrozenDictionary | 1000 | 8.516 ns | 0.1508 ns | 0.1337 ns | 0.36 | 0.01 |

Other

This section is a compilation of changes that enhance performance but do not fall under any of the preceding categories.

Regex

As part of the AOT effort, we noticed the regex created in RegexRouteConstraint (see route constraints for more info) was adding ~1MB to the published app size. This is because the route constraints are dynamic (application code defines them) and we were using the Regex constructor that accepts RegexOptions.
This meant the trimmer had to keep all regex code that could potentially be used, including the NonBacktracking engine, which accounts for ~0.8MB of code. By adding RegexOptions.Compiled, the trimmer can see that the NonBacktracking code will not be used, reducing the application size by ~0.8MB. Additionally, a compiled regex is faster than an interpreted one.

The quick fix was to add RegexOptions.Compiled when creating the Regex, which was done in dotnet/aspnetcore#46192 by @eugeneogongo. The problem is that this slows down app startup, because we resolve constraints when starting the app and compiled regexes are slower to construct. dotnet/aspnetcore#46323 fixes this by lazily initializing the regexes, so app startup is actually faster than 7.0, when we weren’t using compiled regexes. It also added caching to the route constraints, which means that if you share constraints across multiple routes you save allocations by sharing constraint instances.

Running a microbenchmark for the route builder to measure startup performance shows an almost 450% improvement when using 1000 routes, due to no longer eagerly initializing the regexes. The benchmark lives in the dotnet/aspnetcore repo; it has a lot of setup code and would be a bit too long to put in this post.

Before, with interpreted regexes:

| Method | Mean | Op/s | Gen 0 | Gen 1 | Allocated |
|---|---|---|---|---|---|
| Build | 6.739 ms | 148.4 | 15.6250 | – | 7 MB |

After, with compiled and lazy regexes:

| Method | Mean | Op/s | Gen 0 | Gen 1 | Allocated |
|---|---|---|---|---|---|
| Build | 1.521 ms | 657.2 | 5.8594 | 1.9531 | 2 MB |

Another Regex improvement came from dotnet/aspnetcore#44770, which switched a Regex usage in routing to the Regex source generator. This moves the cost of compiling the Regex to build time, and also results in faster Regex code due to optimizations the source generator takes advantage of that the in-process Regex compiler does not. Here is a simplified example comparing a source-generated regex to a compiled regex:

```csharp
using System;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;

public partial class AlphaRegex
{
    static Regex Net7Constraint = new Regex(
        @"^[a-z]*$",
        RegexOptions.CultureInvariant | RegexOptions.Compiled | RegexOptions.IgnoreCase,
        TimeSpan.FromSeconds(10));

    static Regex Net8Constraint = GetAlphaRouteRegex();

    [GeneratedRegex(@"^[A-Za-z]*$")]
    private static partial Regex GetAlphaRouteRegex();

    [Benchmark(Baseline = true)]
    public bool CompiledRegex()
    {
        return Net7Constraint.IsMatch("Administration") && Net7Constraint.IsMatch("US");
    }

    [Benchmark]
    public bool SourceGenRegex()
    {
        return Net8Constraint.IsMatch("Administration") && Net8Constraint.IsMatch("US");
    }
}
```

| Method | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|
| CompiledRegex | 86.92 ns | 0.572 ns | 0.447 ns | 1.00 |
| SourceGenRegex | 57.81 ns | 0.860 ns | 0.805 ns | 0.66 |

Analyzers

Analyzers are useful for pointing out issues in code that can be hard to convey in API signatures, suggesting code patterns that are more readable, and suggesting more performant ways to write code. dotnet/aspnetcore#44799 and dotnet/aspnetcore#44791, both from @martincostello, enabled CA1854, which helps avoid two dictionary lookups when only one is needed (a sketch of the pattern follows), and dotnet/aspnetcore#44269 enables a bunch of analyzers, many of which help use more performant APIs and are described in more detail in last year’s .NET 7 performance post. I would encourage developers who are interested in performance in their own products to check out performance-focused analyzers, which lists many analyzers that will help avoid easy-to-fix performance issues.
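Here is a minimal sketch of what CA1854 flags and the fix it suggests:

```csharp
using System;
using System.Collections.Generic;

var headers = new Dictionary<string, string> { ["Accept"] = "application/json" };

// Flagged by CA1854: ContainsKey followed by the indexer is two lookups.
if (headers.ContainsKey("Accept"))
{
    Console.WriteLine(headers["Accept"]);
}

// Preferred: TryGetValue performs a single lookup.
if (headers.TryGetValue("Accept", out var accept))
{
    Console.WriteLine(accept);
}
```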
StringBuilder

StringBuilder is an extremely useful class for constructing a string when you either can’t precompute the size of the string or want an easy way to construct a string without the complications involved in using string.Create(...). StringBuilder comes with a lot of helpful methods, as well as a custom implementation of an InterpolatedStringHandler. This means you can “create” strings to add to the StringBuilder without actually allocating them. For example, previously you might have written stringBuilder.Append(FormattableString.Invariant($"{key} = {value}"));. This would allocate a string via FormattableString.Invariant(...) and then copy it into the StringBuilder’s internal char[] buffer, making the string a temporary allocation. Instead you can write stringBuilder.Append(CultureInfo.InvariantCulture, $"{key} = {value}");. This also looks like it would allocate a string via $"{key} = {value}", but because StringBuilder has a custom InterpolatedStringHandler the string isn’t actually allocated and is instead written directly to the internal char[].

dotnet/aspnetcore#44691 fixes some usage patterns with StringBuilder to avoid allocations and makes use of the InterpolatedStringHandler overload(s). One specific example takes a byte[] and converts it into a string in hexadecimal format so we can send it as a query string.

```csharp
using System.Globalization;
using System.Security.Cryptography;
using System.Text;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class AppendBenchmark
{
    private byte[] _b = new byte[30];

    [GlobalSetup]
    public void Setup()
    {
        RandomNumberGenerator.Fill(_b);
    }

    [Benchmark]
    public string AppendToString()
    {
        var sb = new StringBuilder();
        foreach (var b in _b)
        {
            sb.Append(b.ToString("x2", CultureInfo.InvariantCulture));
        }
        return sb.ToString();
    }

    [Benchmark]
    public string AppendInterpolated()
    {
        var sb = new StringBuilder();
        foreach (var b in _b)
        {
            sb.Append(CultureInfo.InvariantCulture, $"{b:x2}");
        }
        return sb.ToString();
    }
}
```

| Method | Mean | Gen0 | Allocated |
|---|---|---|---|
| AppendToString | 748.7 ns | 0.1841 | 1448 B |
| AppendInterpolated | 739.7 ns | 0.0620 | 488 B |

Summary

Thanks for reading! Try out .NET 8 and let us know how your app’s performance has changed! We are always looking for feedback on how to improve the product and look forward to your contributions, be it an issue report or a PR. If you want more performance goodness, you can read the Performance Improvements in .NET 8 post. Also, take a look at Developer Stories, which showcases multiple teams at Microsoft migrating from .NET Framework to .NET or to newer versions of .NET and seeing major performance and operating-cost wins.


IIS Tweak Performance Settings
Category: Servers

Go to Application Pool and set Maximum Worker Process = 0. When deploying your applic ...


Developing Optimized GitHub Actions with .NET and Native AOT

Developing GitHub Actions with .NET and Native AOT has become increasingly popular in recent years due to its numerous benefits. Here are some compelling reasons why you should consider using .NET for your next GitHub Action:

1. Cross-platform compatibility: .NET is a cross-platform framework that can run on Windows, Linux, and macOS. This means that your GitHub Actions can be executed seamlessly across different operating systems, making it easier to deploy and manage your codebase.
2. Performance optimization: .NET uses just-in-time (JIT) compilation, which allows for faster execution of your code. Additionally, the use of Native AOT (ahead-of-time) compilation can further optimize your code's performance, making it run even faster.
3. Scalability: .NET is designed to be highly scalable and can handle large amounts of data and traffic. This makes it an ideal choice for GitHub Actions that need to process large volumes of data or execute complex workflows.
4. Integration with other Microsoft technologies: .NET integrates seamlessly with other Microsoft technologies such as Azure DevOps, Visual Studio, and PowerShell. This means that you can leverage these tools to enhance your GitHub Actions development experience.
5. Community support: The .NET community is large and active, with many resources available for developers. This makes it easier to find help and support when needed, ensuring that your GitHub Actions are developed efficiently and effectively.

Overall, developing GitHub Actions with .NET and Native AOT offers numerous benefits, including cross-platform compatibility, performance optimization, scalability, integration with other Microsoft technologies, and community support. By leveraging these benefits, you can create efficient and effective GitHub Actions that meet your specific needs.


Best way to create a Form in MVC (aspnet 6)
Category: .Net 7

Question: How do you create a form that persists data on ModelState validation ...


The .NET Stacks #62: And we're back

This is the web version of my weekly newsletter, The .NET Stacks, originally sent to email subscribers on September 13, 2021. Subscribe at the bottom of the post to get this right away!

Happy Monday! Miss me? A few of you said you have, but I'm 60% sure that's sarcasm.

As you know, I took the last month or so off from the newsletter to focus on other things. I know I wasn't exactly specific on why, and appreciate some of you reaching out. I wasn't comfortable sharing it at the time, but I needed to take time away to focus on determining the next step in my career. If you've interviewed lately, I'm sure you understand ... it really is a full-time job. I'm happy to say I've accepted a remote tech lead role for a SaaS company here. I'm rested and ready, so let's get into it! I'm trying something a little different this week—feel free to let me know what you think.

My favorite from last week

ASP.NET 6.0 Minimal APIs, why should you care? (Ben Foster)

We've talked about Minimal APIs a lot in this newsletter and it's quite the hot topic in the .NET community. An alternative way to write APIs in .NET 6 and beyond, there's a lot of folks wondering if it's suitable for production, or can lead to misuse. Ben notes: "Minimal simply means that it contains the minimum set of components needed to build HTTP APIs ... It doesn’t mean that the application you build will be simple or not require good design."

"I find that one of the biggest advantages to Minimal APIs is that they make it easier to build APIs in an opinionated way. After many years building HTTP services, I have a preferred approach. With MVC I would replace the built-in validation with Fluent Validation and my controllers were little more than a dispatch call to Mediatr. With Minimal APIs I get even more control. Of course if MVC offers everything you need, then use that."

In a similar vein, Nick Chapsas has a great walkthrough on strategies for building production-ready Minimal APIs. No one expects your API to be in one file, and he shows practical ways to deal with dependencies while leveraging minimal API patterns. Damian Edwards has a nice Twitter thread, as well. As great as these community discussions are, I really think the greatest benefit is getting lost: the performance gains.

Community and events

Increasing developer happiness with GitHub code scanning (Sam Partington)

If you work in GitHub, you probably already know that GitHub utilizes code scanning to find security vulnerabilities and errors in your repository. Sam Partington writes about something you might not know: they use CodeQL—their internal code analysis engine—to protect themselves from common coding mistakes. Here's what Sam says about loopy performance issues: "In addition to protecting against missing error checking, we also want to keep our database-querying code performant. N+1 queries are a common performance issue. This is where some expensive operation is performed once for every member of a set, so the code will get slower as the number of items increases. Database calls in a loop are often the culprit here; typically, you’ll get better performance from a batch query outside of the loop instead."

"We created a custom CodeQL query ... We filter that list of calls down to those that happen within a loop and fail CI if any are encountered.
What’s nice about CodeQL is that we’re not limited to database calls directly within the body of a loop; calls within functions called directly or indirectly from the loop are caught too."

You can check out the post for more details and learn how to use these queries or make your own.

More from last week:
- Simon Bisson writes about how to use the VS Code editor in your own projects.
- The Netflix Tech Blog starts a series on practical API design and also starts writing about their decision-making process.
- The .NET Docs Show talks about micro frontends with Blazor.
- For community standups, Entity Framework talks about OSS projects, ASP.NET has an anniversary, .NET MAUI discusses accessibility, and Machine Learning holds office hours.

Web development

How To Map A Route in an ASP.NET Core MVC application (Khalid Abuhakmeh)

If you're new to ASP.NET Core web development, Khalid put together a nice post on how to add an existing endpoint to an existing ASP.NET Core MVC app. Even if you aren't a beginner, you might learn how to resolve sticky routing issues. At the bottom of the post, he has a nice checklist you should consider when adding a new endpoint.

More from last week:
- Ben Foster explores custom model binding with Minimal APIs in .NET 6.
- Thomas Ardal debugs System.FormatException when launching ASP.NET Core.
- Jeremy Morgan builds a small web API with Azure Functions and SQLite.
- Ed Charbeneau works with low-code data grids and Blazor.
- Scott Hanselman works with a Minimal API todo app.

The .NET platform

Using Source Generators with Blazor components in .NET 6 (Andrew Lock)

When Andrew was upgrading a Blazor app to .NET 6, he found that source generators that worked in .NET 5 failed to discover Blazor components in his .NET 6 app because of changes to the Razor compilation process. He writes: "The problem is that my source generators were relying on the output of the Razor compiler in .NET 5 ... My source generator was looking for components in the compilation that are decorated with [RouteAttribute]. With .NET 6, the Razor tooling is a source generator, so there is no 'first step'; the Razor tooling executes at the same time as my source generator. That is great for performance, but it means the files my source generator was relying on (the generated component classes) don't exist when my generator runs."

While this is by design, Andrew has a great post on the underlying issue and potential workarounds.

More from last week:
- Mark Downie writes about his favorite improvements in .NET 6.
- Sergey Vasiliev writes about optimizing .NET apps.
- Pawel Szydziak writes cleaner, safer code with SonarQube, Docker, and .NET Core.
- Sam Basu writes about how to develop for desktop in 2022, and also about developing for .NET MAUI on macOS.
- Paul Michaels manually parses a JSON string using System.Text.Json.
- Johnson Towoju writes logs to SQL Server using NLog.
- Andrew Lock uses source generators with Blazor components in .NET 6.
- Rick Strahl launches Visual Studio Code cleanly from a .NET app.
- Jirí Cincura calls a C# static constructor multiple times.

The cloud

Minimal Api in .NET 6 Out Of Process Azure Functions (Adam Storr)

With all this talk about Minimal APIs, Adam asks: can I use it with the new out-of-process Azure Functions model in .NET 6? He says: "Azure Functions with HttpTriggers are similar to ASP.NET Core controller actions in that they handle http requests, have routing, can handle model binding, dependency injection etc.
so how could a 'Minimal API' using Azure Functions look?"

More from last week:
- Damien Bowden uses Azure security groups in ASP.NET Core with an Azure B2C identity provider.
- Jon Gallant works with the ChainedTokenCredential in the Azure Identity library.
- Adam Storr uses .NET 6 Minimal APIs with out-of-process Azure Functions.

Tools

New Improved Attach to Process Dialog Experience (Harshada Hole)

With the 2022 update, Visual Studio is improving the debugging experience—included is a new Attach to Process dialog experience. Harshada says: "We have added command-line details, app pool details, parent/child process tree view, and the select running window from the desktop option in the attach to process dialog. These make it convenient to find the right process you need to attach. Also, the Attach to Process dialog is now asynchronous, making it interactive even when the process list is updating." The post walks through these updates in detail.

More from last week:
- Jeremy Likness looks at the EF Core Azure Cosmos DB provider.
- Harshada Hole writes about the new Attach to Process dialog experience in Visual Studio.
- Ben De St Paer-Gotch goes behind the scenes on Docker Desktop.
- Esteban Solano Granados plays with .NET 6, C# 10, and Docker.

Design, testing, and best practices

Ship / Show / Ask: A modern branching strategy (Rouan Wilsenach)

Rouan says: "Ship/Show/Ask is a branching strategy that combines the features of Pull Requests with the ability to keep shipping changes. Changes are categorized as either Ship (merge into mainline without review), Show (open a pull request for review, but merge into mainline immediately), or Ask (open a pull request for discussion before merging)."

More from last week:
- Liana Martirosyan writes about enabling team learning and boosting performance.
- Sagar Nangare writes about measuring user experience in modern applications and infrastructure.
- Neal Ford and Mark Richards talk about the hard parts of software architecture.
- Derek Comartin discusses event-sourced aggregate design.
- Steve Smith refactors to value objects.
- Sam Milbrath writes about holding teams accountable without micromanaging.
- Helen Scott asks how you can stay ahead of the curve as a developer.
- Rouan Wilsenach writes about a ship / show / ask branching strategy.
- Jeremy Miller writes about integration testing using the IHost Lifecycle with xUnit.Net.

Podcasts and Videos
- Serverless Chats discusses serverless for beginners.
- The .NET Core Show talks about DotPurple with Michael Babienco.
- The Changelog talks to a lawyer about GitHub Copilot.
- Technology and Friends talks to Sam Basu about .NET MAUI.
- Visual Studio Toolbox talks about Web Live Preview.
- The ASP.NET Monsters talk about new Git commands.
- Adventures in .NET talks about Jupyter notebooks.
- The On .NET Show migrates apps to modern authentication and processes payments with C# and Stripe.


The .NET Stacks #33: A blazing conversation with Steve Sanderson

Happy Monday, all. What did you get NuGet for its 10th birthday?

This week:
- Microsoft blogs about more .NET 5 improvements
- A study on migrating a hectic service to .NET Core
- Meet Jab, a new compile-time DI library
- Dev Discussions: Steve Sanderson
- Last week in the .NET world

Microsoft blogs about more .NET 5 improvements

This week, Microsoft pushed a few more blog posts to promote .NET 5 improvements: Sourabh Shirhatti wrote about diagnostic improvements, and Mána Píchová writes about .NET networking improvements.

Diagnostic improvements

With .NET 5, the diagnostic suite of tools does not require installing them as .NET global tools—they can now be installed without the .NET SDK. There’s now a single-file distribution mechanism that only requires a runtime of .NET Core 3.1 or higher. You can check out the GitHub repo to geek out on all the available diagnostics tools. In other news, you can now perform startup tracing from EventPipe, as the tooling can now suspend the runtime during startup until a tool is connected. Check out the blog post for the full treatment.

Networking improvements

In terms of .NET 5 networking improvements, the team added the ability to use cancellation timeouts from HttpClient without the need for a custom CancellationToken. While the client still throws a TaskCanceledException, the inner exception is a TimeoutException when timeouts occur. .NET 5 also supports multiple connections with HTTP/2, a configurable ping mechanism, experimental support for HTTP/3, and various telemetry improvements. Check out the networking blog post for details. It’s a nice complement to Stephen Toub’s opus about .NET 5 performance improvements.

A study on migrating a hectic service to .NET Core

This week, Avanindra Paruchuri wrote about migrating the Azure Active Directory gateway—and its 115 billion daily requests—over to .NET Core. While there’s nothing preventing you hosting .NET Framework apps in the cloud, the bloat of the framework often leads to expensive cloud spend.

"The gateway’s scale of execution results in significant consumption of compute resources, which in turn costs money. Finding ways to reduce the cost of executing the service has been a key goal for the team behind it. The buzz around .NET Core’s focus on performance caught our attention, especially since TechEmpower listed ASP.NET Core as one of the fastest web frameworks on the planet."

"In Azure AD gateway’s case, we were able to cut our CPU costs by 50%. As a result of the gains in throughput, we were able to reduce our fleet size from ~40k cores to ~20k cores (50% reduction) … Our CPU usage was reduced by half on .NET Core 3.1 compared to .NET Framework 4.6.2 (effectively doubling our throughput)."

It’s a nice piece on how they were able to gradually move over and gotchas they learned along the way.

Meet Jab, a new compile-time DI library

This week, Pavel Krymets introduced Jab, a library used for compile-time dependency injection. Pavel works with the Azure SDKs and used to work on the ASP.NET Core team. Remember a few weeks ago, when we said that innovation in C# source generators will be coming in 2021? Here we go. From the GitHub readme, it promises fast startup (200x more than Microsoft.Extensions.DependencyInjection), fast resolution (a 7x improvement), no runtime dependencies, with all code generated during project compilation. Will it run on ASP.NET Core?
Not likely, since ASP.NET Core is heavily dependent on the runtime thanks to type accessibility and dependency discovery, but Pavel wonders if there’s a middle ground.

Dev Discussions: Steve Sanderson

It seems like forever ago when, at NDC Oslo in 2017, Steve Sanderson showed off a new web UI framework with the caveat "an experiment, something for you to be amused by." By extending Dot Net Anywhere (DNA), Chris Bacon’s portable .NET runtime, on WebAssembly, he was able to load and run C# in the browser. In the browser!

Of course, this amusing experiment has grown into Blazor, a robust system for writing web UIs in C#. I was happy to talk to Steve Sanderson about his passions for the front-end web, how far Blazor has come, and what’s coming to Blazor in .NET 6.

Years ago, you probably envisioned what Blazor could be. Has it met its potential, or are there other areas to focus on?

We’re not there yet. If you go on YouTube and find the first demo I ever did of Blazor at NDC Oslo in 2017, you’ll see my original prototype had near-instant live reloading while coding, and the download size was really tiny. I still aspire to get the real version of Blazor to have those characteristics. Of course, the prototype had the advantage of only needing to do a tiny number of things—creating a production-capable version is 100x more work, which is why it hasn’t yet got there, but has of course exceeded the prototype vastly in more important ways. Good news though is that in .NET 6 we expect to ship an even better version of live-updating-while-coding than I had in that first prototype, so it’s getting there!

When looking at AOT, you’ll see increased performance but a larger download size. Do you see any other tradeoffs developers will need to consider?

The mixed-mode flavour of AOT, in which some of your code is interpreted and some is AOT, allows for a customizable tradeoff between size and speed, but also includes some subtleties like extra overhead when calling from AOT to interpreted code and vice-versa. Also, when you enable AOT, your app’s publish time may go up substantially (maybe by 5-10 minutes, depending on code size) because the whole Emscripten toolchain just takes that long. This wouldn’t affect your daily development flow on your own machine, but likely means your CI builds could take longer.

It’s still quite impressive to see the entire .NET runtime run in the browser for Blazor WebAssembly. That comes with an upfront cost, as we know. I know that the Blazor team has done a ton of work to help lighten the footprint and speed up performance. With the exception of AOT, do you envision more work on this? Do you see a point where it’ll be as lightweight as other leading front-end frameworks, or will folks need to understand it’s a cost that comes with a full framework in the browser?

The size of the .NET runtime isn’t ever going to reduce to near-zero, so JS-based microframeworks (whose size could be just a few KB) are always going to be smaller. We’re not trying to win outright based on size alone—that would be madness.
Blazor WebAssembly is aimed to be maximally productive for developers while being small enough to download that, in very realistic business app scenarios, the download size shouldn’t be any reason for concern. That said, it’s conceivable that new web platform features like Signed HTTP Exchanges could let us smartly pre-load the .NET WebAssembly runtime in a browser in the background (directly from some Microsoft CDN) while you’re visiting a Blazor WebAssembly site, so that it’s instantly available at zero download size when you go to other Blazor WebAssembly sites. Signed HTTP Exchanges allow for a modern equivalent to the older idea of a cross-site CDN cache. We don’t have a definite plan about that yet as not all browsers have added support for it.

Check out the entire interview at my site.

Last week in the .NET world

The Top 3
- Andrew Lock introduces the ASP.NET Core Data Protection system.
- Maarten Balliauw writes about building a friendly .NET SDK.
- Josef Ottosson writes an Azure Function to zip multiple files from Azure Storage.

Announcements
- Shelley Bransten announces Microsoft Cloud for Retail.
- Christopher Gill celebrates NuGet’s 10th birthday.
- Tara Overfield releases the January 2021 Security and Quality Rollup Updates for .NET Framework, and Rahul Bhandari writes about the .NET January 2021 updates.
- .NET 6 nightly builds for Apple M1 are now available.
- The Visual Studio team wants your feedback on Razor syntax coloring.

Community and events
- The .NET Docs Show talks to Luis Quintanilla about F#.
- Pavel Krymets introduces Jab, a compile-time DI container.
- The Entity Framework Standup talks about EF Core 6 survey results, and the Languages & Runtime standup discusses plans for .NET 6 and VB source generators.
- Sarah Novotny writes about 4 open source lessons for 2021.
- IdentityServer v5 has shipped.
- Khalid Abuhakmeh rethinks OSS attribution in .NET.
- TechBash 2021 is slated for October 19-22, 2021.

Web development
- Dave Brock builds a "search-as-you-type" box in Blazor.
- Cody Merritt Anhorn uses localization with Blazor.
- Changhui Xu uploads files with Angular and .NET Web API.
- Mark Pahulje uses HtmlAgilityPack to get all emails from an HTML page.
- Jon Hilton uses local storage with Blazor.
- Anthony Giretti tests gRPC endpoints with gRPCurl, and also explores gRPCui.
- The folks at Uno write about building a single-page app in XAML and C# with WebAssembly.
- Marinko Spasojevic handles query strings in Blazor WebAssembly.
- Daniel Krzyczkowski continues building out his ASP.NET Core Web API by integrating with Azure Cosmos DB.

The .NET platform
- Sean Killeen describes the many flavors of .NET.
- Mattias Karlsson writes about his boilerplate starting point for .NET console apps.
- David Ramel delivers a one-stop shop for .NET 5 improvements.
- Sam Walpole discusses writing decoupled code with MediatR.
- Sourabh Shirhatti writes about diagnostics improvements with .NET 5.
- Mána Píchová writes about .NET 5 networking improvements.
The cloud
- Avanindra Paruchuri writes about migrating the Azure AD gateway to .NET Core.
- Johnny Reilly works with Azure Easy Auth.
- Muhammed Saleem works with Azure Functions.
- Chris Noring uses Azure Key Vault to manage secrets.
- Bryan Soltis posts a file to an Azure Function in 3 minutes.
- Damian Brady generates a GitHub Actions workflow with Visual Studio or the dotnet CLI.
- Thomas Ardal builds and tests multiple .NET versions with GitHub Actions.
- Dominique St-Amand works with integration tests using the Azure Storage emulator and .NET Core in Azure DevOps.
- Aaron Powell uses environments for approval workflows with GitHub Actions.
- Damien Bowden protects legacy APIs with an ASP.NET Core YARP reverse proxy and Azure AD Auth.

Languages
- Khalid Abuhakmeh writes about Base64 encoding with C#.
- Franco Tiveron writes about a developer’s C# 9 cheat sheet.
- Bruno Sonnino uses C# to convert XML data to JSON.
- Jacob E. Shore writes about his first impressions of F#.
- Matthew Crews writes about learning resources for F#.
- Mark-James McDougall writes an iRacing SDK implementation in F#.

Tools
- Elton Stoneman writes about understanding Microsoft’s Docker images for .NET apps.
- Jon P. Smith writes about updating many-to-many relationships in EF Core 5 and above.
- Ruben Rios writes about a more integrated terminal experience with Visual Studio.
- Benjamin Day writes about tests in Visual Studio for Mac.
- The folks at Packt write about DAPR.
- Peter De Tender publishes Azure Container Instances from the Docker CLI.
- Nikola Zivkovic writes about linear regression with ML.NET.
- Patrick Smacchia writes how NDepend used Resharper to quickly refactor more than 23,000 calls to Debug.Assert().
- Mark Heath discusses his plans for NAudio 2.
- Michal Bialecki asks: is Entity Framework Core fast?
- Jon P. Smith introduces a library to automate soft deletes in EF Core.

Xamarin
- Leomaris Reyes introduces UX design with Xamarin Forms.
- Charlin Agramonte writes about XAML naming conventions in Xamarin.Forms.
- Leomaris Reyes works with the Infogram in Xamarin.Forms 5.0.
- Rafael Veronezi previews XAML UIs.
- James Montemagno writes about how to integrate support emails in mobile apps with data and logs.
- Leomaris Reyes writes about the Xamarin.Forms File Picker.

Design, testing, and best practices
- Steve Gordon writes about how to become a better developer by asking questions.
- Derek Comartin says start with a monolith, not microservices.
- Stephen Cleary writes about durable queues.

Podcasts
- Scott Hanselman explores event modeling with Adam Dymitruk.
- At Working Code podcast, a discussion on monoliths vs. microservices.
- The .NET Rocks podcast checks in on IdentityServer.
- The .NET Core Show talks Blazor with Chris Sainty.
- The 6-Figure Developer podcast talks to Christos Matskas about Microsoft Identity.

Videos
- The ON.NET Show inspects application metrics with dotnet-monitor and works on change notifications with Microsoft Graph.
- Scott Hanselman shows you what happens after you enter a URL in your browser.
- The ASP.NET Monsters talk about migrating their site to Azure Blob Storage.
- At Technology and Friends, David Giard talks to Mike Benkovich about GitHub Actions and Visual Studio.


 The .NET Stacks #8: functional C# 9, .NET Foundation nominees, Azure community, more!

This is an archive of my weekly (free!) newsletter, The .NET Stacks. Consider subscribing today to get this content right away! Subscribers don't have to wait a week to receive the content.

On tap this week:
- C# 9: a functionally better release
- The .NET Foundation nominees are out!
- Dev Discussions: Michael Crump
- Community roundup

C# 9: a functionally better release

I've been writing a lot about C# 9 lately. No, seriously: a lot. This week I went a little nuts with three posts: I talked about records, pattern matching, and top-level programs. I've been learning a ton, which is always the main goal, but what's really interesting is how C# is starting to blur the lines between object-oriented and functional programming. Throughout the years, we've seen some FP concepts visit C#, but I feel this release is really kicking it up a notch.

In the not-so-distant past, discussing FP and OO meant putting up with silly dogmatic arguments that they have to be mutually exclusive. It isn't hard to understand why: traditional OO constructs group data and behavior (state) in single mutable objects, while FP draws a hard line between data and behavior in the name of purity and minimizing side effects (immutability by default).

So, typically as a .NET developer, this left you with two choices: C#, .NET's flagship language, or F#, a wonderful functional language that is concise (no curlies or semi-colons and great type inference), convenient (functions as first-class objects), and has default immutability.

However, this is no longer a binary choice. For example, let's look at a blog post from a few years ago that maps C# concepts to F# concepts:
- C#/OO has variables; F#/FP has immutable values. C# 9 init-only properties and records bring that ability to C#.
- C# has statements; F# has expressions. C# 8 introduced switch expressions and enhanced pattern matching, and has more expressions littered throughout the language now.
- C# has objects with methods; F# has types and functions. C# 9 records are also blurring the line in this regard.

So here we are, just years after wondering if F# will ever take over C#, and we see people wondering the exact opposite, as Isaac Abraham asks: will C# replace F#? (Spoiler alert: no.)

There is definitely pushback in the community from C# 8 purists, to which I say: why not both? You now have the freedom to "bring in" the value of functional programming, while doing it in a familiar language. You can bring in these features, along with C#'s compatibility. These changes will not break the language. And if they don't appeal to you, you don't have to use them. (Of course, mixing FP and OO in C# is not always graceful and is definitely worth mentioning.)

This isn't a C# vs F# rant, but it comes down to this: is C# with functional bits "good enough" because of your team's skillset, comfort level, and OO needs? Or do you need a clean break, and immutability by default? As for me, I enjoy seeing these features gradually introduced. For example, C# 9 records allow you to build immutable structures, but the language isn't imposing this on you for all your objects. You need to opt in. (A quick sketch of these features in action follows at the end of this section.)

A more nuanced question to ask is: will C#'s functional concepts ever overpower the language and tilt the scales in FP's direction? Soon, I'll be interviewing Phillip Carter (the PM for F# at Microsoft) and am curious to hear what he has to say about it. Any questions? Let me know soon and I'll be sure to include them.
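To make that OO-to-FP blend concrete, here's a minimal C# 9 sketch of the features discussed above — a positional record, non-destructive mutation with `with`, and a switch expression using relational patterns. (This is my own illustrative snippet, not from any of the linked posts.)

using System;

// C# 9 top-level program: statements first, type declarations after.
var small = new Order("Widget", 5);

// Non-destructive mutation: `with` copies the record instead of mutating it.
var bulk = small with { Quantity = 500 };

Console.WriteLine($"{bulk.Product}: {bulk.Quantity} units, discount {Discount(bulk):P0}");

// A switch *expression* with C# 9 relational patterns — no statements needed.
static decimal Discount(Order order) => order.Quantity switch
{
    >= 100 => 0.15m,
    >= 10 => 0.05m,
    _ => 0.00m,
};

// A positional record: immutable by default, with value-based equality.
public record Order(string Product, int Quantity);

Value-based equality means two Orders holding the same data compare equal — exactly the kind of behavior F# developers take for granted.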
The .NET Foundation nominees are out

This week, the .NET Foundation announced the Board of Director nominees for the 2020 campaign. I am familiar with most of these folks (a few are subscribers, hi!)—it's a very strong list and you probably can't go wrong with anyone. I'd encourage you to look at the list and all their profiles to see who you'd like to vote for (if you are a member). If not, you can apply for membership. Or, if you're just following the progress of the foundation, that's great too.

I know I've talked a lot about the Foundation lately, but this is an important moment for the .NET Foundation. The luster has worn off and it's time to address the big questions: what exactly is the Foundation responsible for? Where is the line between "independence" and Microsoft interests? When OSS projects collide with Microsoft interests, what is the process to work through it? And will the Foundation commit itself to open communication and greater transparency?

As for me, these are the big questions I hope the nominees are thinking about, among other things.

Dev Discussions: Michael Crump

If you've worked on Azure, you've likely come across Michael Crump's work. He started Azure Tips and Tricks, a collection of tips, videos, and talks—if it's Azure, it's probably there. He also runs a popular Twitch stream where he talks about various topics.

I caught up with Michael to talk about how he got to working on Azure at Microsoft, his work for the developer community, and his programming advice.

My crack team of researchers tell me that you were a former Microsoft Silverlight MVP. Ah, memories. Do you miss it?

Ah, yes. I was a Microsoft MVP for 4 years, I believe. I spent a lot of time working with Silverlight because, at that time, I was working in the medical field and a lot of our doctors used Macs. Since I was a C# WinForms/WPF developer, I jumped at the chance to start using those skillsets for code that would run on PCs and Macs.

Can you walk me through your path to Microsoft, and what you do at Microsoft now?

I started in Mac tech support because, after graduating college, Mac tech support agents were getting paid more than PC agents (supply and demand, I guess!). Then, I was a full-time software developer for about 8 years. I worked in the medical field and created a calculator that determined what amount of vitamins our pre-mature babies should take.

Well, after a while, the stress got to me and I discovered my love for teaching and started a job at Telerik as a developer advocate. Then, the opportunity came at Microsoft for a role to educate and inspire application developers. So my role today consists of developer content in many forms, and helping to set our Tier 1 event strategy for app developers.

Tell us a little about Azure Tips and Tricks. What motivated you to get started, and how can people get involved?

Azure Tips and Tricks was created because I'd find a thing or two about Azure, and forget how to do it again. It was originally designed as something just for me, but many blog aggregators started picking up on the posts and we decided to go big with it—from e-books, blog posts, videos, conference talks and stickers.

The easiest way to contribute is by clicking on the Edit Page button at the bottom of each page. You can also go to http://source.azuredev.tips to learn more.

What made you get into Twitch? What goes on in your channel?

I loved the ability to actually code and have someone watch you and help you code.
The interactivity aspect and seeing the same folks come back gets you hooked. The stream is broken down into three streams a week:
- Azure Tips and Tricks, every Wednesday at 1 PM PST (Pacific Standard Time, America)
- Live Interviews with Developers, every Friday at 9 AM PST (Pacific Standard Time, America)
- Live coding/Security Sunday streams, Sundays at 10:30 AM PST (Pacific Standard Time, America)

What is your one piece of programming advice?

I actually published a list of my top 12 things every developer should know. My top one would probably be to learn a different programming language (other than your primary language). Simply put, it broadens your perspective and permits a deeper understanding of how a computer and programming languages work.

This is only an excerpt of my talk with Michael. Read the full interview over at my website.

Community roundup

An extremely busy week, full of great content!

Microsoft

Announcements
- AKS now supports confidential workloads.
- The Edge team announces the storage access API.
- Microsoft introduces the Text Analytics for Health APIs.
- Pratik Nadagouda talks about updates to the Git experience in Visual Studio.
- Eric Boyd shows off new Azure Cognitive Services capabilities.
- The .NET Foundation has the nominees set for the 2020 campaign.

Videos
- The Visual Studio Toolbox begins a series on performance profiling and continues their series on finding code in Visual Studio.
- The Xamarin Show discusses App Center and App Insights.
- Data Exposed continues their "why Azure SQL is best for devs" series.
- So many community standups: we have the Languages & Runtime one, Entity Framework, and ASP.NET Core.

Blog posts
- Jason Freeberg continues his Zero to Hero with App Service series.
- Miguel Ramos dives into WinUI 3 in desktop apps.
- Jayme Singleton runs through the .NET virtual events in July.
- Leonard Lobel highlights the Azure CosmosDB change feed.

Community Blogs

ASP.NET Core
- Christian Nagel walks through local users with ASP.NET Core.
- Andrew Lock continues talking about adding an endpoint graph to ASP.NET Core.
- Thomas Ardal adds Razor runtime compilation for ASP.NET Core.
- Anthony Giretti exposes proto files in a lightweight gRPC service.
- Neel Bhatt introduces event sourcing in .NET Core.
- Dominick Baier talks about flexible access token validation in ASP.NET Core.
- The .NET Rocks podcast talks about ASP.NET Core Endpoints with Steve Smith.
- The ON.NET show discusses SignalR.

Blazor
- Wael Kdouh secures a Blazor WebAssembly application with Azure Active Directory.
- Jon Hilton discusses Blazor validation logic on the client and the server.
- Marinko Spasojevic consumes a web API with Blazor WebAssembly.
- Matthew Jones continues writing Minesweeper for Blazor WebAssembly.

Entity Framework
- Khalid Abuhakmeh talks about adding custom database functions for EF Core.
- Jon P. Smith discusses soft deleting data with Global Query Filters in EF Core.

Languages
- Dave Brock goes deep on C# 9, with records, pattern matching, and top-level programs.
- Ian Griffiths continues his series on C# 8 nullable references with conditional post-conditions.
- Khalid Abuhakmeh walks through reading from a file in C#.

Azure
- Joseph Guadagno uses Azure Key Vault to secure Azure Functions (hi, Joe!).
- Visual Studio Magazine walks through Azure Machine Learning Studio Web.
- Damien Bowden walks through using external inputs in Azure Durable Functions.
- Azure Tips and Tricks has new content about Azure certifications for developers.
- Jason Gaylord discusses adding Azure feature flags to client applications.

Xamarin
- Simon Bisson pontificates on .NET MAUI and the future of Xamarin.
- Leomaris Reyes uses biometric identification in Xamarin.Forms.
- Kym Phillpotts creates a pizza shop in Xamarin.Forms.
- The Xamarin Podcast discusses Xamarin.Forms 4.7 and other topics.

Tools
- JetBrains introduces the .NET Guide.
- Jason Gaylord shows us how to create and delete branches in Visual Studio Code.
- Mike Larah uses custom browser configurations with Visual Studio.
- Muhammad Rehan Saeed shows us how to publish NuGet packages quickly.
- Patrick Smacchia talks about his top 10 Visual Studio refactoring tips.
- Bruce Cottman asks if GitHub Actions will kill off Jenkins.

Projects
- Oren Eini releases CosmosDB Profiler 1.0.
- Manuel Grundner introduces his new Tasty project, an effort to bring the style of his favorite test frameworks to .NET.

Community podcasts and videos
- Scott Hanselman shows off his Raspberry Pi and how you can run .NET Notebooks and .NET Interactive on it, talks about Git pull requests, shows off Git 101 basics, and walks through setting up a prompt with Git, Windows Terminal, PowerShell, and Cascadia Code.
- The ASP.NET Monsters talk about NodaTime and API controllers.
- The 6-Figure Developer podcast talks about AI with Matthew Renze.
- The No Dogma podcast talks with Bill Wagner about .NET 5 and unifying .NET.
- The Coding Blocks podcast studies The DevOps Handbook.

New subscribers and feedback

Has this email been forwarded to you? Welcome! I'd love for you to subscribe and join the community. I promise to guard your email address with my life.

I would love to hear any feedback you have for The .NET Stacks! My goal is to make this the one-stop shop for weekly updates on developing in the .NET ecosystem, so I look forward to any feedback you can provide. You can directly reply to this email, or talk to me on Twitter as well. See you next week!


AWS Inferentia2 builds on AWS Inferentia1 by delivering 4x higher throughput and 10x lower latency

The size of machine learning (ML) models––large language models (LLMs) and foundation models (FMs)––is growing fast year over year, and these models need faster and more powerful accelerators, especially for generative AI. AWS Inferentia2 was designed from the ground up to deliver higher performance while lowering the cost of LLM and generative AI inference. In this post, we show how the second generation of AWS Inferentia builds on the capabilities introduced with AWS Inferentia1 and meets the unique demands of deploying and running LLMs and FMs.

The first generation of AWS Inferentia, a purpose-built accelerator launched in 2019, is optimized to accelerate deep learning inference. AWS Inferentia helped ML users reduce their inference costs and improve their prediction throughput and latency. With AWS Inferentia1, customers saw up to 2.3x higher throughput and up to 70% lower cost per inference than comparable inference-optimized Amazon Elastic Compute Cloud (Amazon EC2) instances.

AWS Inferentia2, featured in the new Amazon EC2 Inf2 instances and supported in Amazon SageMaker, is optimized for large-scale generative AI inference and is the first inference-focused instance from AWS that is optimized for distributed inference, with high-speed, low-latency connectivity between accelerators. You can now efficiently deploy a 175-billion-parameter model for inference across multiple accelerators on a single Inf2 instance without requiring expensive training instances. Until now, customers who had large models could only use instances that were built for training, but this is a waste of resources––given that they're more expensive, consume more energy, and their workload doesn't make use of all the available resources (such as faster networking and storage). With AWS Inferentia2, you can achieve 4 times higher throughput and up to 10 times lower latency compared to AWS Inferentia1. Also, the second generation of AWS Inferentia adds enhanced support for more data types, custom operators, dynamic tensors, and more.

AWS Inferentia2 has 4 times more memory capacity and 16.4 times higher memory bandwidth than AWS Inferentia1, plus native support for sharding large models across multiple accelerators. The accelerators use NeuronLink and Neuron Collective Communication to maximize the speed of data transfer between them, or between an accelerator and the network adapter. AWS Inferentia2 is better suited for larger models, which require sharding across multiple accelerators, although AWS Inferentia1 is still a great option for smaller models because it provides better price-performance compared to alternatives.

Architecture evolution

To compare both generations of AWS Inferentia, let's review the architecture of AWS Inferentia1. It has four NeuronCores v1 per chip, shown in the following diagram.

Specifications per chip:
- Compute – Four cores delivering in total 128 INT8 TOPS and 64 FP16/BF16 TFLOPS
- Memory – 8 GB of DRAM (50 GB/sec of bandwidth), shared by all four cores
- NeuronLink – Link between cores for sharding models across two or more cores

Let's look at how AWS Inferentia2 is organized. Each AWS Inferentia2 chip has two upgraded cores based on the NeuronCore-v2 architecture. Like AWS Inferentia1, you can run different models on each NeuronCore or combine multiple cores to shard big models.
Specifications per chip:
- Compute – Two cores delivering in total 380 INT8 TOPS, 190 FP16/BF16/cFP8/TF32 TFLOPS, and 47.5 FP32 TFLOPS
- Memory – 32 GB of HBM, shared by both cores
- NeuronLink – Link between chips (384 GB/sec per device) for sharding models across two or more cores

NeuronCore-v2 has a modular design with four independent engines:
- ScalarEngine (3 times faster than v1) – Operates on floating point numbers––1600 (BF16/FP16) FLOPS
- VectorEngine (10 times faster than v1) – Operates on vectors of numbers with a single operation, for computations such as normalization, pooling, and others
- TensorEngine (6 times faster than v1) – Performs tensor computations such as Conv, Reshape, Transpose, and others
- GPSIMD-Engine – Has eight fully programmable 512-bit wide general-purpose processors for you to create your custom operators with the standard PyTorch custom C++ operators API. This is a new feature, introduced in NeuronCore-v2.

AWS Inferentia2 NeuronCore-v2 is faster and more optimized. Also, it's capable of accelerating different types and sizes of models, ranging from simple models such as ResNet 50 to large language models or foundation models with billions of parameters, such as GPT-3 (175 billion parameters). AWS Inferentia2 also has larger and faster internal memory than AWS Inferentia1, as shown in the following table.

| Chip | NeuronCores | Memory Type | Memory Size | Memory Bandwidth |
| AWS Inferentia | 4x (v1) | DDR4 | 8 GB | 50 GB/sec |
| AWS Inferentia2 | 2x (v2) | HBM | 32 GB | 820 GB/sec |

The memory in AWS Inferentia2 is High-Bandwidth Memory (HBM). Each AWS Inferentia2 chip has 32 GB, which can be combined with other chips to distribute very large models using NeuronLink (device-to-device interconnect). An inf2.48xlarge, for instance, has 12 AWS Inferentia2 accelerators with a total of 384 GB of accelerated memory. The speed of AWS Inferentia2 memory is 16.4 times faster than AWS Inferentia1, as shown in the previous table.

Other features

AWS Inferentia2 offers the following additional features:
- Hardware supported – cFP8 (new, configurable FP8), FP16, BF16, TF32, FP32, INT8, INT16, and INT32. For more information, refer to Data Types.
- Lazy Tensor inference – We discuss Lazy Tensor inference later in this post.
- Custom operators – Developers can use standard PyTorch custom operators programming interfaces to use the Custom C++ Operators feature. A custom operator is composed of low-level primitives available in the Tensor Factory Functions and accelerated by the GPSIMD-Engine.
- Control-flow (coming soon) – This is for native programming language control flow inside the model, to eventually preprocess and postprocess data from one layer to another.
- Dynamic-shapes (coming soon) – This is useful when your model changes the shape of the output of any internal layer dynamically. For instance: a filter that reduces the output tensor size or shape inside the model, based on the input data.

Accelerating models on AWS Inferentia1 and AWS Inferentia2

The AWS Neuron SDK is used for compiling and running your model. It is natively integrated with PyTorch and TensorFlow, so you don't need to run an additional tool. Use your original code, written in one of these ML frameworks, and with a few lines of code changes, you're good to go with AWS Inferentia. Let's look at how to compile and run a model on AWS Inferentia1 and AWS Inferentia2 using PyTorch.
Load a pre-trained model (ResNet 50) from torchvision, and run it one time to warm it up:

import torch
import torchvision

model = torchvision.models.resnet50(weights='IMAGENET1K_V1').eval().cpu()
x = torch.rand(1, 3, 224, 224).float().cpu()  # dummy input
y = model(x)  # warmup model

Trace and deploy the accelerated model on Inferentia1

To trace the model to AWS Inferentia, import torch_neuron and invoke the tracing function. Keep in mind that the model needs to be PyTorch JIT traceable to work. At the end of the tracing process, save the model as a normal PyTorch model. Compile the model one time and load it back as many times as you need. The Neuron SDK runtime is already integrated into PyTorch and is responsible for automatically sending the operators to the AWS Inferentia1 chip to accelerate your model. In your inference code, you always need to import torch_neuron to activate the integrated runtime. You can pass additional parameters to the compiler to customize the way it optimizes the model, or to enable special features such as neuron-pipeline-cores, which shards your model across multiple cores to increase throughput.

import torch_neuron

# Tracing the model using the AWS Neuron SDK
neuron_model = torch_neuron.trace(model, x)  # trace model to Inferentia

# Saving for future use
neuron_model.save('neuron_resnet50.pt')

# Next time you don't need to trace the model again
# Just load it and the AWS Neuron SDK will send it to Inferentia automatically
neuron_model = torch.jit.load('neuron_resnet50.pt')

# Accelerated inference on Inferentia
y = neuron_model(x)

Tracing and deploying the accelerated model on Inferentia2

For AWS Inferentia2, the process is similar. The only difference is the package you import ends with an x: torch_neuronx. The Neuron SDK takes care of the compilation and running of the model for you transparently. You can also pass additional parameters to the compiler to fine-tune the operation or activate specific functionalities.

import torch_neuronx

# Tracing the model using the Neuron SDK
neuron_model = torch_neuronx.trace(model, x)  # trace model to Inferentia

# Saving for future use
neuron_model.save('neuron_resnet50.pt')

# Next time you don't need to trace the model again
# Just load it and the Neuron SDK will send it to Inferentia automatically
neuron_model = torch.jit.load('neuron_resnet50.pt')

# Accelerated inference on Inferentia
y = neuron_model(x)

AWS Inferentia2 also offers a second approach for running a model, called Lazy Tensor inference. In this mode, you don't trace or compile the model beforehand; instead, the compiler runs on the fly every time you run your code. It isn't recommended for production, given that traced mode has many advantages over Lazy Tensor inference. However, if you're still developing your model and need to test it faster, Lazy Tensor inference can be a good alternative. Here's how to compile and run a model using Lazy Tensor:

import torch
import torchvision
import torch_neuronx
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # create XLA device
model = torchvision.models.resnet50(weights='IMAGENET1K_V1').eval().cpu()
model.to(device)
x = torch.rand((1, 3, 224, 224), device=device)  # dummy input
with torch.no_grad():
    y = model(x)
    xm.mark_step()  # compilation occurs here

Now that you're familiar with AWS Inferentia2, a good next step is to get started with PyTorch or TensorFlow and learn how to set up a dev environment and run tutorials and examples.
Also, check the AWS Neuron Samples GitHub repo, where you can find multiple examples of how to prepare models to run on Inf2, Inf1, and Trn1.

Summary of feature comparison between AWS Inferentia1 and AWS Inferentia2

The AWS Inferentia2 compiler is XLA-based, and AWS is part of the OpenXLA initiative. This is the biggest difference from AWS Inferentia1, and it's relevant because PyTorch, TensorFlow, and JAX have native XLA integrations. XLA brings many performance improvements, given that it optimizes the graph to compute the results in a single kernel launch: it fuses together successive tensor operations and outputs optimal machine code for accelerating model runs on AWS Inferentia2. Other parts of the Neuron SDK were also improved in AWS Inferentia2, while keeping the user experience as simple as possible when tracing and running models. The following table shows the features available in both versions of the compiler and runtime.

| Feature | torch-neuron | torch-neuronx |
| Tensorboard | Yes | Yes |
| Supported Instances | Inf1 | Inf2 & Trn1 |
| Inference Support | Yes | Yes |
| Training Support | No | Yes |
| Architecture | NeuronCore-v1 | NeuronCore-v2 |
| Trace API | torch_neuron.trace() | torch_neuronx.trace() |
| Distributed inference | NeuronCore Pipeline | Collective Communications |
| IR | GraphDef | HLO |
| Compiler | neuron-cc | neuronx-cc |
| Monitoring | neuron-monitor / monitor-top | neuron-monitor / monitor-top |

For a more detailed comparison between torch-neuron (Inf1) and torch-neuronx (Inf2), refer to Comparison of torch-neuron (Inf1) versus torch-neuronx (Inf2 & Trn1) for Inference.

Model serving

After tracing a model to deploy to Inf2, you have many deployment options. You can run real-time predictions or batch predictions in different ways, because Inf2 EC2 instances are natively integrated with other AWS services that make use of Deep Learning Containers (DLCs), such as Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and SageMaker. AWS Inferentia2 is compatible with the most popular deployment technologies. Here is a list of some of the options you have for deploying models using AWS Inferentia2:
- SageMaker – Fully managed service to prepare data and build, train, and deploy ML models
- TorchServe – PyTorch integrated deployment mechanism
- TensorFlow Serving – TensorFlow integrated deployment mechanism
- Deep Java Library – Open-source Java mechanism for model deployment and training
- Triton – NVIDIA open-source service for model deployment

Benchmark

The following table highlights the improvements AWS Inferentia2 brings over AWS Inferentia1. Specifically, we measure latency (how fast the model can make a prediction using each accelerator), throughput (how many inferences per second), and cost per inference (how much each inference costs in US dollars). The lower the latency in milliseconds and the cost in US dollars, the better; the higher the throughput, the better. Two models were used in this process––both large language models: ELECTRA large discriminator and BERT large uncased. PyTorch (1.13.1) and Hugging Face transformers (v4.7.0), the main libraries used in this experiment, ran on Python 3.8. After compiling the models for batch sizes of 1 and 10 (using the code from the previous section as a reference), each model was warmed up (invoked one time to initialize the context) and then invoked 10 times in a row. The following table shows average numbers collected in this simple benchmark.
Models referenced:
- ELECTRA large discriminator (334,092,288 parameters, ~593 MB)
- BERT large uncased (335,143,938 parameters, ~580 MB)
- OPT-66B (66 billion parameters, ~124 GB)

| Model Name | Batch Size | Sentence Length | Latency Inf1 (ms) | Latency Inf2 (ms) | Improvement Inf2 over Inf1 (x times) | Throughput Inf1 (inferences/sec) | Throughput Inf2 (inferences/sec) | Cost per Inference Inf1 ** | Cost per Inference Inf2 ** |
| ElectraLargeDiscriminator | 1 | 256 | 35.7 | 8.31 | 4.30 | 28.01 | 120.34 | $0.0000023 | $0.0000018 |
| ElectraLargeDiscriminator | 10 | 256 | 343.7 | 72.9 | 4.71 | 2.91 | 13.72 | $0.0000022 | $0.0000015 |
| BertLargeUncased | 1 | 128 | 28.2 | 3.1 | 9.10 | 35.46 | 322.58 | $0.0000018 | $0.0000007 |
| BertLargeUncased | 10 | 128 | 121.1 | 23.6 | 5.13 | 8.26 | 42.37 | $0.0000008 | $0.0000005 |

* c6a.8xlarge with 32 AMD Epyc 7313 CPUs was used in this benchmark.
** EC2 public pricing in us-east-1 on April 20: inf2.xlarge $0.7582/hr; inf1.xlarge $0.228/hr. Cost per inference considers the cost per element in a batch (cost per inference equals the total cost of a model invocation divided by batch size). As a worked check of that formula, reading the throughput column as model invocations per second: for BertLargeUncased at batch size 10 on inf2.xlarge, $0.7582/hr ≈ $0.000211 per second, and $0.000211 / (42.37 × 10) ≈ $0.0000005 per inference, matching the table.

For additional information about training and inference performance, refer to Trn1/Trn1n Performance.

Conclusion

AWS Inferentia2 is a powerful technology designed for improving performance and reducing costs of deep learning model inference. More performant than AWS Inferentia1, it offers up to 4 times higher throughput, up to 10 times lower latency, and up to 50% better performance/watt than other comparable inference-optimized EC2 instances. In the end, you pay less, have a faster application, and meet your sustainability goals.

It's simple and straightforward to migrate your inference code to AWS Inferentia2, which also supports a broader variety of models, including large language models and foundation models for generative AI. You can get started by following the AWS Neuron SDK documentation to set up a development environment and start your accelerated deep learning project. To help you get started, Hugging Face has added Neuron support to their Optimum library, which optimizes models for faster training and inference, and they have many example tasks ready to run on Inf2. Also, check out Deploy large language models on AWS Inferentia2 using large model inference containers to learn about deploying LLMs to AWS Inferentia2 using model inference containers. For additional examples, see the AWS Neuron Samples GitHub repo.

About the author

Samir Araújo is an AI/ML Solutions Architect at AWS. He helps customers create AI/ML solutions that solve their business challenges using AWS. He has been working on several AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. He likes playing with hardware and automation projects in his free time, and he has a particular interest in robotics.


Towards debuggability and secure deployments of eBPF programs on Windows

The eBPF for Windows runtime has introduced a new mode of operation, native code generation, which exists alongside the currently supported modes of operation for eBPF programs: JIT (just-in-time compilation) and an interpreter, with the administrator able to select the mode when a program is loaded. The native code generation mode involves loading Windows drivers that contain signed eBPF programs. Due to the risks associated with having an interpreter in the kernel address space, it was decided to only enable it for non-production signed builds. The JIT mode supports the ability to dynamically generate code, write it into kernel pages, and finally set the permissions on the page from read/write to read/execute.

Enter the Windows Hyper-V hypervisor, a type 1 hypervisor, which has the Hypervisor-protected Code Integrity (HVCI) feature. It splits the kernel memory space into virtual trust levels (VTLs), with isolation enforced at the hardware level using virtualization extensions of the CPU. Most parts of the Windows kernel and all drivers operate in VTL0, the lowest trusted level, with privileged operations being performed inside the Windows secure kernel operating in VTL1. During the boot process, the hypervisor verifies the integrity of the secure kernel using cryptographic signatures prior to launching it, after which the secure kernel verifies the cryptographic signature of each code page prior to enabling read/execute permissions on the page. The signatures are validated using keys obtained from X.509 certificates that chain up to a Microsoft trusted root certificate. The net effect of this policy is that if HVCI is enabled, it is no longer possible to inject dynamically generated code pages into the kernel, which prevents the use of JIT mode. Similarly, Windows uses cryptographic signatures to restrict what code can be executed in the kernel. In keeping with these principles, eBPF for Windows has introduced a new mode of execution that an administrator can choose to use that maintains the integrity of the kernel and provides the safety promises of eBPF: native code generation.

The process starts with the existing tool chains, whereby eBPF programs are compiled into eBPF bytecode and emitted as ELF object files. The examples below assume the eBPF-for-Windows NuGet package has been unpacked to c:\ebpf and that the command is being executed from within a Developer Command Prompt for VS 2019.

How to use native code generation

hello_world.c:

// Copyright (c) Microsoft Corporation
// SPDX-License-Identifier: MIT
#include "bpf_helpers.h"

SEC("bind")
int HelloWorld()
{
    bpf_printk("Hello World!");
    return 0;
}

Compile to eBPF:

> clang -target bpf -O2 -Werror -Ic:/ebpf/include -c hello_world.c -o out/hello_world.o
> llvm-objdump -S out/hello_world.o

eBPF bytecode:

b7 01 00 00 72 6c 64 21                              r1 = 560229490
63 1a f8 ff 00 00 00 00                              *(u32 *)(r10 - 8) = r1
18 01 00 00 48 65 6c 6c 00 00 00 00 6f 20 57 6f      r1 = 8022916924116329800 ll
7b 1a f0 ff 00 00 00 00                              *(u64 *)(r10 - 16) = r1
b7 01 00 00 00 00 00 00                              r1 = 0
73 1a fc ff 00 00 00 00                              *(u8 *)(r10 - 4) = r1
bf a1 00 00 00 00 00 00                              r1 = r10
07 01 00 00 f0 ff ff ff                              r1 += -16
b7 02 00 00 0d 00 00 00                              r2 = 13
85 00 00 00 0c 00 00 00                              call 12
b7 00 00 00 00 00 00 00                              r0 = 0
95 00 00 00 00 00 00 00                              exit

The next step involves a new tool introduced specifically to support this scenario: bpf2c.
This tool parses the supplied ELF file, extracting the list of maps and stored programs before handing off the byte code to the eBPF verifier, which proves that the eBPF byte code is effectively sandboxed and constrained to terminate within a set number of instructions. The tool then performs a per-instruction translation of the eBPF byte code into the equivalent C statements and emits skeleton code used to perform relocation operations at run time. For convenience, the NuGet package also contains a PowerShell script that invokes bpf2c and then uses MSBuild to produce the final Portable Executable (PE) image (an image format used by Windows). As an aside, the process of generating the native image is decoupled from the process of developing the eBPF program, making it a deployment-time decision rather than a development-time one.

> powershell c:\ebpf\bin\Convert-BpfToNative.ps1 hello_world.o

C:\Users\user\hello_world\out> powershell c:\ebpf\bin\Convert-BpfToNative.ps1 hello_world.o
Microsoft (R) Build Engine version 16.9.0+57a23d249 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

Build started 5/17/2022 9:38:43 AM.
Project "C:\Users\user\hello_world\out\hello_world.vcxproj" on node 1 (default targets).
DriverBuildNotifications:
  Building 'hello_world_km' with toolset 'WindowsKernelModeDriver10.0' and the 'Desktop' target platform. Using KMDF 1.15.
<Lines removed for clarity>
Done Building Project "C:\Users\user\hello_world\out\hello_world.vcxproj" (default targets).

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:03.57

> type hello_world_driver.c
// Snip: removed boilerplate driver code and map setup.
static uint64_t
HelloWorld(void* context)
{
    // Prologue
    uint64_t stack[(UBPF_STACK_SIZE + 7) / 8];
    register uint64_t r0 = 0;
    register uint64_t r1 = 0;
    register uint64_t r2 = 0;
    register uint64_t r3 = 0;
    register uint64_t r4 = 0;
    register uint64_t r5 = 0;
    register uint64_t r10 = 0;

    r1 = (uintptr_t)context;
    r10 = (uintptr_t)((uint8_t*)stack + sizeof(stack));

    // EBPF_OP_MOV64_IMM pc=0 dst=r1 src=r0 offset=0 imm=560229490
    r1 = IMMEDIATE(560229490);
    // EBPF_OP_STXW pc=1 dst=r10 src=r1 offset=-8 imm=0
    *(uint32_t*)(uintptr_t)(r10 + OFFSET(-8)) = (uint32_t)r1;
    // EBPF_OP_LDDW pc=2 dst=r1 src=r0 offset=0 imm=1819043144
    r1 = (uint64_t)8022916924116329800;
    // EBPF_OP_STXDW pc=4 dst=r10 src=r1 offset=-16 imm=0
    *(uint64_t*)(uintptr_t)(r10 + OFFSET(-16)) = (uint64_t)r1;
    // EBPF_OP_MOV64_IMM pc=5 dst=r1 src=r0 offset=0 imm=0
    r1 = IMMEDIATE(0);
    // EBPF_OP_STXB pc=6 dst=r10 src=r1 offset=-4 imm=0
    *(uint8_t*)(uintptr_t)(r10 + OFFSET(-4)) = (uint8_t)r1;
    // EBPF_OP_MOV64_REG pc=7 dst=r1 src=r10 offset=0 imm=0
    r1 = r10;
    // EBPF_OP_ADD64_IMM pc=8 dst=r1 src=r0 offset=0 imm=-16
    r1 += IMMEDIATE(-16);
    // EBPF_OP_MOV64_IMM pc=9 dst=r2 src=r0 offset=0 imm=13
    r2 = IMMEDIATE(13);
    // EBPF_OP_CALL pc=10 dst=r0 src=r0 offset=0 imm=12
    r0 = HelloWorld_helpers[0].address(r1, r2, r3, r4, r5);
    if ((HelloWorld_helpers[0].tail_call) && (r0 == 0))
        return 0;
    // EBPF_OP_MOV64_IMM pc=11 dst=r0 src=r0 offset=0 imm=0
    r0 = IMMEDIATE(0);
    // EBPF_OP_EXIT pc=12 dst=r0 src=r0 offset=0 imm=0
    return r0;
}

As illustrated here, each eBPF instruction is translated into an equivalent C statement, with eBPF registers being emulated using stack variables named R0 to R10. Lastly, the tool adds a set of boilerplate code that handles the interactions with the I/O Manager required to load the code into the Windows kernel, with the result being a single C file.
The Convert-BpfToNative.ps1 script then invokes the normal Windows Driver Kit (WDK) tools to compile and link the eBPF program into its final PE image. Once the developer is ready to deploy their eBPF program in a production environment that has HVCI enabled, they will need to get their driver signed via the normal driver signing process. For a production workflow, one could imagine a service that consumes the ELF file (the eBPF byte code), securely verifies that it is safe, generates the native image, and signs it before publishing it for deployment. This could then be integrated into existing developer workflows.

The eBPF for Windows runtime has been enlightened to support these eBPF programs hosted in Windows drivers, resulting in a developer experience that closely mimics the behavior of eBPF programs that use JIT. The result is a pipeline that runs from eBPF byte code, through verification and bpf2c, to a signed PE driver (pipeline diagram omitted from this archive). The net effect is to introduce a new statically sandboxed model for Windows drivers, with the resulting driver being signed using standard Windows driver signing mechanisms. While this additional step does increase the time needed to deploy an eBPF program, some customers have determined that the tradeoff is justified by the ability to safely add eBPF programs to systems with HVCI enabled.

Diagnostics and eBPF programs

One of the key pain points of developing eBPF programs is making sure they pass verification. The process of loading programs once they have been compiled, potentially on an entirely different system, gives rise to a subpar developer experience. As part of adding support for native code generation, eBPF for Windows has integrated the verification into the build pipeline, so that developers get build-time feedback when an eBPF program fails verification. Using a slightly more complex eBPF program as an example, the developer gets a build-time error when the program fails verification (figure omitted: the eBPF C code alongside the verifier's build-time error). The error points the developer to line 96 of the source code, where they can see that the start time variable could be NULL.

As with all other instances of code, eBPF programs can have bugs. While the verifier can prove that code is safe, it is unable to prove that code is correct. One approach that was pioneered by the Linux community is the use of logging built around the bpf_printk style macro, which permits developers to insert trace statements into their eBPF programs to aid diagnosability. To both maintain compatibility with the Linux eBPF ecosystem and provide a useful mechanism, eBPF for Windows has adopted a similar approach. One of the key differences is how these events are implemented, with Linux using a file-based approach and Windows using Event Tracing for Windows (ETW). ETW has a long history within Windows and a rich ecosystem of tools that can be used to capture and process traces.

A second useful tool that is now available to developers using native code generation is the ability to perform source-level debugging of eBPF programs. If the eBPF program is compiled with BTF data, the bpf2c tool will translate this in addition to the instructions and emit the appropriate pragmas containing the original file name and line numbers (with plans to extend this to allow the debugger to show eBPF local variables in the future). These are then consumed by the Windows Developer Kit tools and encoded into the final driver and symbol files, which the debugger can use to perform source-level debugging.
In addition, these same symbol files can then be used by profiling tools to determine hot spots within eBPF programs and areas where performance could be improved.

Learn more

The introduction of support for native image generation enhances eBPF for Windows in three areas:
- A new mode of execution permits eBPF programs to be deployed on previously unsupported systems.
- A mechanism for offline verification and signing of eBPF programs.
- The ability for developers to perform source-level debugging of their eBPF programs.

While support will continue for the existing JIT mode, this change gives developers and administrators flexibility in how programs are deployed. Separating the process of native image generation from the development of the eBPF program places the decision on how to deploy an eBPF program in the hands of the administrator and unburdens the developer from deployment-time concerns.

The post Towards debuggability and secure deployments of eBPF programs on Windows appeared first on Microsoft Open Source Blog.


.NET 8 Performance Improvements in .NET MAUI

The major focus for .NET MAUI in the .NET 8 release is quality. As such, a lot of our focus has been on fixing bugs instead of chasing lofty performance goals. In .NET 8, we merged 1,559 pull requests that closed 596 total issues. These include changes from the .NET MAUI team as well as the .NET MAUI community. We are optimistic that this should result in a significant increase in quality in .NET 8.

However! We still have plenty of performance changes to showcase. Building upon the fundamental performance improvements in .NET 8, we discover "low-hanging" fruit constantly, and there were high-voted performance issues on GitHub we tried to tackle. Our goal is to continue to make .NET MAUI faster in each release — read on for details!

For a review of the performance improvements in past releases, see our posts for .NET 6 and 7. This also gives you an idea of the improvements you would see migrating from Xamarin.Forms to .NET MAUI:
- .NET 7 Performance Improvements in .NET MAUI
- .NET 6 Performance Improvements in .NET MAUI

Table of Contents
- New features
  - AndroidStripILAfterAOT
  - AndroidEnableMarshalMethods
  - NativeAOT on iOS
- Build & Inner Loop Performance
  - Filter Android ps -A output with grep
  - Port WindowsAppSDK usage of vcmeta.dll to C#
  - Improvements to remote iOS builds on Windows
  - Improvements to Android inner-loop
  - XAML Compilation no longer uses LoadInSeparateAppDomain
- Performance or App Size Improvements
  - Structs and IEquatable in .NET MAUI
  - Fix performance issue in {AppThemeBinding}
  - Address CA1307 and CA1309 for performance
  - Address CA1311 for performance
  - Remove unused ViewAttachedToWindow event on Android
  - Remove unneeded System.Reflection for {Binding}
  - Use StringComparer.Ordinal for Dictionary and HashSet
  - Reduce Java interop in MauiDrawable on Android
  - Improve layout performance of Label on Android
  - Reduce Java interop calls for controls in .NET MAUI
  - Improve performance of Entry.MaxLength on Android
  - Improve memory usage of CollectionView on Windows
  - Use UnmanagedCallersOnlyAttribute on Apple platforms
  - Faster Java interop for strings on Android
  - Faster Java interop for C# events on Android
  - Use Function Pointers for JNI
  - Removed Xamarin.AndroidX.Legacy.Support.V4
  - Deduplication of generics on iOS and macOS
  - Fix System.Linq.Expressions implementation on iOS-like platforms
  - Set DynamicCodeSupport=false for iOS and Catalyst
- Memory Leaks
  - Memory Leaks and Quality
  - Diagnosing leaks in .NET MAUI
  - Patterns that cause leaks: C# events
  - Circular references on Apple platforms
  - Roslyn analyzer for Apple platforms
- Tooling and Documentation
  - Simplified dotnet-trace and dotnet-dsrouter
  - dotnet-gcdump Support for Mobile

New Features

AndroidStripILAfterAOT

Once Upon A Time we had a brilliant thought: if AOT pre-compiles C# methods, do we need the managed method anymore? Removing the C# method body would allow assemblies to be smaller. .NET iOS applications already do this, so why not Android as well? While the idea is straightforward, the implementation was not: iOS uses "Full" AOT, which AOT's all methods into a form that doesn't require a runtime JIT. This allowed iOS to run cil-strip, removing all method bodies from all managed types. At the time, Xamarin.Android only supported "normal" AOT, and normal AOT requires a JIT for certain constructs such as generic types and generic methods. This meant that attempting to run cil-strip would result in runtime errors if a method body was removed that was actually required at runtime. This was particularly bad because cil-strip could only remove all method bodies!
We are re-introducing IL stripping for .NET 8, via a new $(AndroidStripILAfterAOT) MSBuild property. When true, the <MonoAOTCompiler/> task will track which method bodies were actually AOT'd, storing this information into %(_MonoAOTCompiledAssemblies.MethodTokenFile), and the new <ILStrip/> task will update the input assemblies, removing all method bodies that can be removed.

By default, enabling $(AndroidStripILAfterAOT) will override the default $(AndroidEnableProfiledAot) setting, allowing all trimmable AOT'd methods to be removed. This choice was made because $(AndroidStripILAfterAOT) is most useful when AOT-compiling your entire application. Profiled AOT and IL stripping can be used together by explicitly setting both within the .csproj, but with the only benefit being a small .apk size improvement:

<PropertyGroup Condition=" '$(Configuration)' == 'Release' ">
    <AndroidStripILAfterAOT>true</AndroidStripILAfterAOT>
    <AndroidEnableProfiledAot>true</AndroidEnableProfiledAot>
</PropertyGroup>

.apk size results for a dotnet new android app:

| $(AndroidStripILAfterAOT) | $(AndroidEnableProfiledAot) | .apk size |
| true | true | 7.7 MB |
| true | false | 8.1 MB |
| false | true | 7.7 MB |
| false | false | 8.4 MB |

Note that AndroidStripILAfterAOT=false and AndroidEnableProfiledAot=true is the default Release configuration environment, for 7.7 MB. A project that only sets AndroidStripILAfterAOT=true implicitly sets AndroidEnableProfiledAot=false, resulting in an 8.1 MB app.

See xamarin-android#8172 and dotnet/runtime#86722 for details about this feature.

AndroidEnableMarshalMethods

.NET 8 introduces a new experimental setting for Release configurations:

<PropertyGroup Condition=" '$(Configuration)' == 'Release' ">
    <AndroidEnableMarshalMethods>true</AndroidEnableMarshalMethods>
    <!-- Note that single-architecture apps will be most successful -->
    <RuntimeIdentifier>android-arm64</RuntimeIdentifier>
</PropertyGroup>

We hope to enable this feature by default in .NET 9, but for now we are providing the setting as an opt-in, experimental feature. Applications that only target one architecture, such as RuntimeIdentifier=android-arm64, will likely be able to enable this feature without issue.

Background on Marshal Methods

A JNI marshal method is a JNI-callable function pointer provided to JNIEnv::RegisterNatives(). Currently, JNI marshal methods are provided via the interaction between code we generate and JNINativeWrapper.CreateDelegate():
- Our code-generator emits the "actual" JNI-callable method.
- JNINativeWrapper.CreateDelegate() uses System.Reflection.Emit to wrap the method for exception marshaling.

JNI marshal methods are needed for all Java-to-C# transitions. Consider the virtual Activity.OnCreate() method:
partial class Activity {
    static Delegate? cb_onCreate_Landroid_os_Bundle_;

    static Delegate GetOnCreate_Landroid_os_Bundle_Handler ()
    {
        if (cb_onCreate_Landroid_os_Bundle_ == null)
            cb_onCreate_Landroid_os_Bundle_ = JNINativeWrapper.CreateDelegate ((_JniMarshal_PPL_V) n_OnCreate_Landroid_os_Bundle_);
        return cb_onCreate_Landroid_os_Bundle_;
    }

    static void n_OnCreate_Landroid_os_Bundle_ (IntPtr jnienv, IntPtr native__this, IntPtr native_savedInstanceState)
    {
        var __this = global::Java.Lang.Object.GetObject<Android.App.Activity> (jnienv, native__this, JniHandleOwnership.DoNotTransfer)!;
        var savedInstanceState = global::Java.Lang.Object.GetObject<Android.OS.Bundle> (native_savedInstanceState, JniHandleOwnership.DoNotTransfer);
        __this.OnCreate (savedInstanceState);
    }

    // Metadata.xml XPath method reference: path="/api/package[@name='android.app']/class[@name='Activity']/method[@name='onCreate' and count(parameter)=1 and parameter[1][@type='android.os.Bundle']]"
    [Register ("onCreate", "(Landroid/os/Bundle;)V", "GetOnCreate_Landroid_os_Bundle_Handler")]
    protected virtual unsafe void OnCreate (Android.OS.Bundle? savedInstanceState) => ...
}

Activity.n_OnCreate_Landroid_os_Bundle_() is the JNI marshal method, responsible for marshaling parameters from JNI values into C# types, forwarding the method invocation to Activity.OnCreate(), and (if necessary) marshaling the return value back to JNI. Activity.GetOnCreate_Landroid_os_Bundle_Handler() is part of the type registration infrastructure, providing a Delegate instance to RegisterNativeMembers.RegisterNativeMembers(), which is eventually passed to JNIEnv::RegisterNatives().

While this works, it's not incredibly performant: unless using one of the optimized delegate types added in xamarin-android#6657, System.Reflection.Emit is used to create a wrapper around the marshal method, which is something we've wanted to avoid doing for years.

Thus, the idea: since we're already bundling a native toolchain and using LLVM-IR to produce libxamarin-app.so, what if we emitted Java native method names and skipped all the registration work done as part of Runtime.register() and JNIEnv.RegisterJniNatives()? Given:

class MyActivity : Activity {
    protected override void OnCreate(Bundle? state) => ...
}

During the build, libxamarin-app.so would contain the function:

JNIEXPORT void JNICALL
Java_crc..._MyActivity_n_1onCreate (JNIEnv *env, jobject self, jobject state);

At app runtime, the Runtime.register() invocation present in Java Callable Wrappers would either be omitted or would be a no-op, and Android/JNI would instead resolve MyActivity.n_onCreate() as Java_crc..._MyActivity_n_1onCreate().

We call this effort "LLVM Marshal Methods", which is currently experimental in .NET 8. Many of the specifics are still being investigated, and this feature will be spread across various areas. See xamarin-android#7351 for details about this experimental feature.

NativeAOT on iOS

In .NET 7, we started an experiment to see what it would take to support NativeAOT on iOS. Going from prototype to an initial implementation: .NET 8 Preview 6 included NativeAOT as an experimental feature for iOS.
To opt into NativeAOT in a .NET MAUI iOS project, use the following settings in your project file:

<PropertyGroup Condition="$([MSBuild]::GetTargetPlatformIdentifier('$(TargetFramework)')) == 'ios' and '$(Configuration)' == 'Release'">
    <!-- PublishAot=true indicates NativeAOT, while omitting this property would use Mono's AOT -->
    <PublishAot>true</PublishAot>
</PropertyGroup>

Then to build the application for an iOS device:

$ dotnet publish -f net8.0-ios -r ios-arm64
MSBuild version 17.8.0+6cdef4241 for .NET
...
Build succeeded.
    0 Error(s)

Note: we may consider unifying and improving MSBuild property names for this feature in future .NET releases. To do a one-off build at the command line, you may also need to specify -p:PublishAotUsingRuntimePack=true in addition to -p:PublishAot=true.

One of the main culprits for the first release was how the iOS workload supports Objective-C interoperability. The problem was mainly related to the type registration system, which is the key component for efficiently supporting iOS-like platforms (see docs for details). In its implementation, the type registration system depends on type metadata tokens, which are not available with NativeAOT. Therefore, in order to leverage the benefits of the highly efficient NativeAOT runtime, we had to adapt. dotnet/runtime#80912 includes the discussion around how to tackle this problem, and finally in xamarin-macios#18268 we implemented a new managed static registrar that works with NativeAOT. The new managed static registrar does not just benefit us by being compatible with NativeAOT; it is also much faster than the default one, and is available for all supported runtimes (see docs for details).

Along the way, we had great help from our GitHub community, whose contributions (code reviews, PRs) were essential in helping us move forward quickly and deliver this feature on time. A few of the many PRs that helped and unblocked us on our journey were dotnet/runtime#77956, dotnet/runtime#78280, dotnet/runtime#82317, dotnet/runtime#85996, and the list goes on…

As .NET 8 Preview 6 came along, we finally managed to release our first version of NativeAOT on iOS, which also supports MAUI. See the blog post on .NET 8 Preview 6 for details about what we were able to accomplish in the initial release. In subsequent .NET 8 releases, results improved quite a bit as we identified and resolved issues along the way. (Figure omitted: .NET MAUI iOS template app size comparison throughout the preview releases.)

We had steady progress and estimated size savings reported, due to fixing the following issues:
- dotnet/runtime#87924 – fixed a major NativeAOT size issue with AOT-incompatible code paths in System.Linq.Expressions, and also made it fully NativeAOT-compatible when targeting iOS
- xamarin-macios#18332 – reduced the size of the __LINKEDIT Export Info section in stripped binaries

Furthermore, in the latest RC 1 release the app size went down even further, reaching 50% smaller apps for the template .NET MAUI iOS application compared to Mono. The most impactful issues/PRs that contributed to this:
- xamarin-macios#18734 – Make Full the default link mode for NativeAOT
- xamarin-macios#18584 – Make the codebase trimming-compatible through a series of PRs

Even though app size was our primary metric to focus on, for the RC 1 release we also measured startup time for a .NET MAUI iOS template app comparing NativeAOT and Mono, where NativeAOT results in almost 2x faster startup.
Key Takeaways

For NativeAOT scenarios on iOS, changing the default link mode to Full (xamarin-macios#18734) is probably the biggest improvement for application size. But at the same time, this change can also break applications that are not fully AOT- and trim-compatible. In Full link mode, the trimmer might trim away AOT-incompatible code paths (think about reflection usage) that are accessed dynamically at runtime. Full link mode is not the default configuration when using the Mono runtime, so it is possible that some applications are not fully AOT-compatible.

Supporting NativeAOT on iOS is an experimental feature and still a work in progress, and our plan is to address the potential issues with Full link mode incrementally:
- As a first step, we enabled trim, AOT, and single-file warnings by default in xamarin-macios#18571. The enabled warnings should make our customers aware at build time whether a use of a certain framework or library, or some C# construct in their code, is incompatible with NativeAOT — and could crash at runtime. This information should guide our customers to write AOT-compatible code, but also help us improve our frameworks and libraries with the same goal of fully utilizing the benefits of AOT compilation.
- The second step was clearing up all the warnings coming from the Microsoft.iOS and System.Private.CoreLib assemblies reported for a template iOS application, with xamarin-macios#18629 and dotnet/runtime#91520.

In future releases, we plan to address the warnings coming from the MAUI framework and further improve the overall user experience. Our goal is to have fully AOT- and trim-compatible frameworks.

.NET 8 will support targeting iOS platforms with NativeAOT as an opt-in feature, and it shows great potential by generating up to 50% smaller apps with up to 50% faster startup compared to Mono. Considering the great performance that NativeAOT promises, please help us on this journey: try out your applications with NativeAOT and report any potential issues. At the same time, let us know when NativeAOT "just works" out of the box. To follow future progress, see dotnet/runtime#80905. Last but not least, we would like to thank our GitHub contributors, who are helping us make NativeAOT on iOS possible.

Build & Inner Loop Performance

Filter Android ps -A output with grep

When profiling the Android inner loop for a .NET MAUI project with PerfView, we found around 1.2% of CPU time was spent just trying to get the process ID of the running Android application. When changing Tools > Options > Xamarin > Xamarin Diagnostics output verbosity to Diagnostics, you could see:

-- Start GetProcessId - 12/02/2022 11:05:57 (96.9929ms) --
[INPUT] ps -A
[OUTPUT] USER PID PPID VSZ RSS WCHAN ADDR S NAME
root 1 0 10943736 4288 0 0 S init
root 2 0 0 0 0 0 S [kthreadd]
... Hundreds of more lines!
u0_a993 14500 1340 14910808 250404 0 0 R com.companyname.mauiapp42
-- End GetProcessId --

The Xamarin/.NET MAUI extension in Visual Studio polls every second to see if the application has exited. This is useful for changing the play/stop button state if you force close the app, etc. Testing on a Pixel 5, we could see the command actually produces 762 lines of output!

> (adb shell ps -A).Count
762

What we could do instead is something like:

> adb shell "ps -A | grep -w -E 'PID|com.companyname.mauiapp42'"

where we pipe the output of ps -A to the grep command on the Android device. Yes, Android has a subset of Unix commands available! We filter on lines containing either PID or your application's package name.
.NET 8 will support targeting iOS platforms with NativeAOT as an opt-in feature, and it shows great potential, generating up to 50% smaller apps with 50% faster startup compared to Mono. Considering the great performance that NativeAOT promises, please help us on this journey: try out your applications with NativeAOT, report any potential issues, and let us know when NativeAOT "just works" out of the box. To follow future progress, see dotnet/runtime#80905. Last but not least, we would like to thank our GitHub contributors, who are helping us make NativeAOT on iOS possible.

Build & Inner Loop Performance

Filter Android ps -A output with grep

When profiling the Android inner loop for a .NET MAUI project with PerfView, we found around 1.2% of CPU time was spent just trying to get the process ID of the running Android application. After changing Tools > Options > Xamarin > Xamarin Diagnostics output verbosity to Diagnostics, you could see:

-- Start GetProcessId - 12/02/2022 11:05:57 (96.9929ms) --
[INPUT] ps -A
[OUTPUT] USER PID PPID VSZ RSS WCHAN ADDR S NAME
root 1 0 10943736 4288 0 0 S init
root 2 0 0 0 0 0 S [kthreadd]
... hundreds more lines!
u0_a993 14500 1340 14910808 250404 0 0 R com.companyname.mauiapp42
-- End GetProcessId --

The Xamarin/.NET MAUI extension in Visual Studio polls every second to see if the application has exited. This is useful for changing the play/stop button state if you force close the app, etc. Testing on a Pixel 5, we could see the command actually produces 762 lines of output!

> (adb shell ps -A).Count
762

What we could do instead is something like:

> adb shell "ps -A | grep -w -E 'PID|com.companyname.mauiapp42'"

Here we pipe the output of ps -A to the grep command on the Android device. Yes, Android has a subset of unix commands available! We filter on lines containing either PID or your application's package name.

The result is that the IDE now only has to parse 4 lines:

[INPUT] ps -A | grep -w -E 'PID|com.companyname.mauiapp42'
[OUTPUT] USER PID PPID VSZ RSS WCHAN ADDR S NAME
u0_a993 12856 1340 15020476 272724 0 0 S com.companyname.mauiapp42

This not only reduces the memory used to split and parse this information in C#, but adb also transmits far fewer bytes across your USB cable (or virtually, from an emulator). This feature shipped in recent versions of Visual Studio 2022, improving this scenario for all Xamarin and .NET MAUI customers.

Port WindowsAppSDK usage of vcmeta.dll to C#

We found that every incremental build of a .NET MAUI project running on Windows spent time in:

Top 10 most expensive tasks
CompileXaml = 3.972 s
... various tasks ...

This is the XAML compiler for WindowsAppSDK, which compiles the WinUI3 flavor of XAML (not .NET MAUI XAML). There is very little XAML of this type in .NET MAUI projects; in fact, the only file is Platforms/Windows/App.xaml in the project template. Interestingly, if you installed the Desktop development with C++ workload in the Visual Studio installer, this time just completely went away!

Top 10 most expensive tasks
... various tasks ...
CompileXaml = 9 ms

The WindowsAppSDK XAML compiler p/invokes into a native library from the C++ workload, vcmeta.dll, to calculate a hash for .NET assembly files. This is used to make incremental builds fast: if the hash changes, compile the XAML again. If vcmeta.dll was not found on disk, the XAML compiler was effectively "recompiling everything" on every incremental build. For an initial fix, we simply included a small part of the C++ workload as a dependency of .NET MAUI in Visual Studio. The slightly larger install size was a good tradeoff for saving upwards of 4 seconds in incremental build time. Next, we implemented vcmeta.dll's hashing functionality in plain C# with System.Reflection.Metadata, computing identical hash values as before. Not only was this implementation better, in that we could drop a dependency on the C++ workload, but it was also faster! The time to compute a single hash:

Method   Mean       Error     StdDev
Native   217.31 us  1.704 us  1.594 us
Managed   86.43 us  1.700 us  2.210 us

Some of the reasons this was faster: there is no p/invoke or COM interface involved, and System.Reflection.Metadata has a fast struct-based API, perfect for iterating over types in a .NET assembly and computing a hash value.
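As a rough illustration of the approach, and not the actual WindowsAppSDK implementation, a minimal sketch that walks an assembly's type definitions with System.Reflection.Metadata and feeds their names into a SHA-256 hash might look like this (the method name and hash algorithm are assumptions for the example):

using System.IO;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;
using System.Security.Cryptography;
using System.Text;

static byte[] HashAssemblyTypes(string assemblyPath)
{
    using var stream = File.OpenRead(assemblyPath);
    using var peReader = new PEReader(stream);
    MetadataReader metadata = peReader.GetMetadataReader();

    using var sha256 = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
    foreach (TypeDefinitionHandle handle in metadata.TypeDefinitions)
    {
        TypeDefinition type = metadata.GetTypeDefinition(handle);
        // struct-based API: handles are plain value types, no System.Type
        // objects are materialized while reading the metadata tables
        sha256.AppendData(Encoding.UTF8.GetBytes(metadata.GetString(type.Namespace)));
        sha256.AppendData(Encoding.UTF8.GetBytes(metadata.GetString(type.Name)));
    }
    return sha256.GetHashAndReset();
}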
The end result is that CompileXaml might actually be even faster than 9 ms in incremental builds. This feature shipped in WindowsAppSDK 1.3, which is now used by .NET MAUI in .NET 8. See WindowsAppSDK#3128 for details about this improvement.

Improvements to remote iOS builds on Windows

Comparing inner-loop performance for iOS, there was a considerable gap between doing "remote iOS" development on Windows versus doing everything locally on macOS. Many small improvements were made, based on comparing inner-loop .binlog files recorded on macOS versus ones recorded inside Visual Studio on Windows. Some examples include:

maui#12747 – don't explicitly copy files to the build server
xamarin-macios#16752 – do not copy files to the build server for a Delete operation
xamarin-macios#16929 – batch file deletion via DeleteFilesAsync
xamarin-macios#17033 – cache the AOT compiler path
Xamarin/MAUI Visual Studio extension – when running dotnet-install.sh on remote build hosts, set the explicit processor flag for M1 Macs

We also made some improvements for all iOS & MacCatalyst projects, such as xamarin-macios#16416 – don't process assemblies over and over again.

Improvements to Android inner-loop

We also made many small improvements to the "inner-loop" on Android, most of which were focused in a specific area. Previously, Xamarin.Forms projects had the luxury of being organized into multiple projects, such as:

YourApp.Android.csproj – Xamarin.Android application project
YourApp.iOS.csproj – Xamarin.iOS application project
YourApp.csproj – netstandard2.0 class library

Almost all of the logic for a Xamarin.Forms app was contained in the netstandard2.0 project, so nearly all incremental builds would be changes to XAML or C# in the class library. This structure enabled the Xamarin.Android MSBuild targets to completely skip many Android-specific MSBuild steps. In .NET MAUI, the "single project" feature means that every incremental build has to run these Android-specific build steps. Focusing specifically on improving this area, we made many small changes, such as:

java-interop#1061 – avoid string.Format()
java-interop#1064 – improve ToJniNameFromAttributesForAndroid
java-interop#1065 – avoid File.Exists() checks
java-interop#1069 – fix more places to use TypeDefinitionCache
java-interop#1072 – use less System.Linq for custom attributes
java-interop#1103 – use MemoryMappedFile when using Mono.Cecil
xamarin-android#7621 – avoid File.Exists() checks
xamarin-android#7626 – perf improvements for LlvmIrGenerator
xamarin-android#7652 – fast path for <CheckClientHandlerType/>
xamarin-android#7653 – delay ToJniName when generating AndroidManifest.xml
xamarin-android#7686 – lazily populate Resource lookup

These changes should improve incremental builds in all .NET 8 Android project types.

XAML Compilation no longer uses LoadInSeparateAppDomain

Looking at the JITStats report in PerfView (for MSBuild.exe):

Name                                      JitTime (ms)
Microsoft.Maui.Controls.Build.Tasks.dll   214.0
Mono.Cecil                                119.0

It appeared that Microsoft.Maui.Controls.Build.Tasks.dll was spending a lot of time in the JIT. What was confusing is that this was an incremental build, where everything should already be loaded; the JIT's work should already be done. The cause appears to be usage of the [LoadInSeparateAppDomain] attribute defined by the <XamlCTask/> in .NET MAUI. This is an MSBuild feature that allows MSBuild tasks to run in an isolated AppDomain, with an obvious performance drawback. However, we couldn't just remove it, as there would be complications: [LoadInSeparateAppDomain] also conveniently resets all static state when <XamlCTask/> runs again. Without it, future incremental builds could potentially use old (garbage) values. There are several places that cache Mono.Cecil objects for performance reasons, and really weird bugs would result if we didn't address this. So, to actually make this change, we reworked all static state in the XAML compiler to be stored in instance fields & properties instead. This is a general software design improvement, in addition to giving us the ability to safely remove [LoadInSeparateAppDomain]. The results of this change, for an incremental build on a Windows PC:

Before
XamlCTask = 743 ms
XamlCTask = 706 ms
XamlCTask = 692 ms

After
XamlCTask = 128 ms
XamlCTask = 134 ms
XamlCTask = 117 ms

This saved about 587 ms on incremental builds on all platforms, an 82% improvement. This will help even more on large solutions with multiple .NET MAUI projects, where <XamlCTask/> runs multiple times.
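The shape of that rework looks roughly like the following sketch; the cache field shown here is hypothetical, but it illustrates why instance state makes [LoadInSeparateAppDomain] unnecessary:

using System.Collections.Generic;
using Microsoft.Build.Utilities;
using Mono.Cecil;

// Before: a static cache persists across task runs in the same MSBuild node,
// so the task had to run in its own AppDomain to get a clean slate each build
public class XamlCompileTaskBefore : Task
{
    static readonly Dictionary<string, TypeReference> s_typeCache = new();

    public override bool Execute()
    {
        // ... resolve types via s_typeCache ...
        return true;
    }
}

// After: the cache lives on the task instance, so stale Mono.Cecil state
// cannot leak between runs and the AppDomain isolation can be removed
public class XamlCompileTaskAfter : Task
{
    readonly Dictionary<string, TypeReference> _typeCache = new();

    public override bool Execute()
    {
        // ... resolve types via _typeCache ...
        return true;
    }
}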
See maui#11982 for further details about this improvement.

Performance or App Size Improvements

Structs and IEquatable in .NET MAUI

Using Visual Studio's .NET Object Allocation Tracking profiler on a customer .NET MAUI sample application, we saw:

Microsoft.Maui.WeakEventManager+Subscription
Allocations 686,114
Bytes 21,955,648

This seemed like an exorbitant amount of memory to be used in a sample application's startup! Drilling in to see where these structs were being created:

System.Collections.Generic.ObjectEqualityComparer<Microsoft.Maui.WeakEventManager+Subscription>.IndexOf()

The underlying problem was that this struct didn't implement IEquatable<T> and was being used as the key for a dictionary. The CA1815 code analysis rule was designed to catch this problem. It is not a rule that is enabled by default, so projects must opt into it. To solve this:

Subscription is internal to .NET MAUI, and its usage made it possible to be a readonly struct. This was just an extra improvement.
We made CA1815 a build error across the entire dotnet/maui repository.
We implemented IEquatable<T> for all struct types.

After these changes, we could no longer find Microsoft.Maui.WeakEventManager+Subscription in memory snapshots at all, which saved ~21 MB of allocations in this sample application. If your own projects use structs, it seems quite worthwhile to make CA1815 a build error.
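A sketch of the resulting pattern (the real Subscription type has different fields): a readonly struct that implements IEquatable<T>, so that dictionary lookups avoid the Equals(object) boxing path that ObjectEqualityComparer<T> otherwise falls back to.

using System;
using System.Reflection;

readonly struct Subscription : IEquatable<Subscription>
{
    public Subscription(Type subscriber, MethodInfo handler)
    {
        Subscriber = subscriber;
        Handler = handler;
    }

    public Type Subscriber { get; }
    public MethodInfo Handler { get; }

    // strongly-typed equality: no boxing when used as a Dictionary key
    public bool Equals(Subscription other) =>
        Subscriber == other.Subscriber && Handler == other.Handler;

    public override bool Equals(object? obj) => obj is Subscription s && Equals(s);

    public override int GetHashCode() => HashCode.Combine(Subscriber, Handler);
}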
A smaller, targeted version of this change was backported to MAUI in .NET 7. See maui#13232 for details about this improvement.

Fix performance issue in {AppThemeBinding}

Profiling a .NET MAUI sample application from a customer, we noticed a lot of time spent in {AppThemeBinding} and WeakEventManager while scrolling:

2.08s (17%) microsoft.maui.controls!Microsoft.Maui.Controls.AppThemeBinding.Apply(object,Microsoft.Maui.Controls.BindableObject,Micr...
2.05s (16%) microsoft.maui.controls!Microsoft.Maui.Controls.AppThemeBinding.AttachEvents()
2.04s (16%) microsoft.maui!Microsoft.Maui.WeakEventManager.RemoveEventHandler(System.EventHandler`1<TEventArgs_REF>,string)

The following was happening in this application:

The standard .NET MAUI project template has lots of {AppThemeBinding} in the default Styles.xaml. This supports Light vs Dark theming.
{AppThemeBinding} subscribes to Application.RequestedThemeChanged.
So, every MAUI view subscribes to this event, potentially multiple times.
Subscribers are a Dictionary<string, List<Subscriber>>, where there is a dictionary lookup followed by an O(N) search for unsubscribe operations.

There is potentially a use case here to come up with a generalized "weak event" pattern for .NET. The implementation currently in .NET MAUI came over from Xamarin.Forms, but a generalized pattern could be useful for .NET developers using other UI frameworks. To make this scenario fast, for now, in .NET 8:

Before: any {AppThemeBinding} calls both
RequestedThemeChanged -= OnRequestedThemeChanged (O(N) time)
RequestedThemeChanged += OnRequestedThemeChanged (constant time)
The -= is notably slower, due to possibly hundreds of subscribers.

After: we create an _attached boolean, so we know the "state" of whether it is attached or not. New bindings only call +=, and -= will now only be called by {AppThemeBinding} in rare cases. Most .NET MAUI apps do not "unapply" bindings, but -= would only be used in that case.

See the full details about this fix in maui#14625. See dotnet/runtime#61517 for how we could implement "weak events" in .NET in the future.

Address CA1307 and CA1309 for performance

Profiling a .NET MAUI sample application from a customer, we noticed time spent during "culture-aware" string operations:

77.22ms microsoft.maui!Microsoft.Maui.Graphics.MauiDrawable.SetDefaultBackgroundColor()
42.55ms System.Private.CoreLib!System.String.ToLower()

We can improve this case by simply calling ToLowerInvariant() instead. In some cases you might even consider using string.Equals() with StringComparison.Ordinal. In this case, our code was further reviewed and optimized in "Reduce Java interop in MauiDrawable on Android" below. In .NET 7, we added the CA1307 and CA1309 code analysis rules to catch cases like this, but it appears we missed some in Microsoft.Maui.Graphics.dll. These are likely useful rules to enable in your own .NET MAUI applications, as avoiding all culture-aware string operations can be quite impactful on mobile. See maui#14627 for details about this improvement.

Address CA1311 for performance

After addressing the CA1307 and CA1309 code analysis rules, we took things further and addressed CA1311. As mentioned in the Turkish example, doing something like:

string text = something.ToUpper();
switch (text) { ... }

can actually cause unexpected behavior in Turkish locales, because in Turkish, the character I (Unicode 0049) is considered the upper case version of a different character, ı (Unicode 0131), and i (Unicode 0069) is considered the lower case version of yet another character, İ (Unicode 0130).

ToLowerInvariant() and ToUpperInvariant() are also better for performance, as an invariant ToLower/ToUpper operation is slightly faster. Doing this also avoids loading the current culture, improving startup performance. There are cases where you would want the current culture, such as in the CaseConverter type in .NET MAUI. To do this, you simply have to be explicit about which culture you want to use:

return ConvertToUpper
    ? v.ToUpper(CultureInfo.CurrentCulture)
    : v.ToLower(CultureInfo.CurrentCulture);

The goal of this CaseConverter is to display upper- or lowercase text to a user, so it makes sense to use the CurrentCulture for this. See maui#14773 for details about this improvement.

Remove unused ViewAttachedToWindow event on Android

Every Label in .NET MAUI was subscribing to:

public class MauiTextView : AppCompatTextView
{
    public MauiTextView(Context context) : base(context)
    {
        this.ViewAttachedToWindow += MauiTextView_ViewAttachedToWindow;
    }

    private void MauiTextView_ViewAttachedToWindow(object? sender, ViewAttachedToWindowEventArgs e)
    {
    }
    //...

This was left over from refactoring, but appeared in dotnet-trace output as:

278.55ms (2.4%) mono.android!Android.Views.View.add_ViewAttachedToWindow(System.EventHandler`1<Android.Views.View/ViewAttachedToWindowEv
30.55ms (0.26%) mono.android!Android.Views.View.IOnAttachStateChangeListenerInvoker.n_OnViewAttachedToWindow_Landroid_view_View__mm_wra

The first is the subscription, and the second is the event firing from Java to C#, only to run an empty managed method. Simply removing this event subscription and the empty method resulted in only a few controls subscribing to this event as needed:

2.76ms (0.02%) mono.android!Android.Views.View.add_ViewAttachedToWindow(System.EventHandler`1<Android.Views.View/ViewAttachedToWindowEv

See maui#14833 for details about this improvement.
Remove unneeded System.Reflection for {Binding}

All bindings in .NET MAUI commonly hit the code path:

if (property.CanWrite && property.SetMethod.IsPublic && !property.SetMethod.IsStatic)
{
    part.LastSetter = property.SetMethod;
    var lastSetterParameters = part.LastSetter.GetParameters();
    part.SetterType = lastSetterParameters[lastSetterParameters.Length - 1].ParameterType;
    //...

~53% of the time spent applying a binding appeared in dotnet-trace in the MethodInfo.GetParameters() method:

core.benchmarks!Microsoft.Maui.Benchmarks.BindingBenchmarker.BindName()
...
microsoft.maui.controls!Microsoft.Maui.Controls.BindingExpression.SetupPart()
System.Private.CoreLib.il!System.Reflection.RuntimeMethodInfo.GetParameters()

The above C# is simply finding the property type. It is using a roundabout way of inspecting the property setter's last parameter, which can be simplified to:

part.SetterType = property.PropertyType;

We could see the results of this change in a BenchmarkDotNet benchmark:

Method              Mean      Error     StdDev    Gen0    Gen1    Allocated
--BindName          18.82 us  0.336 us  0.471 us  1.2817  1.2512  10.55 KB
++BindName          18.80 us  0.371 us  0.555 us  1.2512  1.2207  10.23 KB
--BindChild         27.47 us  0.542 us  0.827 us  2.0142  1.9836  16.56 KB
++BindChild         26.71 us  0.516 us  0.652 us  1.9226  1.8921  15.94 KB
--BindChildIndexer  58.39 us  1.113 us  1.143 us  3.1738  3.1128  26.17 KB
++BindChildIndexer  58.00 us  1.055 us  1.295 us  3.1128  3.0518  25.47 KB

Where ++ denotes the new changes. See maui#14830 for further details about this improvement.

Use StringComparer.Ordinal for Dictionary and HashSet

Profiling a .NET MAUI sample application from a customer, we noticed 4% of the time while scrolling was spent doing dictionary lookups:

(4.0%) System.Private.CoreLib!System.Collections.Generic.Dictionary<TKey_REF,TValue_REF>.FindValue(TKey_REF)

Observing the call stack, some of these were coming from culture-aware string lookups in .NET MAUI:

microsoft.maui!Microsoft.Maui.PropertyMapper.GetProperty(string)
microsoft.maui!Microsoft.Maui.WeakEventManager.AddEventHandler(System.EventHandler<TEventArgs_REF>,string)
microsoft.maui!Microsoft.Maui.CommandMapper.GetCommand(string)

These show up in dotnet-trace as a mixture of string comparers:

(0.98%) System.Private.CoreLib!System.Collections.Generic.NonRandomizedStringEqualityComparer.OrdinalComparer.GetHashCode(string)
(0.71%) System.Private.CoreLib!System.String.GetNonRandomizedHashCode()
(0.31%) System.Private.CoreLib!System.Collections.Generic.NonRandomizedStringEqualityComparer.OrdinalComparer.Equals(string,stri
(0.01%) System.Private.CoreLib!System.Collections.Generic.NonRandomizedStringEqualityComparer.GetStringComparer(object)

In many cases of Dictionary<string, TValue> or HashSet<string>, we can use StringComparer.Ordinal to get faster dictionary lookups. This should slightly improve the performance of handlers & all .NET MAUI controls on all platforms.
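The fix itself is a one-line change at the collection's construction site, wherever keys never need culture-aware comparison; a minimal sketch (the value types here are placeholders, not MAUI's actual mapper signatures):

using System;
using System.Collections.Generic;

// passing StringComparer.Ordinal up front guarantees cheap ordinal hashing
// and equality for every lookup, with no culture-aware comparer involved
var commands = new Dictionary<string, Action<object>>(StringComparer.Ordinal);
var names = new HashSet<string>(StringComparer.Ordinal);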
See maui#14900 for details about this improvement.

Reduce Java interop in MauiDrawable on Android

Profiling a .NET MAUI customer sample while scrolling on a Pixel 5, we saw some interesting time being spent in:

(0.76%) microsoft.maui!Microsoft.Maui.Graphics.MauiDrawable.OnDraw(Android.Graphics.Drawables.Shapes.Shape,Android.Graphics.Canv
(0.54%) microsoft.maui!Microsoft.Maui.Graphics.MauiDrawable.SetDefaultBackgroundColor()

This sample has a <Border/> inside a <CollectionView/>, so you can see this work happening while scrolling. Specifically, we reviewed code in .NET MAUI such as:

_borderPaint.StrokeWidth = _strokeThickness;
_borderPaint.StrokeJoin = _strokeLineJoin;
_borderPaint.StrokeCap = _strokeLineCap;
_borderPaint.StrokeMiter = _strokeMiterLimit * 2;
if (_borderPathEffect != null)
    _borderPaint.SetPathEffect(_borderPathEffect);

This calls from C# into Java five times. Creating a new method in PlatformInterop.java allowed us to reduce it to a single call. We also improved the following method, which would perform many calls from C# to Java:

// C#
void SetDefaultBackgroundColor()
{
    using (var background = new TypedValue())
    {
        if (_context == null || _context.Theme == null || _context.Resources == null)
            return;

        if (_context.Theme.ResolveAttribute(global::Android.Resource.Attribute.WindowBackground, background, true))
        {
            var resource = _context.Resources.GetResourceTypeName(background.ResourceId);
            var type = resource?.ToLowerInvariant();
            if (type == "color")
            {
                var color = new Android.Graphics.Color(ContextCompat.GetColor(_context, background.ResourceId));
                _backgroundColor = color;
            }
        }
    }
}

It is more succinctly implemented in Java as:

// Java
/**
 * Gets the value of android.R.attr.windowBackground from the given Context
 * @param context
 * @return the color or -1 if not found
 */
public static int getWindowBackgroundColor(Context context)
{
    TypedValue value = new TypedValue();
    if (context.getTheme().resolveAttribute(android.R.attr.windowBackground, value, true) && isColorType(value)) {
        return value.data;
    } else {
        return -1;
    }
}

/**
 * Needed because TypedValue.isColorType() is only API Q+
 * https://github.com/aosp-mirror/platform_frameworks_base/blob/1d896eeeb8744a1498128d62c09a3aa0a2a29a16/core/java/android/util/TypedValue.java#L266-L268
 * @param value
 * @return true if the TypedValue is a Color
 */
private static boolean isColorType(TypedValue value)
{
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.Q) {
        return value.isColorType();
    } else {
        // Implementation from AOSP
        return (value.type >= TypedValue.TYPE_FIRST_COLOR_INT && value.type <= TypedValue.TYPE_LAST_COLOR_INT);
    }
}

This reduces our new implementation on the C# side to a single Java call and the creation of an Android.Graphics.Color struct:

void SetDefaultBackgroundColor()
{
    var color = PlatformInterop.GetWindowBackgroundColor(_context);
    if (color != -1)
    {
        _backgroundColor = new Android.Graphics.Color(color);
    }
}

After these changes, we instead saw dotnet-trace output such as:

(0.28%) microsoft.maui!Microsoft.Maui.Graphics.MauiDrawable.OnDraw(Android.Graphics.Drawables.Shapes.Shape,Android.Graphics.Canv
(0.04%) microsoft.maui!Microsoft.Maui.Graphics.MauiDrawable.SetDefaultBackgroundColor()

This improves the performance of any <Border/> (and other shapes) on Android, dropping about 1% of the CPU usage while scrolling in this example. See maui#14933 for further details about this improvement.
Improve layout performance of Label on Android

Testing various .NET MAUI sample applications on Android, we noticed around 5.1% of time spent in PrepareForTextViewArrange():

1.01s (5.1%) microsoft.maui!Microsoft.Maui.ViewHandlerExtensions.PrepareForTextViewArrange(Microsoft.Maui.IViewHandler,Microsoft.Maui
635.99ms (3.2%) mono.android!Android.Views.View.get_Context()

Most of the time is spent just calling Android.Views.View.Context in order to then call the extension method:

internal static int MakeMeasureSpecExact(this Context context, double size)
{
    // Convert to a native size to create the spec for measuring
    var deviceSize = (int)context!.ToPixels(size);
    return MeasureSpecMode.Exactly.MakeMeasureSpec(deviceSize);
}

Calling the Context property can be expensive due to the interop from C# to Java. Java returns a handle to the instance, and then we have to look up any existing managed C# object for the Context. If all this work can simply be avoided, it can improve performance dramatically. In .NET 7, we added overloads to ToPixels() that allow you to get the same value with an Android.Views.View, so we can instead do:

internal static int MakeMeasureSpecExact(this PlatformView view, double size)
{
    // Convert to a native size to create the spec for measuring
    var deviceSize = (int)view.ToPixels(size);
    return MeasureSpecMode.Exactly.MakeMeasureSpec(deviceSize);
}

Not only did this change show improvements in dotnet-trace output, but we saw a noticeable difference in our "LOLs per second" test application from last year. See maui#14980 for details about this improvement.

Reduce Java interop calls for controls in .NET MAUI

Reviewing the beautiful .NET MAUI "Surfing App" sample by @jsuarezruiz, we noticed that a lot of time is spent doing Java interop while scrolling:

1.76s (35%) Microsoft.Maui!Microsoft.Maui.Platform.WrapperView.DispatchDraw(Android.Graphics.Canvas)
1.76s (35%) Microsoft.Maui!Microsoft.Maui.Platform.ContentViewGroup.DispatchDraw(Android.Graphics.Canvas)

These methods were deeply nested, doing interop from Java -> C# -> Java many levels deep. In this case, moving some code from C# to Java could make it so less interop would occur, and in some cases no interop at all! For example, previously DispatchDraw() was overridden in C# to implement clipping behavior:

// C#
// ContentViewGroup is used internally by many .NET MAUI Controls
class ContentViewGroup : Android.Views.ViewGroup
{
    protected override void DispatchDraw(Canvas? canvas)
    {
        if (Clip != null)
            ClipChild(canvas);
        base.DispatchDraw(canvas);
    }
}

By creating a PlatformContentViewGroup.java, we can do something like:

// Java
/**
 * Set by C#, determining if we need to call getClipPath()
 * @param hasClip
 */
protected final void setHasClip(boolean hasClip)
{
    this.hasClip = hasClip;
    postInvalidate();
}

@Override
protected void dispatchDraw(Canvas canvas)
{
    // Only call into C# if there is a Clip
    if (hasClip) {
        Path path = getClipPath(canvas.getWidth(), canvas.getHeight());
        if (path != null) {
            canvas.clipPath(path);
        }
    }
    super.dispatchDraw(canvas);
}

setHasClip() is called when clipping is enabled or disabled on any .NET MAUI control. This allows the common path to avoid interop into C# entirely; only views that have opted into clipping need it. This is very good because dispatchDraw() is called quite often during Android layout, scrolling, etc.
This same treatment was also done to a few other internal .NET MAUI types like WrapperView, improving the common case and making interop occur only when views have opted into clipping or drop shadows. For testing the impact of these changes, we used Google's FrameMetricsAggregator, which can be set up in any .NET MAUI application's Platforms/Android/MainActivity.cs:

// How often (in ms) you'd like to print the statistics to the console
const int Duration = 1000;

FrameMetricsAggregator aggregator;
Handler handler;

protected override void OnCreate(Bundle savedInstanceState)
{
    base.OnCreate(savedInstanceState);

    handler = new Handler(Looper.MainLooper);

    // We were interested in the "Total" time; other metrics are also available
    aggregator = new FrameMetricsAggregator(FrameMetricsAggregator.TotalDuration);
    aggregator.Add(this);

    handler.PostDelayed(OnFrame, Duration);
}

void OnFrame()
{
    // We were interested in the "Total" time; other metrics are also available
    var metrics = aggregator.GetMetrics()[FrameMetricsAggregator.TotalIndex];
    int size = metrics.Size();
    double sum = 0, count = 0, slow = 0;
    for (int i = 0; i < size; i++)
    {
        int value = metrics.Get(i);
        if (value != 0)
        {
            count += value;
            sum += i * value;
            if (i > 16)
                slow += value;
            Console.WriteLine($"Frame(s) that took ~{i}ms, count {value}");
        }
    }
    if (sum > 0)
    {
        Console.WriteLine($"Average frame time {sum / count:0.00}ms");
        Console.WriteLine($"No. of slow frames {slow}");
        Console.WriteLine("-----");
    }
    handler.PostDelayed(OnFrame, Duration);
}

FrameMetricsAggregator's API is admittedly a bit odd, but the data we get out is quite useful. The result is basically a lookup table where the key is a duration in milliseconds, and the value is the number of "frames" that took that duration. The idea is that any frame that takes longer than 16 ms is considered "slow" or "janky", as the Android docs sometimes call it. An example of the .NET MAUI "Surfing App" running on a Pixel 5:

Before
Frame(s) that took ~4ms, count 1
Frame(s) that took ~5ms, count 6
Frame(s) that took ~6ms, count 10
Frame(s) that took ~7ms, count 12
Frame(s) that took ~8ms, count 10
Frame(s) that took ~9ms, count 6
Frame(s) that took ~10ms, count 1
Frame(s) that took ~11ms, count 2
Frame(s) that took ~12ms, count 4
Frame(s) that took ~13ms, count 2
Frame(s) that took ~15ms, count 1
Frame(s) that took ~16ms, count 1
Frame(s) that took ~18ms, count 2
Frame(s) that took ~19ms, count 1
Frame(s) that took ~20ms, count 5
Frame(s) that took ~21ms, count 2
Frame(s) that took ~22ms, count 1
Frame(s) that took ~25ms, count 1
Frame(s) that took ~32ms, count 1
Frame(s) that took ~34ms, count 1
Frame(s) that took ~60ms, count 1
Frame(s) that took ~62ms, count 1
Frame(s) that took ~63ms, count 1
Frame(s) that took ~64ms, count 2
Frame(s) that took ~66ms, count 1
Frame(s) that took ~67ms, count 1
Frame(s) that took ~68ms, count 1
Frame(s) that took ~69ms, count 2
Frame(s) that took ~70ms, count 2
Frame(s) that took ~71ms, count 2
Frame(s) that took ~72ms, count 1
Frame(s) that took ~73ms, count 2
Frame(s) that took ~74ms, count 2
Frame(s) that took ~75ms, count 1
Frame(s) that took ~76ms, count 1
Frame(s) that took ~77ms, count 2
Frame(s) that took ~78ms, count 3
Frame(s) that took ~79ms, count 1
Frame(s) that took ~80ms, count 1
Frame(s) that took ~81ms, count 1
Average frame time 28.67ms
No. of slow frames 43

After the changes to ContentViewGroup and WrapperView were in place, we got a very nice improvement!
Even in an app making heavy usage of clipping and shadows:

After
Frame(s) that took ~5ms, count 3
Frame(s) that took ~6ms, count 5
Frame(s) that took ~7ms, count 7
Frame(s) that took ~8ms, count 7
Frame(s) that took ~9ms, count 4
Frame(s) that took ~10ms, count 2
Frame(s) that took ~11ms, count 6
Frame(s) that took ~12ms, count 2
Frame(s) that took ~13ms, count 3
Frame(s) that took ~14ms, count 4
Frame(s) that took ~15ms, count 1
Frame(s) that took ~16ms, count 1
Frame(s) that took ~17ms, count 1
Frame(s) that took ~18ms, count 2
Frame(s) that took ~19ms, count 1
Frame(s) that took ~20ms, count 3
Frame(s) that took ~21ms, count 2
Frame(s) that took ~22ms, count 2
Frame(s) that took ~27ms, count 2
Frame(s) that took ~29ms, count 2
Frame(s) that took ~32ms, count 1
Frame(s) that took ~34ms, count 1
Frame(s) that took ~35ms, count 1
Frame(s) that took ~64ms, count 1
Frame(s) that took ~67ms, count 1
Frame(s) that took ~68ms, count 2
Frame(s) that took ~69ms, count 1
Frame(s) that took ~72ms, count 3
Frame(s) that took ~74ms, count 3
Average frame time 21.99ms
No. of slow frames 29

See maui#14275 for further detail about these changes.

Improve performance of Entry.MaxLength on Android

Investigating a .NET MAUI customer sample that does the following:

Navigate from a Shell flyout.
To a new page with several Entry controls.

There was a noticeable performance delay. When profiling on a Pixel 5, one "hot path" was Entry.MaxLength:

18.52ms (0.22%) microsoft.maui!Microsoft.Maui.Platform.EditTextExtensions.UpdateMaxLength(Android.Widget.EditText,Microsoft.Maui.IEntry)
16.03ms (0.19%) microsoft.maui!Microsoft.Maui.Platform.EditTextExtensions.UpdateMaxLength(Android.Widget.EditText,int)
12.16ms (0.14%) microsoft.maui!Microsoft.Maui.Platform.EditTextExtensions.SetLengthFilter(Android.Widget.EditText,int)

EditTextExtensions.UpdateMaxLength() calls the EditText.Text getter and setter, and EditTextExtensions.SetLengthFilter() calls EditText.GetFilters() and SetFilters(). What happens is that we end up marshaling strings and IInputFilter[] back and forth between C# and Java for every Entry control. All Entry controls go through this code path (even ones with a default value for MaxLength), so it made sense to move some of this code from C# to Java instead. Our C# code before:

// C#
public static void UpdateMaxLength(this EditText editText, int maxLength)
{
    editText.SetLengthFilter(maxLength);

    var newText = editText.Text.TrimToMaxLength(maxLength);
    if (editText.Text != newText)
        editText.Text = newText;
}

public static void SetLengthFilter(this EditText editText, int maxLength)
{
    if (maxLength == -1)
        maxLength = int.MaxValue;

    var currentFilters = new List<IInputFilter>(editText.GetFilters() ?? new IInputFilter[0]);
    var changed = false;
    for (var i = 0; i < currentFilters.Count; i++)
    {
        if (currentFilters[i] is InputFilterLengthFilter)
        {
            currentFilters.RemoveAt(i);
            changed = true;
            break;
        }
    }
    if (maxLength >= 0)
    {
        currentFilters.Add(new InputFilterLengthFilter(maxLength));
        changed = true;
    }
    if (changed)
        editText.SetFilters(currentFilters.ToArray());
}

Moved to Java (with identical behavior) instead:

// Java
/**
 * Sets the maxLength of an EditText
 * @param editText
 * @param maxLength
 */
public static void updateMaxLength(@NonNull EditText editText, int maxLength)
{
    setLengthFilter(editText, maxLength);

    if (maxLength < 0)
        return;

    Editable currentText = editText.getText();
    if (currentText.length() > maxLength) {
        editText.setText(currentText.subSequence(0, maxLength));
    }
}

/**
 * Updates the InputFilter[] of an EditText. Used for Entry and SearchBar.
 * @param editText
 * @param maxLength
 */
public static void setLengthFilter(@NonNull EditText editText, int maxLength)
{
    if (maxLength == -1)
        maxLength = Integer.MAX_VALUE;

    List<InputFilter> currentFilters = new ArrayList<>(Arrays.asList(editText.getFilters()));
    boolean changed = false;
    for (int i = 0; i < currentFilters.size(); i++)
    {
        InputFilter filter = currentFilters.get(i);
        if (filter instanceof InputFilter.LengthFilter)
        {
            currentFilters.remove(i);
            changed = true;
            break;
        }
    }
    if (maxLength >= 0)
    {
        currentFilters.add(new InputFilter.LengthFilter(maxLength));
        changed = true;
    }
    if (changed)
    {
        InputFilter[] newFilter = new InputFilter[currentFilters.size()];
        editText.setFilters(currentFilters.toArray(newFilter));
    }
}

This avoids marshaling (copying!) string and array values back and forth between C# and Java. With these changes in place, the calls to EditTextExtensions.UpdateMaxLength() are now so fast that they are missing completely from dotnet-trace output, saving ~19 ms when navigating to the page in the customer sample. See maui#15614 for details about this improvement.

Improve memory usage of CollectionView on Windows

We reviewed a .NET MAUI customer sample with a CollectionView of 150,000 data-bound rows. Debugging what happens at runtime, .NET MAUI was effectively doing:

_itemTemplateContexts = new List<ItemTemplateContext>(capacity: 150_000);
for (int n = 0; n < 150_000; n++)
{
    _itemTemplateContexts.Add(null);
}

And then each item is created as it is scrolled into view:

if (_itemTemplateContexts[index] == null)
{
    _itemTemplateContexts[index] = context = new ItemTemplateContext(...);
}

return _itemTemplateContexts[index];

This wasn't the best approach. To improve things, we:

use a Dictionary<int, T> instead, and just let it size dynamically;
use TryGetValue(..., out var context), so each call accesses the indexer one less time than before;
use either the bound collection's size or 64 (whichever is smaller) as a rough estimate of how many items might fit on screen at a time.

Our code changed to:

if (!_itemTemplateContexts.TryGetValue(index, out var context))
{
    _itemTemplateContexts[index] = context = new ItemTemplateContext(...);
}

return context;

With these changes in place, a memory snapshot of the app after startup:

Before: Heap Size 82,899.54 KB
After: Heap Size 81,768.76 KB

This saves about 1 MB of memory on launch. In this case, it feels better to just let the Dictionary size itself, with an estimate of what the capacity will be. See maui#16838 for details about this improvement.

Use UnmanagedCallersOnlyAttribute on Apple platforms

When unmanaged code calls into managed code, such as when invoking a callback from Objective-C, the [MonoPInvokeCallback] attribute was previously used in Xamarin.iOS, Xamarin.Mac, and .NET 6+ for this purpose. The [UnmanagedCallersOnly] attribute came along as a modern replacement for this Mono feature, implemented with performance in mind. Unfortunately, there are a few restrictions when using this new attribute:

The method must be marked static.
It must not be called from managed code.
It must only have blittable arguments.
It must not have generic type parameters or be contained within a generic class.

Not only did we have to refactor the "code generator" that produces many of the bindings for Apple APIs such as AppKit and UIKit, but we also had many manual bindings that needed the same treatment. The end result is that most callbacks from Objective-C to C# should be faster in .NET 8 than before.
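A minimal sketch of what an [UnmanagedCallersOnly] callback looks like (the names here are hypothetical, not MAUI's generated bindings):

using System;
using System.Runtime.InteropServices;

static class NativeCallbacks
{
    // must be static, with blittable parameters, and never called from managed code
    [UnmanagedCallersOnly]
    static int OnNativeEvent(int value) => value * 2;

    // a native-callable function pointer that could be handed to unmanaged code;
    // no delegate allocation or reverse-p/invoke thunk lookup at call time
    public static unsafe IntPtr GetCallbackPointer()
        => (IntPtr)(delegate* unmanaged<int, int>)&OnNativeEvent;
}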
See xamarin-macios#10470 and xamarin-macios#15783 for details about these improvements.

Faster Java interop for strings on Android

When binding members which have parameter types or return types of java.lang.CharSequence, the member is "overloaded" to replace CharSequence with System.String, and the "original" member gets a Formatted suffix. For example, consider android.widget.TextView, which has getText() and setText() methods with parameter and return types of java.lang.CharSequence:

// Java
class TextView extends View {
    public CharSequence getText();
    public final void setText(CharSequence text);
}

When bound, this results in two properties:

// C#
class TextView : View {
    public Java.Lang.ICharSequence? TextFormatted { get; set; }
    public string? Text { get; set; }
}

The "non-Formatted overload" works by creating a temporary String object to invoke the Formatted overload, so the actual implementation looks like:

partial class TextView {
    public string? Text {
        get => TextFormatted?.ToString ();
        set {
            var jls = value == null ? null : new Java.Lang.String (value);
            TextFormatted = jls;
            jls?.Dispose ();
        }
    }
}

TextView.Text is much easier to understand and simpler to consume for .NET developers than TextView.TextFormatted. A problem with this approach is performance: creating a new Java.Lang.String instance requires

1. creating the managed peer (the Java.Lang.String instance),
2. creating the native peer (the java.lang.String instance),
3. and registering the mapping between (1) and (2),

only to immediately use and dispose the value. This is particularly noticeable in .NET MAUI apps. Consider a customer sample which uses XAML to set data-bound Text values in a CollectionView, eventually hitting TextView.Text. Profiling shows:

653.69ms (6.3%) mono.android!Android.Widget.TextView.set_Text(string)
198.05ms (1.9%) mono.android!Java.Lang.String..ctor(string)
121.57ms (1.2%) mono.android!Java.Lang.Object.Dispose()

6.3% of scrolling time is spent in the TextView.Text property setter! We can partially optimize this case: if the *Formatted member is (1) a property, and (2) not virtual, then we can directly call the Java setter method. This avoids the need to create a managed peer and to register a mapping between the peers:

partial class TextView {
    public string? Text {
        get => TextFormatted?.ToString ();    // unchanged
        set {
            const string __id = "setText.(Ljava/lang/CharSequence;)V";
            JniObjectReference native_value = JniEnvironment.Strings.NewString (value);
            try {
                JniArgumentValue* __args = stackalloc JniArgumentValue [1];
                __args [0] = new JniArgumentValue (native_value);
                _members.InstanceMethods.InvokeNonvirtualVoidMethod (__id, this, __args);
            } finally {
                JniObjectReference.Dispose (ref native_value);
            }
        }
    }
}

With the result being:

Method             Mean      Error      StdDev     Allocated
Before SetFinalText  6.632 us  0.0101 us  0.0079 us  112 B
After SetFinalText   1.361 us  0.0022 us  0.0019 us  –

The TextView.Text property setter invocation time is reduced to 20% of the previous average invocation time. Note that the virtual case is problematic for other reasons, but luckily TextView.setText() is non-virtual and likely one of the more commonly used Android APIs. See java-interop#1101 for details about this improvement.
Faster Java interop for C# events on Android

Profiling a .NET MAUI customer sample while scrolling on a Pixel 5, we saw ~2.2% of the time spent in the IOnFocusChangeListenerImplementor constructor, due to a subscription to the View.FocusChange event:

(2.2%) mono.android!Android.Views.View.IOnFocusChangeListenerImplementor..ctor()

MAUI subscribes to Android.Views.View.FocusChange for every view placed on the screen, which happens while scrolling in this sample. Reviewing the generated code for the IOnFocusChangeListenerImplementor constructor, we see it still uses outdated JNIEnv APIs:

public IOnFocusChangeListenerImplementor ()
    : base (
        Android.Runtime.JNIEnv.StartCreateInstance ("mono/android/view/View_OnFocusChangeListenerImplementor", "()V"),
        JniHandleOwnership.TransferLocalRef
    )
{
    Android.Runtime.JNIEnv.FinishCreateInstance (((Java.Lang.Object) this).Handle, "()V");
}

We can change this to use the newer/faster Java.Interop APIs:

public unsafe IOnFocusChangeListenerImplementor ()
    : base (IntPtr.Zero, JniHandleOwnership.DoNotTransfer)
{
    const string __id = "()V";
    if (((Java.Lang.Object) this).Handle != IntPtr.Zero)
        return;
    var h = JniPeerMembers.InstanceMethods.StartCreateInstance (__id, ((object) this).GetType (), null);
    SetHandle (h.Handle, JniHandleOwnership.TransferLocalRef);
    JniPeerMembers.InstanceMethods.FinishCreateInstance (__id, this, null);
}

These are better because the equivalent call to JNIEnv.FindClass() is cached, among other things. This was just one of the cases that was accidentally missed when we implemented the new Java.Interop APIs in the Xamarin timeframe; we simply needed to update our code generator to emit a better C# binding for this case. After these changes, we instead saw the following in dotnet-trace:

(0.81%) mono.android!Android.Views.View.IOnFocusChangeListenerImplementor..ctor()

This should improve the performance of all C# events that wrap Java listeners, a design pattern commonly used in Java and Android applications. This includes the FocusedChanged event used by all .NET MAUI views on Android. See java-interop#1105 for details about this improvement.

Use Function Pointers for JNI

There is various machinery and generated code that makes Java interop possible from C#. Take, for example, the following instance method foo() in Java:

// Java
object foo(object bar) {
    // returns some value
}

A C# method named CallObjectMethod is responsible for calling the Java Native Interface (JNI), which calls into the JVM to actually invoke the Java method:

public static unsafe JniObjectReference CallObjectMethod (JniObjectReference instance, JniMethodInfo method, JniArgumentValue* args)
{
    //...
    IntPtr thrown;
    var tmp = NativeMethods.java_interop_jnienv_call_object_method_a (JniEnvironment.EnvironmentPointer, out thrown, instance.Handle, method.ID, (IntPtr) args);
    Exception __e = JniEnvironment.GetExceptionForLastThrowable (thrown);
    if (__e != null)
        ExceptionDispatchInfo.Capture (__e).Throw ();
    JniEnvironment.LogCreateLocalRef (tmp);
    return new JniObjectReference (tmp, JniObjectReferenceType.Local);
}

In Xamarin.Android, .NET 6, and .NET 7, all calls into Java went through a java_interop_jnienv_call_object_method_a p/invoke, whose signature looks like:

[DllImport (JavaInteropLib, CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
internal static extern unsafe jobject java_interop_jnienv_call_object_method_a (IntPtr jnienv, out IntPtr thrown, jobject instance, IntPtr method, IntPtr args);

This is implemented in C as:

JI_API jobject
java_interop_jnienv_call_object_method_a (JNIEnv *env, jthrowable *_thrown, jobject instance, jmethodID method, jvalue* args)
{
    *_thrown = 0;
    jobject _r_ = (*env)->CallObjectMethodA (env, instance, method, args);
    *_thrown = (*env)->ExceptionOccurred (env);
    return _r_;
}

C# 9 introduced function pointers, which gave us a way to simplify things slightly and make them faster as a result. So instead of using a p/invoke, in .NET 8 we can call a new unsafe method named CallObjectMethodA:

// Before
var tmp = NativeMethods.java_interop_jnienv_call_object_method_a (JniEnvironment.EnvironmentPointer, out thrown, instance.Handle, method.ID, (IntPtr) args);

// After
var tmp = JniNativeMethods.CallObjectMethodA (JniEnvironment.EnvironmentPointer, instance.Handle, method.ID, (IntPtr) args);

This calls a C# function pointer directly:

[System.Runtime.CompilerServices.MethodImpl (System.Runtime.CompilerServices.MethodImplOptions.AggressiveInlining)]
internal static unsafe jobject CallObjectMethodA (IntPtr env, jobject instance, IntPtr method, IntPtr args)
{
    return (*((JNIEnv**)env))->CallObjectMethodA (env, instance, method, args);
}

The function pointer is declared using the new syntax introduced in C# 9:

public delegate* unmanaged <IntPtr, jobject, IntPtr, IntPtr, jobject> CallObjectMethodA;

Comparing the two implementations with a manual benchmark:

# JIPinvokeTiming timing: 00:00:01.6993644
#   Average Invocation: 0.00016993643999999998ms
# JIFunctionPointersTiming timing: 00:00:01.6561349
#   Average Invocation: 0.00016561349ms

With a Release build, the average invocation time for JIFunctionPointersTiming takes 97% of the time of JIPinvokeTiming, i.e. it is 3% faster. Additionally, using C# 9 function pointers means we can get rid of all of the java_interop_jnienv_*() C functions, which shrinks libmonodroid.so by ~55KB for each architecture. See xamarin-android#8234 and java-interop#938 for details about this improvement.

Removed Xamarin.AndroidX.Legacy.Support.V4

Reviewing .NET MAUI's Android dependencies, we noticed a suspicious package: Xamarin.AndroidX.Legacy.Support.V4. If you are familiar with the Android Support Libraries, these are a set of packages Google provides to "polyfill" APIs on past versions of Android. This gives them a way to bring new APIs to old OS versions, since the Android ecosystem (OEMs, etc.) is much slower to upgrade as compared to iOS, for example. This particular package, Legacy.Support.V4, is actually support for Android as far back as API 4! The minimum supported Android version in .NET is API 21, which shipped with Android 5.0 in 2014.
It turns out this dependency was brought over from Xamarin.Forms and was not actually needed. As expected from this change, lots of Java code was removed from .NET MAUI apps. So much, in fact, that .NET 8 MAUI applications are now under the multi-dex limit: all Dalvik bytecode can fit into a single classes.dex file. A detailed breakdown of the size changes using apkdiff:

> apkdiff -f com.companyname.maui_before-Signed.apk com.companyname.maui_after-Signed.apk
Size difference in bytes ([*1] apk1 only, [*2] apk2 only)
+   1,598,040 classes.dex
-           6 META-INF/androidx.asynclayoutinflater_asynclayoutinflater.version *1
-           6 META-INF/androidx.legacy_legacy-support-core-ui.version *1
-           6 META-INF/androidx.legacy_legacy-support-v4.version *1
-           6 META-INF/androidx.media_media.version *1
-         455 assemblies/assemblies.blob
-         564 res/layout/notification_media_action.xml *1
-         744 res/layout/notification_media_cancel_action.xml *1
-       1,292 res/layout/notification_template_media.xml *1
-       1,584 META-INF/BNDLTOOL.SF
-       1,584 META-INF/MANIFEST.MF
-       1,696 res/layout/notification_template_big_media.xml *1
-       1,824 res/layout/notification_template_big_media_narrow.xml *1
-       2,456 resources.arsc
-       2,756 res/layout/notification_template_media_custom.xml *1
-       2,872 res/layout/notification_template_lines_media.xml *1
-       3,044 res/layout/notification_template_big_media_custom.xml *1
-       3,216 res/layout/notification_template_big_media_narrow_custom.xml *1
-   2,030,636 classes2.dex
Summary
-      24,111 Other entries -0.35% (of 6,880,759)
-     432,596 Dalvik executables -3.46% (of 12,515,440)
+           0 Shared libraries 0.00% (of 12,235,904)
-     169,179 Package size difference -1.12% (of 15,123,185)

See dotnet/maui#12232 for details about this improvement.

Deduplication of generics on iOS and macOS

In .NET 7, iOS applications experienced app size increases due to C# generics usage across multiple .NET assemblies. When the .NET 7 Mono AOT compiler encounters a generic instance that is not handled by generic sharing, it emits code for the instance. If the same instance is encountered during AOT compilation in multiple assemblies, the code is emitted multiple times, increasing code size. In .NET 8, new dedup-skip and dedup-include command-line options are passed to the Mono AOT compiler, and a new aot-instances.dll assembly is created to share this information in one place throughout the application. The change was tested on the MySingleView app and the Monotouch tests in the xamarin/xamarin-macios codebase:

App                                      Baseline .ipa (MB)  Target .ipa (MB)  Baseline .app (MB)  Target .app (MB)  Baseline build (s)  Target build (s)  .app diff (%)
MySingleView Release iOS                 5.4                 5.4               29.2                15.2              29.2                16.8              47.9
MySingleView Release iOSSimulator-arm64  N/A                 N/A               469.5               341.8             468.0               330.0             27.2
Monotouch Release llvm iOS               49.0                38.8              209.6               157.4             115.0               130.0             24.9

See xamarin-macios#17766 for details about this improvement.

Fix System.Linq.Expressions implementation on iOS-like platforms

In .NET 7, codepaths in System.Linq.Expressions were controlled by various flags, such as:

CanCompileToIL
CanEmitObjectArrayDelegate
CanCreateArbitraryDelegates

These flags control which codepaths are "AOT friendly" and which are not.
For desktop platforms, NativeAOT specifies the following configuration for AOT-compatible code:

<IlcArg Include="--feature:System.Linq.Expressions.CanCompileToIL=false" />
<IlcArg Include="--feature:System.Linq.Expressions.CanEmitObjectArrayDelegate=false" />
<IlcArg Include="--feature:System.Linq.Expressions.CanCreateArbitraryDelegates=false" />

On iOS-like platforms, however, System.Linq.Expressions was built with constant propagation enabled, and these control variables were removed. This caused the NativeAOT feature switches listed above to have no effect (they failed to trim during app build), potentially causing the AOT compilation to follow unsupported code paths on these platforms. In .NET 8, we unified the build of System.Linq.Expressions.dll, shipping the same assembly for all supported platforms and runtimes, and simplified these switches to respect RuntimeFeature.IsDynamicCodeSupported, so that the .NET trimmer can remove the appropriate IL in System.Linq.Expressions.dll at application build time. See dotnet/runtime#87924 and dotnet/runtime#89308 for details about this improvement.

Set DynamicCodeSupport=false for iOS and Catalyst

In .NET 8, the feature switch $(DynamicCodeSupport) is set to false for platforms where it is not possible to publish without the AOT compiler and the interpreter is not enabled. This boils down to applications running on iOS, tvOS, MacCatalyst, etc. DynamicCodeSupport=false enables the .NET trimmer to remove code paths depending on RuntimeFeature.IsDynamicCodeSupported, such as this example in System.Linq.Expressions. Estimated size savings:

dotnet new maui (ios)  old SLE.dll  new SLE.dll + DynamicCodeSupported=false  diff (%)
Size on disk (MB)      40.53        38.78                                     -4.31%
.pkg (MB)              14.83        14.20                                     -4.21%

When combined with the System.Linq.Expressions improvements on iOS-like platforms, this showed a nice overall improvement to application size. See xamarin-macios#18555 for details about this improvement.
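A sketch of the guard pattern the trimmer exploits (a hypothetical method, not the actual System.Linq.Expressions code): when DynamicCodeSupport=false, IsDynamicCodeSupported becomes a build-time constant, and the dynamic branch is removed entirely.

using System;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;

static Func<int, int> CreateAdder(int amount)
{
    if (RuntimeFeature.IsDynamicCodeSupported)
    {
        // dynamic-code path: compile an expression tree to IL at runtime;
        // trimmed away on iOS-like platforms where this is constant-false
        var x = Expression.Parameter(typeof(int), "x");
        var body = Expression.Add(x, Expression.Constant(amount));
        return Expression.Lambda<Func<int, int>>(body, x).Compile();
    }

    // AOT-safe fallback: a plain closure, no runtime code generation
    return x => x + amount;
}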
Memory Leaks

Memory Leaks and Quality

Given that the major theme for .NET MAUI in .NET 8 is quality, memory-related issues became a focal point for this release. Some of the problems found existed even in the Xamarin.Forms codebase, so we are happy to work towards a framework that developers can rely on for their cross-platform .NET applications. For full details on the work completed in .NET 8, see the linked Pull Requests and Issues lists of memory-related work. You can see that considerable progress was made in .NET 8 in this area. Compare .NET 7 MAUI versus .NET 8 MAUI in a sample application running on Windows, displaying the results of GC.GetTotalMemory() on screen, and then compare the sample application running on macOS, but with many more pages pushed onto the navigation stack. See the sample code for this project on GitHub for further details.

Diagnosing leaks in .NET MAUI

The symptom of a memory leak in a .NET MAUI application could be something like:

Navigate from the landing page to a sub page.
Go back.
Navigate to the sub page again.
Repeat.
Memory grows consistently until the OS closes the application due to lack of memory.

In the case of Android, you may see log messages such as:

07-07 18:51:39.090 17079 17079 D Mono : GC_MAJOR (user request) time 137.21ms, stw 140.60ms los size 10984K in use 3434K
07-07 18:51:39.090 17079 17079 D Mono : GC_MAJOR_SWEEP major size 116192K in use 108493K
07-07 18:51:39.092 17079 17079 I monodroid-gc 46204 outstanding GREFs. Performing a full GC!

In this example, a 116 MB heap is quite large for a mobile application, as is over 46,000 C# <-> Java wrapper objects! To truly determine if the sub page is leaking, we can make a couple of modifications to a .NET MAUI application.

First, add logging in a finalizer. For example:

~MyPage() => Console.WriteLine("Finalizer for ~MyPage()");

While navigating through your app, you can find out if entire pages are living forever when the log message is never displayed. This is a common symptom of a leak, because any View holds .Parent.Parent.Parent, etc. all the way up to the Page object.

Second, call GC.Collect() somewhere in the app, such as the sub page's constructor:

public MyPage()
{
    GC.Collect(); // For debugging purposes only, remove later
    InitializeComponent();
}

This makes the GC more deterministic, in that we are forcing it to run more frequently. Each time we navigate to the sub page, we are more likely to cause the old sub page to go away. If things are working properly, we should see the log message from the finalizer.

Note: GC.Collect() is for debugging purposes only. You should not need this in your app after the investigation is complete, so be sure to remove it afterward.

With these changes in place, test a Release build of your app. On iOS, Android, macOS, etc., you can watch the console output of your app to determine what is actually happening at runtime. adb logcat, for example, is a way to view these logs on Android. If running on Windows, you can also use Debug > Windows > Diagnostic Tools to take memory snapshots inside Visual Studio. In the future, we would like Visual Studio's diagnostic tooling to support .NET MAUI applications running on other platforms. See our memory leaks wiki page for more information related to memory leaks in .NET MAUI applications.

Patterns that cause leaks: C# events

C# events, just like a field or property, can create strong references between objects. Let's look at a situation where things can go wrong. Take, for example, the cross-platform Grid.ColumnDefinitions property:

public class Grid : Layout, IGridLayout
{
    public static readonly BindableProperty ColumnDefinitionsProperty = BindableProperty.Create(
        "ColumnDefinitions",
        typeof(ColumnDefinitionCollection),
        typeof(Grid),
        null,
        validateValue: (bindable, value) => value != null,
        propertyChanged: UpdateSizeChangedHandlers,
        defaultValueCreator: bindable =>
        {
            var colDef = new ColumnDefinitionCollection();
            colDef.ItemSizeChanged += ((Grid)bindable).DefinitionsChanged;
            return colDef;
        });

    public ColumnDefinitionCollection ColumnDefinitions
    {
        get { return (ColumnDefinitionCollection)GetValue(ColumnDefinitionsProperty); }
        set { SetValue(ColumnDefinitionsProperty, value); }
    }

Grid has a strong reference to its ColumnDefinitionCollection via the BindableProperty, and ColumnDefinitionCollection has a strong reference to Grid via the ItemSizeChanged event. If you put a breakpoint on the line with ItemSizeChanged +=, you can see the event has an EventHandler object where the Target is a strong reference back to the Grid. In some cases, circular references like this are completely OK. The .NET runtimes' garbage collectors know how to collect cycles of objects that point at each other; when there is no "root" object holding them both, they can both go away. The problem comes in with object lifetimes: what happens if the ColumnDefinitionCollection lives for the life of the entire application?
Consider the following Style in Application.Resources or Resources/Styles/Styles.xaml:

<Style TargetType="Grid" x:Key="GridStyleWithColumnDefinitions">
    <Setter Property="ColumnDefinitions" Value="18,*"/>
</Style>

If you applied this Style to a Grid on a random Page:

The Application's main ResourceDictionary holds the Style.
The Style holds a ColumnDefinitionCollection.
The ColumnDefinitionCollection holds the Grid.
The Grid unfortunately holds the Page via .Parent.Parent.Parent, etc.

This situation could cause entire Pages to live forever!

Note: The issue with Grid is fixed in maui#16145, but it is an excellent example for illustrating how C# events can go wrong.

Circular references on Apple platforms

Ever since the early days of Xamarin.iOS, there has existed an issue with "circular references", even in a garbage-collected runtime like .NET. C# objects co-exist with a reference-counted world on Apple platforms, and so a C# object that subclasses NSObject can run into situations where it accidentally lives forever: a memory leak. This is not a .NET-specific problem; you can just as easily create the same situation in Objective-C or Swift. Note that this does not occur on Android or Windows platforms. Take, for example, the following circular reference:

class MyViewSubclass : UIView
{
    public UIView? Parent { get; set; }

    public void Add(MyViewSubclass subview)
    {
        subview.Parent = this;
        AddSubview(subview);
    }
}

//...
var parent = new MyViewSubclass();
var view = new MyViewSubclass();
parent.Add(view);

In this case:

parent -> view via Subviews
view -> parent via the Parent property

The reference count of both objects is non-zero, so both objects live forever. This problem isn't limited to a field or property; you can create similar situations with C# events:

class MyView : UIView
{
    public MyView()
    {
        var picker = new UIDatePicker();
        AddSubview(picker);
        picker.ValueChanged += OnValueChanged;
    }

    void OnValueChanged(object? sender, EventArgs e) { }

    // Use this instead and it doesn't leak!
    //static void OnValueChanged(object? sender, EventArgs e) { }
}

In this case:

MyView -> UIDatePicker via Subviews
UIDatePicker -> MyView via ValueChanged and EventHandler.Target

Both objects live forever. A solution for this example is to make the OnValueChanged method static, which results in a null Target on the EventHandler instance. Another solution would be to put OnValueChanged in a non-NSObject subclass:

class MyView : UIView
{
    readonly Proxy _proxy = new();

    public MyView()
    {
        var picker = new UIDatePicker();
        AddSubview(picker);
        picker.ValueChanged += _proxy.OnValueChanged;
    }

    class Proxy
    {
        public void OnValueChanged(object? sender, EventArgs e) { }
    }
}

This is the pattern we've used in most .NET MAUI handlers and other UIView subclasses. See the MemoryLeaksOniOS sample repo if you would like to play with some of these scenarios in isolation, in an iOS application without .NET MAUI.

Roslyn analyzer for Apple platforms

We also have an experimental Roslyn analyzer that can detect these situations at build time. To add it to net7.0-ios, net8.0-ios, etc. projects, you can simply install a NuGet package:

<PackageReference Include="MemoryAnalyzers" Version="0.1.0-beta.3" PrivateAssets="all" />

An example of a warning would be:

public class MyView : UIView
{
    public event EventHandler MyEvent;
}

Event 'MyEvent' could cause memory leaks in an NSObject subclass. Remove the event or add the [UnconditionalSuppressMessage("Memory", "MA0001")] attribute with a justification as to why the event will not leak.
Note that the analyzer warns when there might be an issue, so it can be quite noisy to enable in a large, existing codebase. Inspecting memory at runtime is the best way to determine if there is truly a memory leak.

Tooling and Documentation

Simplified dotnet-trace and dotnet-dsrouter

In .NET 7, profiling a mobile application was a bit of a challenge. You had to run dotnet-dsrouter and dotnet-trace together and get all the settings right to be able to retrieve a .nettrace or speedscope file for performance investigations. There was also no built-in support for dotnet-gcdump to connect to dotnet-dsrouter to get memory snapshots of a running .NET MAUI application. In .NET 8, we've streamlined this scenario by adding new commands to dotnet-dsrouter that simplify the workflow. To verify you have the latest diagnostic tooling, you can install the tools via:

$ dotnet tool install -g dotnet-dsrouter
You can invoke the tool using the following command: dotnet-dsrouter
Tool 'dotnet-dsrouter' was successfully installed.
$ dotnet tool install -g dotnet-gcdump
You can invoke the tool using the following command: dotnet-gcdump
Tool 'dotnet-gcdump' was successfully installed.
$ dotnet tool install -g dotnet-trace
You can invoke the tool using the following command: dotnet-trace
Tool 'dotnet-trace' was successfully installed.

Verify you have at least 8.x versions of these tools:

$ dotnet tool list -g
Package Id         Version      Commands
------------------------------------------------
dotnet-dsrouter    8.0.452401   dotnet-dsrouter
dotnet-gcdump      8.0.452401   dotnet-gcdump
dotnet-trace       8.0.452401   dotnet-trace

To profile an Android application on an Android emulator, first build and install your application in Release mode, such as:

$ dotnet build -f net8.0-android -t:Install -c Release -p:AndroidEnableProfiler=true
Build SUCCEEDED.
0 Warning(s)
0 Error(s)

Next, open a terminal to run dotnet-dsrouter:

$ dotnet-dsrouter android-emu
Start an application on android emulator with one of the following environment variables set:
DOTNET_DiagnosticPorts=10.0.2.2:9000,nosuspend,connect
DOTNET_DiagnosticPorts=10.0.2.2:9000,suspend,connect

Then, in a second terminal window, set the debug.mono.profile Android system property as the stand-in for $DOTNET_DiagnosticPorts:

$ adb shell setprop debug.mono.profile '10.0.2.2:9000,suspend,connect'

$ dotnet-trace ps
3248 dotnet-dsrouter

$ dotnet-trace collect -p 3248 --format speedscope
...
[00000009] Recording trace 3.2522 (MB)
Press <Enter> or <Ctrl+C> to exit...

Note: Android doesn't have good support for environment variables like $DOTNET_DiagnosticPorts. You can create an AndroidEnvironment text file for setting environment variables, but Android system properties can be simpler, as they don't require rebuilding the application to set.

Upon launching the Android application, it should connect to dotnet-dsrouter -> dotnet-trace and record performance profiling information for investigation. The --format argument is optional, and it defaults to .nettrace. However, .nettrace files can be viewed only with PerfView on Windows, while the speedscope JSON files can be viewed "on" macOS or Linux by uploading them to https://speedscope.app.

Note: When providing a process ID to dotnet-trace, it knows how to tell if a process ID is dotnet-dsrouter and connects through it appropriately.
dotnet-dsrouter has the following new commands to simplify the workflow:

dotnet-dsrouter android      Android devices
dotnet-dsrouter android-emu  Android emulators
dotnet-dsrouter ios          iOS devices
dotnet-dsrouter ios-sim      iOS simulators

See the .NET MAUI wiki for more information about profiling .NET MAUI applications on each platform.

dotnet-gcdump Support for Mobile

In .NET 7, we had a somewhat complex method (see the wiki) for getting a memory snapshot of an application on the Mono runtime (such as iOS or Android). You had to use a Mono-specific event provider, such as:

dotnet-trace collect --diagnostic-port /tmp/maui-app --providers Microsoft-DotNETRuntimeMonoProfiler:0xC900001:4

And then we relied on Filip Navara's mono-gcdump tool (thanks, Filip!) to convert the .nettrace file to .gcdump, to be opened in Visual Studio or PerfView. In .NET 8, we now have dotnet-gcdump support for mobile scenarios. If you want to get a memory snapshot of a running application, you can use dotnet-gcdump in a similar fashion as dotnet-trace:

$ dotnet-gcdump ps
3248 dotnet-dsrouter

$ dotnet-gcdump collect -p 3248
Writing gcdump to '20231018_115631_29880.gcdump'...

Note: This requires the exact same setup as dotnet-trace, such as -p:AndroidEnableProfiler=true, dotnet-dsrouter, adb commands, etc.

This greatly streamlines our workflow for investigating memory leaks in .NET MAUI applications. See our memory leaks wiki page for more information.

The post .NET 8 Performance Improvements in .NET MAUI appeared first on .NET Blog.

