
Asp.Net Core Error: InvalidOperationException: The ...
Category: .Net 7

Problem ...


Views: 341 Likes: 96
ASP.NET 8 Best Practices: Coding, Performance Tips ...
Category: .Net 7

In this chapter, we will explore various best practices and performance tips to enhance your ASP. ...


Views: 368 Likes: 98
Windows 10 Linux Subsystem- Develop with Ruby on R ...
Category: Linux

Windows 10 Linu ...


Views: 367 Likes: 96
Performance Tuning for ASP.NET Web Applications
Category: .Net 7

Performance Tuning in ASP.NET Core with C# ...


Views: 398 Likes: 109
Migrate Existing Asp.Net Web Site to .Net Core
Category: .Net 7

Migrate Existing Asp.Net Web Site ...


Views: 312 Likes: 104
Profile Common Error Asp.Net
Category: .Net 7



Views: 376 Likes: 107
MSBUILD : error MSB1009: Project file does not exi ...
Category: .Net 7

Error building Dot Net Core 2.2 Docker Image from a Docker File ...


Views: 8821 Likes: 166
How to Enable COM Ports on Windows 10 or 11 and Us ...
Category: Research

Introduction: Serial communication is an essential part of many computer systems, e ...


Views: 0 Likes: 32
How to get a path to a file in an ASP Dot Net (ASP ...
Category: .Net 7

Question: How do you get a path to a file in an ASP.Net (Dot Net) Core 2.2 application? A ...


Views: 281 Likes: 89
Return url is required
Category: .Net 7

Question: There is this annoying error in Asp.Net Core MVC that says "Return Url is required". How ...


Views: 0 Likes: 58
A Data CEO’s Guide to Becoming a Data Scientist From Scratch

If you want to know how to become a data scientist, then you’re in the right place. I’ve been where you are, and now I want to help. A decade ago, I was just a college graduate with a history degree. I then became a machine learning engineer, data science consultant, and now CEO of Dataquest. If I could do everything over, I would follow the steps I’m going to share with you in this article. It would have fast-tracked my career, saved me thousands of hours, and prevented a few gray hairs. The Wrong and Right Way  When I was learning, I tried to follow various online data science guides, but I ended up bored and without any actual data science skills to show for my time.  The guides were like a teacher at school handing me a bunch of books and telling me to read them all — a learning approach that never appealed to me. It was frustrating and self-defeating. Over time, I realized that I learn most effectively when I'm working on a problem I'm interested in.  And then it clicked. Instead of learning a checklist of data science skills, I decided to focus on building projects around real data. Not only did this learning method motivate me, it also mirrored the work I’d do in an actual data scientist role. I created this guide to help aspiring data scientists who are in the same position I was in. In fact, that’s also why I created Dataquest. Our data science courses are designed to take you from beginner to job-ready in less than 8 months using actual code and real-world projects. However, a series of courses isn’t enough. You need to know how to think, study, plan, and execute effectively if you want to become a data scientist. This actionable guide contains everything you need to know. How to Become a Data Scientist Step 1 Question Everything Step 2 Learn The Basics Step 3 Build Projects Step 4 Share Your Work Step 5 Learn From Others Step 6 Push Your Boundaries Now, let’s go over each of these one by one. Step 1 Question Everything The data science and data analytics field is appealing because you get to answer interesting questions using actual data and code. These questions can range from Can I predict whether a flight will be on time? to How much does the U.S. spend per student on education?  To answer these questions, you need to develop an analytical mindset. The best way to develop this mindset is to start with analyzing news articles. First, find a news article that discusses data. Here are two great examples Can Running Make You Smarter? or Is Sugar Really Bad for You?.  Then, think about the following How they reach their conclusions given the data they discuss How you might design a study to investigate further What questions you might want to ask if you had access to the underlying data Some articles, like this one on gun deaths in the U.S. and this one on online communities supporting Donald Trump actually have the underlying data available for download. This allows you to explore even deeper. You could do the following Download the data, and open it in Excel or an equivalent tool See what patterns you can find in the data by eyeballing it Do you think the data supports the conclusions of the article? Why or why not? What additional questions do you think you can use the data to answer? Here are some good places to find data-driven articles FiveThirtyEight New York Times Vox The Intercept Reflect After a few weeks of reading articles, reflect on whether you enjoyed coming up with questions and answering them. 
Becoming a data scientist is a long road, and you need to be very passionate about the field to make it all the way.  Data scientists constantly come up with questions and answer them using mathematical models and data analysis tools, so this step is great for understanding whether you'll actually like the work. If You Lack Interest, Analyze Things You Enjoy Perhaps you don't enjoy the process of coming up with questions in the abstract, but maybe you enjoy analyzing health or finance data. Find what you're passionate about, and then start viewing that passion with an analytical mindset. Personally, I was very interested in stock market data, which motivated me to build a model to predict the market. If you want to put in the months of hard work necessary to learn data science, working on something you’re passionate about will help you stay motivated when you face setbacks. Step 2 Learn The Basics Once you've figured out how to ask the right questions, you're ready to start learning the technical skills necessary to answer them. I recommend learning data science by studying the basics of programming in Python. Python is a programming language that has consistent syntax and is often recommended for beginners. It’s also versatile enough for extremely complex data science and machine learning-related work, such as deep learning or artificial intelligence using big data. Many people worry about which programming language to choose, but here are the key points to remember Data science is about answering questions and driving business value, not about tools Learning the concepts is more important than learning the syntax Building projects and sharing them is what you'll do in an actual data science role, and learning this way will give you a head start Super important note The goal isn’t to learn everything; it’s to learn just enough to start building projects.  Where You Should Learn Here are a few great places to learn Dataquest — I started Dataquest to make learning Python for data science or data analysis easier, faster, and more fun. We offer basic Python fundamentals courses, all the way to an all-in-one path consisting of all courses you need to become a data scientist.  Learn Python the Hard Way — a book that teaches Python concepts from the basics to more in-depth programs. The Python Tutorial — a free tutorial provided by the main Python site. The key is to learn the basics and start answering some of the questions you came up with over the past few weeks browsing articles. Step 3 Build Projects As you're learning the basics of coding, you should start building projects that answer interesting questions that will showcase your data science skills.  The projects you build don't have to be complex. For example, you could analyze Super Bowl winners to find patterns.  The key is to find interesting datasets, ask questions about the data, then answer those questions with code. If you need help finding datasets, check out this post for a good list of places to find them. As you're building projects, remember that Most data science work is data cleaning. The most common machine learning technique is linear regression. Everyone starts somewhere. Even if you feel like what you're doing isn't impressive, it's still worth working on. Where to Find Project Ideas Not only does building projects help you practice your skills and understand real data science work, it also helps you build a portfolio to show potential employers.  
Here are some more detailed guides on building projects on your own Storytelling with data Machine learning project Additionally, most of Dataquest’s courses contain interactive projects that you can complete while you’re learning. Here are just a few examples Prison Break — Have some fun, and analyze a dataset of helicopter prison escapes using Python and Jupyter Notebook. Exploring Hacker News Posts — Work with a dataset of submissions to Hacker News, a popular technology site. Exploring eBay Car Sales Data — Use Python to work with a scraped dataset of used cars from eBay Kleinanzeigen, a classifieds section of the German eBay website. Star Wars Survey — Work with Jupyter Notebook to analyze data on the Star Wars movies. Analyzing NYC High School Data — Discover the SAT performance of different demographics using scatter plots and maps. Predicting the Weather Using Machine Learning — Learn how to prepare data for machine learning, work with time series data, measure error, and improve your model performance. Add Project Complexity After building a few small projects, it's time to kick it up a notch! We need to add layers of project complexity to learn more advanced topics. At this step, however, it's crucial to execute this in an area you're interested in. My interest was the stock market, so all my advanced projects had to do with predictive modeling. As your skills grow, you can make the problem more complex by adding nuances like minute-by-minute prices and more accurate predictions. Check out this article on Python projects for more inspiration. Step 4 Share Your Work Once you've built a few data science projects, share them with others on GitHub! Here’s why It makes you think about how to best present your projects, which is what you'd do in a data science role. They allow your peers to view your projects and provide feedback. They allow employers to view your projects. Helpful resources about project portfolios How To Present Your Data Science Portfolio on GitHub Data Science Portfolios That Will Get You the Job Start a Simple Blog Along with uploading your work to GitHub, you should also think about publishing a blog. When I was learning data science, writing blog posts helped me do the following Capture interest from recruiters Learn concepts more thoroughly (the process of teaching really helps you learn) Connect with peers Here are some good topics for blog posts Explaining data science and programming concepts Discussing your projects and walking through your findings Discussing how you’re learning data science Here’s an example of a visualization I made on my blog many years ago that shows how much each Simpsons character likes the others Step 5 Learn From Others After you've started to build an online presence, it's a good idea to start engaging with other data scientists. You can do this in-person or in online communities. Here are some good online communities /r/datascience Data Science Slack Quora Kaggle Here at Dataquest, we have an online community that learners can use to receive feedback on projects, discuss tough data-related problems, and build relationships with data professionals. Personally, I was very active on Quora and Kaggle when I was learning, which helped me immensely. Engaging in online communities is a good way to do the following Find other people to learn with Enhance your profile and find opportunities Strengthen your knowledge by learning from others You can also engage with people in-person through Meetups. 
In-person engagement can help you meet and learn from more experienced data scientists in your area.

Step 6 Push Your Boundaries

What kind of data scientists do companies want to hire? The ones that find critical insights that save them money or make their customers happier. You have to apply the same process to learning — keep searching for new questions to answer, and keep answering harder and more complex questions. If you look back on your projects from a month or two ago, and you don't see room for improvement, you probably aren't pushing your boundaries enough. You should be making strong progress every month, and your work should reflect that.

Here are some ways to push your boundaries and learn data science faster:
Try working with a larger dataset
Start a data science project that requires knowledge you don't have
Try making your project run faster
Teach what you did in a project to someone else

You've Got This!

Studying to become a data scientist or data engineer isn't easy, but the key is to stay motivated and enjoy what you're doing. If you're consistently building projects and sharing them, you'll build your expertise and get the data scientist job that you want. I haven't given you an exact roadmap to learning data science, but if you follow this process, you'll get farther than you imagined you could. Anyone can become a data scientist if you're motivated enough.

After years of being frustrated with how conventional sites taught data science, I created Dataquest, a better way to learn data science online. Dataquest solves the problems of MOOCs, where you never know what course to take next, and you're never motivated by what you're learning. Dataquest leverages the lessons I've learned from helping thousands of people learn data science, and it focuses on making the learning experience engaging. At Dataquest, you'll build dozens of projects, and you'll learn all the skills you need to be a successful data scientist. Dataquest students have been hired at companies like Accenture and SpaceX. Good luck becoming a data scientist!

Becoming a Data Scientist — FAQs

What are the data scientist qualifications?
Data scientists need to have a strong command of the relevant technical skills, which will include programming in Python or R, writing queries in SQL, building and optimizing machine learning models, and often some "workflow" skills like Git and the command line. Data scientists also need strong problem-solving, data visualization, and communication skills. Whereas a data analyst will often be given a question to answer, a data scientist is expected to explore the data and find relevant questions and business opportunities that others may have missed. While it is possible to find work as a data scientist with no prior experience, it's not a common path. Normally, people will work as a data analyst or data engineer before transitioning into a data scientist role.

What are the education requirements for a data scientist?
Most data scientist roles will require at least a Bachelor's degree. Degrees in technical fields like computer science and statistics may be preferred, as well as advanced degrees like Ph.D.s and Master's degrees. However, advanced degrees are generally not strictly required (even when it says they are in the job posting). What employers are concerned about most is your skill-set.
Applicants with less advanced or less technically relevant degrees can offset this disadvantage with a great project portfolio that demonstrates their advanced skills and experience doing relevant data science work. What skills are needed to become a data scientist? Specific requirements can vary quite a bit from job to job, and as the industry matures, more specialized roles will emerge. In general, though, the following skills are necessary for virtually any data science role Programming in Python or R SQL Probability and statistics Building and optimizing machine learning models Data visualization Communication Big data Data mining Data analysis Every data scientist will need to know the basics, but one role might require some more in-depth experience with Natural Language Processing (NLP), whereas another might need you to build production-ready predictive algorithms. Is it hard to become a data scientist? Yes — you should expect to face challenges on your journey to becoming a data scientist. This role requires fairly advanced programming skills and statistical knowledge, in addition to strong communication skills. Anyone can learn these skills, but you'll need motivation to push yourself through the tough moments. Choosing the right platform and approach to learning can also help make the process easier. How long does it take to become a data scientist? The length of time it takes to become a data scientist varies from person to person. At Dataquest, most of our students report reaching their learning goals in one year or less. How long the learning process takes you will depend on how much time you're able to dedicate to it. Similarly, the job search process can vary in length depending on the projects you've built, your other qualifications, your professional background, and more. Is data science a good career choice? Yes — a data science career is a fantastic choice. Demand for data scientists is high, and the world is generating a massive (and increasing) amount of data every day.  We don't claim to have a crystal ball or know what the future holds, but data science is a fast-growing field with high demand and lucrative salaries. What is the data scientist career path? The typical data scientist career path usually begins with other data careers, such as data analysts or data engineers. Then it moves into other data science roles via internal promotion or job changes. From there, more experienced data scientists can look for senior data scientist roles. Experienced data scientists with management skills can move into director of data science and similar director and executive-level roles. What salaries do data scientists make? Salaries vary widely based on location and the experience level of the applicant. On average, however, data scientists make very comfortable salaries. In 2022, the average data scientist salary is more than $120,000 USD per year in the US. And other data science roles also command high salaries Data analyst $96,707 Data engineer $131,444 Data architect $135,096 Business analyst $97,224 Which certification is best for data science? Many assume that a data science certification or completion of a data science bootcamp is something that hiring managers are looking for in qualified candidates, but this isn’t true. Hiring managers are looking for a demonstration of the skills required for the job. And unfortunately, a data analytics or data science certificate isn’t the best showcase of your skills.  The reason for this is simple.  
There are dozens of bootcamps and data science certification programs out there. Many places offer them — from startups to universities to learning platforms. Because there are so many, employers have no way of knowing which ones are the most rigorous.  While an employer may view a certificate as an example of an eagerness to continue learning, they won’t see it as a demonstration of skills or abilities. The best way to showcase your skills properly is with projects and a robust portfolio.


ASP.Net Scheduled Jobs (Fire and Forget)
Category: .Net 7

Fire and Forget Asp.Net Library fo ...


Views: 363 Likes: 84
XmlException: The ':' character, hexadecimal value ...
Category: XML

Question: How do you resolve the error that comes when generating an XML SiteMap ...


Views: 87 Likes: 68
How to get a base url for Asp.Net Core Application ...
Category: .Net 7

Problem: If you are working in an Asp.Net Core application and would like to get the application Bas ...


Views: 1708 Likes: 79
Asp.Net Core 3.1 Not Returning NoContent() or BadR ...
Category: .Net 7

When developing in Asp.Net Core 3.1 you might want to code your algorithms t ...


Views: 663 Likes: 107
How to Write to PDF using an Open Source Library c ...
Category: .Net 7

Question: How do I use the #PDF library called iText 7 to write to a PDF in C#? ...


Views: 400 Likes: 93
error : MSB4803: The task "ResolveComReference" is ...
Category: .Net 7

How do I resolve this error? "C\Program Files\d ...


Views: 0 Likes: 61
Computer Vision in PyTorch (Part 2): Preparing Data, Training, and Evaluating Your CNN for Pneumonia Detection

In Part 1 of this tutorial series, we explored the fundamentals of Convolutional Neural Networks (CNNs) and built a complete CNN architecture using PyTorch for pneumonia detection in chest X-rays. We learned why CNNs excel at image tasks, examined each component in detail, and implemented a custom PneumoniaCNN class by taking an OOP approach and subclassing PyTorch's nn.Module class. Now it's time to bring our model to life!

In this tutorial, we'll complete our pneumonia detection system by:
Preparing and preprocessing the chest X-ray dataset
Training our CNN model with a complete training loop
Evaluating model performance using metrics like precision, recall, and F1
Interpreting evaluation results with a focus on visualizing predictions
Addressing common CNN training issues like overfitting, underfitting, and class imbalance

By the end of this tutorial, you'll have transformed your CNN architecture into a working medical diagnostic tool and gained practical skills for implementing and evaluating deep learning models. Let's get started!

Prerequisites

Before proceeding, make sure you've:
Read Part 1 of this tutorial series
Installed PyTorch (follow PyTorch's official installation instructions)
Reviewed fundamental deep learning concepts such as layers, activation functions, loss, and optimization

Below are the modules, functions, and classes we'll need for this tutorial. Be sure to install any libraries you're missing after running this code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
import matplotlib.pyplot as plt
import numpy as np
import tarfile
import os
import collections
import random
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
from PIL import Image
import seaborn as sns

1. Preparing and Preprocessing the X-ray Image Dataset

We'll start by preparing our chest X-ray dataset. The dataset contains X-ray images of lungs classified into two categories: NORMAL and PNEUMONIA.

Downloading and Extracting the Dataset

The data for this tutorial is available for download here as a compressed tar.gz file. After downloading, you'll need to extract it to access the images:

# Path to the downloaded tar.gz file
dataset_path = "xray_dataset.tar.gz"  # If saved to your current directory

# Extract the dataset
with tarfile.open(dataset_path, "r:gz") as tar:
    tar.extractall()

print("Dataset extracted successfully")

After extraction, you should have this directory structure:

chest_xray/
├── test/
│   ├── NORMAL/
│   └── PNEUMONIA/
└── train/
    ├── NORMAL/
    └── PNEUMONIA/

Verifying Dataset Structure and File Counts

After extracting the dataset, it's good practice to verify the contents and get a count of the image files. This ensures we're working with the correct data and helps identify potential issues early on. We'll create a small helper function to scan the train and test directories. This function will gather the file paths for all valid JPEG images and their corresponding class labels (0 for NORMAL, 1 for PNEUMONIA). Collecting these paths and labels now will also prepare us for the next step in preparing our data for training.
# Define base directories relative to your notebook/script location data_dir = "chest_xray" train_dir = os.path.join(data_dir, "train") test_dir = os.path.join(data_dir, "test") # Define the classes based on the subfolder names class_names = ['NORMAL', 'PNEUMONIA'] class_to_idx = {cls_name i for i, cls_name in enumerate(class_names)} # Helper function to scan directories, filter JPEG images, and collect paths/labels def get_image_paths_and_labels(data_dir) image_paths = [] labels = [] print(f"Scanning directory {data_dir}") for label_name in class_names class_dir = os.path.join(data_dir, label_name) count = 0 # List files in the class directory for filename in os.listdir(class_dir) # Keep only files ending with .jpeg (case-insensitive) if filename.lower().endswith('.jpeg') image_paths.append(os.path.join(class_dir, filename)) labels.append(class_to_idx[label_name]) count += 1 print(f" Found {count} '.jpeg' images for class '{label_name}'") return image_paths, labels # Get paths and labels for the training set all_train_paths, all_train_labels = get_image_paths_and_labels(train_dir) train_counts = collections.Counter(all_train_labels) total_train_images = len(all_train_paths) print(f"Training Set Counts") print(f" NORMAL (Class 0) {train_counts[class_to_idx['NORMAL']]}") print(f" PNEUMONIA (Class 1) {train_counts[class_to_idx['PNEUMONIA']]}") print(f" Total Training Samples {total_train_images}") # Get paths and labels for the test set all_test_paths, all_test_labels = get_image_paths_and_labels(test_dir) test_counts = collections.Counter(all_test_labels) total_test_images = len(all_test_paths) print(f"Test Set Counts") print(f" NORMAL (Class 0) {test_counts[class_to_idx['NORMAL']]}") print(f" PNEUMONIA (Class 1) {test_counts[class_to_idx['PNEUMONIA']]}") print(f" Total Test Samples {total_test_images}") Running this code will scan the directories and produce the following counts Scanning directory chest_xray/train Found 1349 '.jpeg' images for class 'NORMAL' Found 3883 '.jpeg' images for class 'PNEUMONIA' Training Set Counts NORMAL (Class 0) 1349 PNEUMONIA (Class 1) 3883 Total Training Samples 5232 Scanning directory chest_xray/test Found 234 '.jpeg' images for class 'NORMAL' Found 390 '.jpeg' images for class 'PNEUMONIA' Test Set Counts NORMAL (Class 0) 234 PNEUMONIA (Class 1) 390 Total Test Samples 624 Excellent! Our helper function scanned the directories and gave us clean lists of all the usable JPEG images and their corresponding labels for both training and testing (stored in variables like all_train_paths, all_train_labels, etc.). Now, looking at the training counts (1349 NORMAL to 3883 PNEUMONIA), something immediately stands out there are almost three times as many pneumonia examples in our training data! This situation, where one class significantly outnumbers another, is called class imbalance. While techniques exist to directly address class imbalance during training (we’ll talk about those later), our plan for now is to first train the model using the data as-is. That said, this imbalance means we'll need to be especially careful when we get to evaluating the model's performance. We can't just rely on overall accuracy; we'll need to use specific metrics that tell us how well the model identifies both classes fairly. But we’ll get to that too. Having prepared the lists of training image paths and labels, we're now ready for the next important step in preparing our data splitting off a portion of the training images to create a validation set. 
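To make the cost of this imbalance concrete, here is a minimal sketch, using only the counts printed above, of the accuracy a trivial baseline that always predicts PNEUMONIA would score. The dictionary names below are introduced just for this sketch. Any model we train has to clearly beat this bar on both classes, which is exactly why we'll lean on per-class metrics such as precision and recall during evaluation.

# Class counts reported by get_image_paths_and_labels above
train_counts_by_name = {"NORMAL": 1349, "PNEUMONIA": 3883}
test_counts_by_name = {"NORMAL": 234, "PNEUMONIA": 390}

def majority_baseline_accuracy(counts):
    """Accuracy of a 'classifier' that always predicts the most common class."""
    return max(counts.values()) / sum(counts.values())

print(f"Train majority-class baseline: {majority_baseline_accuracy(train_counts_by_name):.1%}")  # ~74.2%
print(f"Test majority-class baseline:  {majority_baseline_accuracy(test_counts_by_name):.1%}")   # ~62.5%

In other words, a model that never even looks at the images already gets roughly 74% training accuracy, so overall accuracy alone tells us very little here.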
Why You Need a Validation Set

Now that we have lists of our training images and associated labels, you might be thinking, "Why did we need those specific lists?" Well, before we train our model, we need to set aside a portion of that training data to create a validation set. It might seem strange to not use all available data for training, but this validation split is vital for trustworthy model development. Here's why:

Tuning & Monitoring: While training, we need to monitor how well the model is learning and potentially tune things like the learning rate or decide when to stop training. We need a dataset for this that the model isn't directly training on, but which isn't our final, untouched test set. That's the validation set's job.
Avoiding Data Leakage: If we used the test set to make these tuning decisions, we'd essentially be "leaking" information about the test set into our model development process. The model might end up looking good on that specific test set simply because we optimized for it, but fail to generalize to new, truly unseen data.
Unbiased Final Test: The test set should only be used once, at the very end, after all training and tuning are complete, to get an unbiased estimate of the final model's performance.

So, we reserve the test_paths/test_labels for the final evaluation and split our all_train_paths/all_train_labels into two new subsets: one for actual training, and one for validation during development. We'll use the train_test_split function from scikit-learn for this. Because we identified a class imbalance earlier, we'll use the stratify option to ensure both the new training subset and the validation set maintain the original proportion of NORMAL and PNEUMONIA images. Using random_state ensures the split is the same every time the code runs.

# Define proportion for validation set
val_split_ratio = 0.2
SEED = 42

# Perform stratified split
train_paths, val_paths, train_labels, val_labels = train_test_split(
    all_train_paths,
    all_train_labels,
    test_size=val_split_ratio,
    stratify=all_train_labels,
    random_state=SEED
)

# Print the number of samples in each resulting set
print(f"Original training image count {len(all_train_paths)}")
print(f"--> Split into {len(train_paths)} training samples")
print(f"--> Split into {len(val_paths)} validation samples")

Running this will perform the split and show the resulting counts:

Original training image count 5232
--> Split into 4185 training samples
--> Split into 1047 validation samples

We now have distinct lists of file paths and corresponding labels for our training data (train_paths, train_labels) and our validation data (val_paths, val_labels). These lists tell us which images belong in each set. But simply having file paths isn't enough to feed data into a PyTorch model. Each image needs to be loaded and undergo several processing steps first. These include standard operations like resizing all images to a consistent 256×256 dimension and converting them into the correct format (single-channel grayscale tensors). Additionally, to help our model learn more robust features from a smaller dataset and generalize better from our specific training images, we'll apply a technique called data augmentation, but only on the training set.

Understanding Data Augmentation

We have our training images identified, but deep learning models often benefit from seeing a large variety and quantity of data.
What if our training set, particularly after splitting, isn't large or diverse enough to teach the model to generalize well to all possible variations it might encounter in new X-rays? This is where data augmentation comes in. What is Data Augmentation? Data augmentation is a technique used to artificially increase the diversity of your training dataset without actually collecting new images. It involves applying random, yet realistic, transformations to the images during the training process. Each time the model sees an image from the training set, it might see a slightly altered version (e.g., flipped horizontally or slightly rotated). Why Use Data Augmentation? Improved Generalization & Robustness By exposing the model to these variations (like different orientations or flips), it learns to focus on the underlying patterns relevant to the task (e.g., signs of pneumonia) rather than potentially irrelevant characteristics like the exact positioning of the patient. This helps the model generalize better to new, unseen images that might have similar slight variations. Reduced Overfitting It effectively increases the perceived size of the training set, making it harder for the model to simply memorize the training examples. This is particularly valuable when working with specialized datasets (like medical images) that might be smaller than general-purpose image datasets. Our Chosen Augmentations For this tutorial, we'll apply two simple and common augmentation techniques using torchvision.transforms transforms.RandomHorizontalFlip(0.5) This randomly flips the image horizontally (left-to-right) with a default probability of 50%. transforms.RandomRotation(10) This randomly rotates the image by a small angle, in this case, up to 10 degrees in either direction. These simple variations help the model learn features that aren't dependent on perfect orientation or specific left-right positioning. Many other augmentation techniques exist, like adjusting brightness/contrast, zooming, or shearing, but we'll stick to these two for now. Important Training Only! Crucially, data augmentation is applied only to the training set. We do not apply random augmentations to the validation or test sets. Why? Because we need a consistent and unbiased measure of the model's performance on unmodified data during validation (for tuning) and testing (for final evaluation). Augmenting validation/test data would introduce randomness that makes performance measurement unreliable. Now that we understand the concept and benefits of data augmentation, let's define the complete image transformation pipelines for our training, validation, and test sets. Defining Image Transforms Now that we have lists specifying which images belong to our training and validation sets, we need to define how to process each image file into a standardized tensor format suitable for our PyTorch model. This involves creating processing pipelines using torchvision.transforms. We'll need slightly different pipelines for training data (which includes random augmentation) and for validation/test data (which does not). These pipelines need to perform several key operations consistently for every image Resize to Fixed Dimensions To meet our model's required 256×256 input size, the first step is transforms.Resize((256, 256)). Be aware that because the original X-rays vary in size, this forces a square aspect ratio and will distort non-square images by stretching or squashing them. 
While this could potentially obscure subtle diagnostic cues related to shape or proportion, CNNs can often adapt and learn effectively from such consistently distorted data. We'll use this standard resizing approach for our fixed-input model, but keep in mind that if evaluation reveals performance issues potentially linked to shape distortion, exploring aspect-preserving alternatives (like padding before resizing) would be a logical next step to investigate. Ensure Grayscale The model architecture also expects single-channel grayscale images (in_channels=1). To guarantee this format for all images processed, we include transforms.Grayscale(num_output_channels=1). Data Augmentation (Training Only) For the training pipeline (train_transforms), we'll insert the transforms.RandomHorizontalFlip() and transforms.RandomRotation(10) steps discussed in the previous section to help the model generalize better. These are not included in the validation/test pipeline. Convert to Tensor & Scale The final step is transforms.ToTensor(). This performs two critical functions it converts the processed PIL Image object into a PyTorch tensor, and it scales the pixel values from the original integer range [0, 255] down to a floating-point range of [0.0, 1.0]. This [0, 1] scaling acts as our input normalization for this tutorial. We are opting for this simpler approach instead of standardizing with a separate transforms.Normalize(mean, std) step, relying partly on the BatchNorm2d layers within our model to help adapt to the input distribution during training. Creating the Pipelines With these steps decided, we define two distinct data process pipelines using transforms.Compose # Transformations for the training set (including augmentation) train_transforms = transforms.Compose([ transforms.Resize((256, 256)), transforms.Grayscale(num_output_channels=1), transforms.RandomHorizontalFlip(), transforms.RandomRotation(10), transforms.ToTensor() # Converts to tensor AND scales to [0, 1] ]) # Transformations for the validation and test sets (NO augmentation) val_test_transforms = transforms.Compose([ transforms.Resize((256, 256)), transforms.Grayscale(num_output_channels=1), transforms.ToTensor() # Converts to tensor AND scales to [0, 1] ]) print("Transformation pipelines defined.") Now that we've defined how to process the images with train_transforms and val_test_transforms, we need an efficient way to connect these pipelines to our lists of image paths (train_paths, val_paths, etc.). Specifically, we need a structure that can take an index, find the corresponding image path and label, load the image file, apply the correct transformations, and provide the resulting tensor and label to PyTorch for training or evaluation. This requires creating a custom PyTorch Dataset. Let's build that next. Creating a Custom PyTorch Dataset We have our lists of image paths, and we have our processing pipelines, so now let’s bring them together so PyTorch can load and transform images during training and evaluation. While PyTorch offers built-in datasets like ImageFolder, they assume a specific directory structure and aren't ideal for using pre-split lists of file paths with different transforms assigned to each split. Thankfully, PyTorch makes it straightforward to create our own custom dataset handling logic by inheriting from the base torch.utils.data.Dataset class. A custom Dataset needs to implement three essential methods __init__(self, ...) 
Initializes the dataset, typically by storing file paths, labels, and any necessary transformations. __len__(self) Returns the total number of samples in the dataset. __getitem__(self, idx) Loads and returns a single sample (usually an image tensor and its label) from the dataset, given an index idx. This method is where the image loading and transformations are actually applied, often "just-in-time" when the sample is requested. Let's define our XRayDataset class class XRayDataset(Dataset) """Custom Dataset for loading X-ray images from file paths.""" def __init__(self, image_paths, labels, transform=None) """ Args image_paths (list) List of paths to images. labels (list) List of corresponding labels (0 or 1). transform (callable, optional) Optional transform to be applied on a sample. """ self.image_paths = image_paths self.labels = labels self.transform = transform def __len__(self) """Returns the total number of samples in the dataset.""" return len(self.image_paths) def __getitem__(self, idx) """ Fetches the sample at the given index, loads the image, applies transformations, and handles potential errors. Args idx (int) The index of the sample to fetch. Returns tuple (image_tensor, label) if successful. None If an error occurs (e.g., file not found, processing error), signalling to skip this sample. """ # Get the path and label for the requested index img_path = self.image_paths[idx] label = self.labels[idx] try # Load the image using PIL within a context manager with Image.open(img_path) as img # Apply transforms ONLY if they exist if self.transform # Apply the entire transform pipeline image_tensor = self.transform(img) # Return the processed tensor and label return image_tensor, label else # This branch indicates a setup error, as the transform # pipeline should at least contain ToTensor(). raise ValueError(f"Dataset initialized without transforms for {img_path}. " "Transforms (including ToTensor) are required.") except FileNotFoundError # Handle cases where the image file doesn't exist print(f"Warning Image file not found at {img_path}. Skipping sample {idx}.") return None # Returning None signals to skip except ValueError as e # Catch the specific error we raised for missing transforms print(f"Error for sample {idx} at {img_path} {e}") raise e # Re-raise critical setup errors except Exception as e # Catch any other PIL loading or transform errors print(f"Warning Error processing image {img_path} (sample {idx}) {e}. Skipping sample.") return None # Returning None signals to skip Explanation __init__ The constructor (__init__) is straightforward. It simply stores the essential information passed when we create an instance of XRayDataset the list of image paths, the corresponding list of labels, and the specific torchvision.transforms pipeline that should be applied to images from this dataset. __len__ This method allows PyTorch code to easily get the total size of the dataset by simply returning the number of image paths provided during initialization. __getitem__ This is the core method where the actual data loading and processing happens for a single sample. When requested by its index (idx), it performs the following steps Retrieves the image file path and label using the index. Opens the image file using the PIL library. Applies the entire transformation pipeline (like train_transforms or val_test_transforms) stored in self.transform. Returns the processed image tensor and its integer label if successful. Crucially, this loading and transforming happens "on demand" or "lazily." 
The implementation also includes basic error handling if an image file is missing or fails during processing, it prints a warning and returns None, signaling that this sample should be skipped. This XRayDataset class gives us a blueprint for handling our image data. With this class defined, we can now create the specific Dataset instances we need one for our training data using train_paths and train_transforms, one for validation using val_paths and val_test_transforms, and one for our test set. Let's instantiate these datasets next. Creating Final Datasets and DataLoader Objects With our XRayDataset class ready, we can now instantiate it for each of our data splits. We'll pair the appropriate lists of image paths and labels with the corresponding transformation pipelines we defined earlier. # Instantiate the custom Dataset for each split train_dataset = XRayDataset( image_paths=train_paths, labels=train_labels, transform=train_transforms # Apply training transforms (incl. augmentation) ) val_dataset = XRayDataset( image_paths=val_paths, labels=val_labels, transform=val_test_transforms # Apply validation transforms (no augmentation) ) test_dataset = XRayDataset( image_paths=all_test_paths, # Using all_test_paths from verification step labels=all_test_labels, # Using all_test_labels from verification step transform=val_test_transforms # Apply validation/test transforms ) # Print dataset sizes to confirm print("Final Dataset objects created") print(f" Training dataset size {len(train_dataset)}") print(f" Validation dataset size {len(val_dataset)}") print(f" Test dataset size {len(test_dataset)}") This gives us three Dataset objects, each knowing how to access and transform its specific set of images. Final Dataset objects created Training dataset size 4185 Validation dataset size 1047 Test dataset size 624 Introducing DataLoader While Dataset objects allow us to access individual processed samples via dataset[index], we typically train neural networks on mini-batches of data, not one sample at a time. Processing batches is more computationally efficient and helps stabilize the learning process. PyTorch's torch.utils.data.DataLoader class is designed precisely for this. It takes a Dataset object and provides an iterable that yields batches of data. Key features include Batching Automatically groups individual samples from the Dataset into batches of a specified size (batch_size). Shuffling Can automatically shuffle the training data at the beginning of each epoch (shuffle=True) to ensure the model doesn't learn based on the order of examples. Shuffling is typically disabled for validation and testing for consistent evaluation. Parallel Loading Can use multiple background worker processes (num_workers) to load data concurrently, preventing data loading from becoming a bottleneck during training, especially when using a GPU. The num_workers argument specifies how many subprocesses to use for data loading. While values > 0 can speed things up by loading data in parallel, they can sometimes cause issues in certain environments (like Colab notebooks). If you encounter errors during training related to workers, try setting num_workers=0, which loads data in the main process. Memory Pinning Can use pin_memory=True to speed up data transfer from CPU to GPU memory when training on CUDA-enabled devices. 
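Before wiring these datasets into DataLoader objects, it's worth a quick single-sample sanity check. The snippet below is a minimal sketch that assumes only the train_dataset and class_names objects defined above: indexing the dataset directly exercises __getitem__ exactly the way a DataLoader will, so it's a cheap way to confirm the expected shape ([1, 256, 256]), the [0.0, 1.0] value range from ToTensor, and the label mapping before batching. Because the training pipeline includes random augmentation, fetching the same index twice will usually produce slightly different tensors.

# Pull one sample directly from the training dataset (calls XRayDataset.__getitem__)
sample = train_dataset[0]
if sample is not None:  # __getitem__ returns None if the image could not be loaded
    image_tensor, label = sample
    print(f"Tensor shape: {tuple(image_tensor.shape)}")  # Expected: (1, 256, 256)
    print(f"Value range: [{image_tensor.min().item():.3f}, {image_tensor.max().item():.3f}]")  # Within [0.0, 1.0]
    print(f"Label: {label} ({class_names[label]})")
    # Random augmentation means a second fetch of the same index usually differs slightly
    second_tensor, _ = train_dataset[0]
    print(f"Identical to a second fetch? {torch.equal(image_tensor, second_tensor)}")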
Creating the DataLoader Instances Let's create DataLoader instances for each of our datasets # Define batch size (can be tuned depending on GPU memory) batch_size = 32 # Create DataLoader for the training set train_loader = DataLoader( dataset=train_dataset, batch_size=batch_size, shuffle=True, # Shuffle data each epoch for training num_workers=2, # Number of subprocesses to use for data loading (adjust based on system) pin_memory=True # Speeds up CPU-GPU transfer if using CUDA ) # Create DataLoader for the validation set val_loader = DataLoader( dataset=val_dataset, batch_size=batch_size, shuffle=False, # No need to shuffle validation data num_workers=2, pin_memory=True ) # Create DataLoader for the test set test_loader = DataLoader( dataset=test_dataset, batch_size=batch_size, shuffle=False, # No need to shuffle test data num_workers=2, pin_memory=True ) print(f"DataLoaders created with batch size {batch_size}.") With train_loader, val_loader, and test_loader created, our data preparation pipeline is complete! These loaders are now ready to efficiently supply batches of preprocessed image tensors and labels to our model during the training, validation, and testing phases. A good next step is often to visualize a few images from the train_loader to visually inspect the results of the transformations and augmentations before proceeding to model training. Visualizing Sample Images Before we start training, it's crucial to visually inspect the output of our DataLoader objects. This acts as a sanity check to ensure our data loading, preprocessing, and augmentation steps are working correctly – essentially, we get to "see what the model will see." Let's create a helper function to display a batch of images def show_batch(dataloader, class_names, title="Sample Batch", n_samples=8) """Displays a batch of transformed images from a DataLoader.""" try images, labels = next(iter(dataloader)) # Get one batch except StopIteration print("DataLoader is empty or exhausted.") return # Limit number of samples to display if batch is smaller than n_samples actual_samples = min(n_samples, images.size(0)) if actual_samples <= 0 print("No samples found in the batch to display.") return images = images[actual_samples] labels = labels[actual_samples] # Tensors are likely on GPU if device='cuda', move to CPU for numpy/plotting images = images.cpu() labels = labels.cpu() # Determine subplot layout if actual_samples <= 4 ncols = actual_samples; nrows = 1; figsize = (3 * ncols, 4) else ncols = 4; nrows = 2; figsize = (12, 6) fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=figsize) if nrows == 1 and ncols == 1 axes = np.array([axes]) # Handle single plot case axes = axes.flatten() # Flatten axes array for easy iteration fig.suptitle(title, fontsize=16) for i in range(actual_samples) ax = axes[i] img_tensor = images[i] # Shape is [C=1, H, W], scaled [0.0, 1.0] # Reminder ToTensor scaled pixels to [0, 1] # Matplotlib can directly display tensors in this range with cmap='gray' # Permute dimensions from [C, H, W] to [H, W, C] for matplotlib img_display = img_tensor.permute(1, 2, 0).numpy() # Display the image, removing the channel dimension using squeeze() for grayscale # Specify vmin/vmax ensures correct display range for float data ax.imshow(img_display.squeeze(), cmap='gray', vmin=0.0, vmax=1.0) ax.set_title(f"Class {class_names[labels[i]]}") # Use passed class_names ax.axis('off') # Hide any unused subplots if the grid is larger than needed for j in range(actual_samples, len(axes)) axes[j].axis('off') 
plt.tight_layout() plt.subplots_adjust(top=0.88 if title else 0.95, hspace=0.3) # Adjust for suptitle plt.show() # Visualize training samples (should show augmentations) print("Visualizing a batch from train_loader...") show_batch(train_loader, class_names, title="Sample Processed Training Images") # Visualize validation samples (should NOT show augmentations) print("Visualizing a batch from val_loader...") show_batch(val_loader, class_names, title="Sample Processed Validation Images") Visualizing a batch from train_loader... Visualizing a batch from val_loader... Interpreting the Visualizations The images displayed above are samples drawn directly from our train_loader and val_loader. They reflect the full preprocessing pipeline Resized to 256×256 pixels. Converted to single-channel grayscale. If from train_loader Randomly flipped horizontally and/or slightly rotated due to data augmentation. Converted to PyTorch tensors with pixel values scaled to the [0.0, 1.0] range via transforms.ToTensor. What to Look For Format You should see grayscale chest X-rays, all uniformly sized. Labels Each image should have the correct class title ('NORMAL' or 'PNEUMONIA'). Augmentation Images from train_loader might show random variations (flips, rotations) each time you run this visualization. Images from val_loader should appear consistent without these random effects. Intensity Range The images are displayed directly from the [0, 1] scaled tensors. Ensure they look reasonable (not all black or all white, details visible). Orientation Marker & Augmentation You'll likely notice a letter marker, commonly an 'R', often placed in an upper corner of the X-rays. This marker indicates the patient's right side. Since standard chest X-rays are taken with the patient facing the detector, their right side appears on the left side of the image. Now, look closely at the samples from the train_loader if RandomHorizontalFlip was applied to an image, you’ll see this 'R' marker appearing reversed and on the right side of the image! This is a perfect visual confirmation that your training data augmentation is active. Images from the val_loader should consistently show the marker in its standard position (patient's right on the image's left). This visualization step confirms that our data loaders are correctly yielding processed image tensors in the format and range our model expects, with augmentations applied appropriately. With this confirmation, our data is ready for the main event training the CNN model. 2. Training Our CNN Model With our data prepared and loaded efficiently using DataLoader objects, we're ready to move on to the main event training the model to distinguish between NORMAL and PNEUMONIA chest X-rays. For this, we'll use the PneumoniaCNN architecture we carefully designed together previously. Instantiating the Model and Setting the Device The first step is to use the PneumoniaCNN class definition we built in Part 1 of this tutorial series. You'll need to make sure that Python class definition is available in your current environment, typically by copying the class PneumoniaCNN(nn.Module) ... block into a code cell and running it here if you haven't already. Once the PneumoniaCNN class is defined, we can create an instance of it. Don’t forget that we must then immediately move this model instance to the appropriate computing device (cpu or cuda) that we set up earlier. Performing operations between the model and data requires them both to reside on the same device. 
# Instantiate the model
model = PneumoniaCNN()

# Check if CUDA (GPU support) is available, otherwise use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the model to the chosen device (GPU or CPU)
model.to(device)

print(f"Model '{type(model).__name__}' instantiated and moved to '{device}'.")

Now our model object is created and resides on the correct device. Before we can start the training loop itself, we need two more key components:
A Loss Function: To measure how inaccurate the model's predictions are compared to the true labels.
An Optimizer: To define the algorithm used to update the model's weights based on the calculated loss.

Let's define these next.

Defining the Loss Function and Optimizer

With our model instantiated and placed on the correct device, we need two final components before building the training loop:
Loss Function: This measures how far the model's predictions (logits) are from the actual target labels. The computed loss value is what the model tries to minimize during training.
Optimizer: This implements an algorithm (like Stochastic Gradient Descent or variations thereof) that updates the model's weights based on the gradients computed during the backward pass, aiming to reduce the loss.

Let's define these for our task:

# Define loss function
criterion = nn.CrossEntropyLoss()

# Define optimizer
optimizer = optim.Adam(model.parameters(), lr=0.0001)

print("Loss function and optimizer defined.")

Explanation

Loss Function (criterion): We instantiate nn.CrossEntropyLoss. This is the standard choice for multi-class classification problems like ours (Normal vs. Pneumonia). It's particularly convenient because it expects the raw, unnormalized scores (logits) directly from the model's final layer and internally applies the necessary calculations (like LogSoftmax and Negative Log-Likelihood loss) to determine the error. A short sketch of this logits-and-labels contract follows at the end of this section.

Optimizer (optimizer): We select optim.Adam, a very popular and often effective optimization algorithm. It's known for its adaptive learning rate capabilities, meaning it can adjust the learning rate for each parameter during training, which frequently leads to faster convergence compared to simpler optimizers like basic SGD.

model.parameters(): We pass this to the optimizer to tell it exactly which tensors within our model are the learnable weights and biases that it should be updating.

lr=0.0001: This argument sets the initial learning rate. It's a crucial hyperparameter controlling how large the updates to the weights are on each step. A value between 0.001 and 0.0001 is often a good starting point for the Adam optimizer, but it might need tuning later.

Alright, all the preparatory pieces are in place! We have our instantiated model ready on the correct device, our DataLoaders (train_loader, val_loader, test_loader) prepared to serve batches of processed data, our criterion defined to measure loss, and our optimizer configured to update the model's parameters. We're finally ready to orchestrate the actual learning process by implementing the training loop.
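As promised above, here is a tiny self-contained sketch of the contract criterion expects: raw logits of shape [batch_size, num_classes] and integer class labels. The numbers are made up purely for illustration; they are not outputs from our model.

# Hypothetical logits for a batch of 4 images and 2 classes (NORMAL=0, PNEUMONIA=1).
# No softmax is applied here; CrossEntropyLoss handles that internally.
dummy_logits = torch.tensor([[ 2.0, -1.0],   # leans strongly toward NORMAL
                             [-0.5,  1.5],   # leans toward PNEUMONIA
                             [ 0.1,  0.2],   # nearly undecided
                             [ 3.0, -2.0]])  # strongly NORMAL, but the true label below is PNEUMONIA
dummy_labels = torch.tensor([0, 1, 1, 1])    # true classes as integer indices, not one-hot vectors

dummy_loss = criterion(dummy_logits, dummy_labels)
print(f"Example batch loss: {dummy_loss.item():.4f}")  # A single scalar, averaged over the batch

Note that the mistaken-but-confident last example contributes most of the loss, which is exactly the pressure the optimizer uses to correct the weights.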
Implementing the Training Loop Now let's implement a complete training loop def train_model(model, train_loader, val_loader, criterion, optimizer, device, num_epochs=20) """Trains and validates the model.""" # Initialize lists to track metrics train_losses = [] val_losses = [] train_accuracies = [] val_accuracies = [] print("Starting Training...") # Training loop for epoch in range(num_epochs) # Training Phase model.train() # Set model to training mode (enables dropout, batch norm updates) running_loss = 0.0 correct_train = 0 total_train = 0 # Iterate over training data for i, (images, labels) in enumerate(train_loader) # Move data to the specified device images, labels = images.to(device), labels.to(device) # Zero the parameter gradients optimizer.zero_grad() # Forward pass outputs = model(images) loss = criterion(outputs, labels) # Backward pass and optimize loss.backward() optimizer.step() # Track training loss and accuracy running_loss += loss.item() * images.size(0) # loss.item() is the avg loss per batch _, predicted = torch.max(outputs.data, 1) total_train += labels.size(0) correct_train += (predicted == labels).sum().item() # Calculate training statistics for the epoch epoch_train_loss = running_loss / len(train_loader.dataset) epoch_train_acc = correct_train / total_train train_losses.append(epoch_train_loss) train_accuracies.append(epoch_train_acc) # Validation Phase model.eval() # Set model to evaluation mode (disables dropout, uses running stats for batch norm) val_loss = 0.0 correct_val = 0 total_val = 0 # Disable gradient calculations for validation with torch.no_grad() for images, labels in val_loader images, labels = images.to(device), labels.to(device) outputs = model(images) loss = criterion(outputs, labels) val_loss += loss.item() * images.size(0) _, predicted = torch.max(outputs.data, 1) total_val += labels.size(0) correct_val += (predicted == labels).sum().item() # Calculate validation statistics for the epoch epoch_val_loss = val_loss / len(val_loader.dataset) epoch_val_acc = correct_val / total_val val_losses.append(epoch_val_loss) val_accuracies.append(epoch_val_acc) # Print statistics for the epoch print(f"Epoch {epoch+1}/{num_epochs}") print(f" Train Loss {epoch_train_loss.4f}, Train Acc {epoch_train_acc.4f}") print(f" Val Loss {epoch_val_loss.4f}, Val Acc {epoch_val_acc.4f}") print("-" * 30) print("Finished Training.") # Return performance history return { 'train_losses' train_losses, 'train_accuracies' train_accuracies, 'val_losses' val_losses, 'val_accuracies' val_accuracies } This training function Tracks performance metrics (training and validation) over time. Switches between training (model.train()) and evaluation (model.eval()) modes correctly. Handles device placement for tensors (.to(device)). Implements the full train-validate cycle for each epoch. Returns a dictionary of training and validation history for later analysis. Now, let's start training our model # Train the model num_epochs = 20 history = train_model( model=model, train_loader=train_loader, val_loader=val_loader, criterion=criterion, optimizer=optimizer, device=device, num_epochs=num_epochs ) During training, you'll see output showing the model's progress Starting Training... 
Epoch 1/20 Train Loss 0.5181, Train Acc 0.8282 Val Loss 0.1428, Val Acc 0.9484 ------------------------------ Epoch 2/20 Train Loss 0.2066, Train Acc 0.9221 Val Loss 0.0897, Val Acc 0.9685 ------------------------------ Epoch 3/20 Train Loss 0.1632, Train Acc 0.9379 Val Loss 0.0708, Val Acc 0.9780 ------------------------------ ... (output for subsequent epochs) ... ------------------------------ Epoch 20/20 Train Loss 0.0832, Train Acc 0.9699 Val Loss 0.0468, Val Acc 0.9819 Finished Training. Training is complete! The output above gives us a snapshot of the loss and accuracy progress for each epoch on both the training and validation sets. We can see the model is learning, but to get a full picture of the trends over all 20 epochs, like how quickly the model converged, whether overfitting occurred, and how the validation performance truly compared to training, we should visualize these metrics instead. So let's plot the history next. Visualizing the Training Process Visualizing the training and validation metrics is the best way to understand how our model is learning. Plotting their loss/accuracy curves over epochs provides valuable insights into the learning dynamics. def plot_training_history(history) """Plots the training and validation loss and accuracy.""" fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5)) # Plot losses ax1.plot(history['train_losses'], label='Train Loss') ax1.plot(history['val_losses'], label='Validation Loss') ax1.set_xlabel('Epoch') ax1.set_ylabel('Loss') ax1.set_title('Training and Validation Loss') ax1.legend() ax1.grid(True) # Plot accuracies ax2.plot(history['train_accuracies'], label='Train Accuracy') ax2.plot(history['val_accuracies'], label='Validation Accuracy') ax2.set_xlabel('Epoch') ax2.set_ylabel('Accuracy') ax2.set_title('Training and Validation Accuracy') ax2.legend() ax2.grid(True) plt.tight_layout() plt.show() # Plot the training and validation history plot_training_history(history) These plots provide a clear visual summary of the entire training process over the 20 epochs. Overall Learning We can clearly see the learning trend both the blue (training) and orange (validation) loss curves decrease significantly from the start and then begin to level off, particularly towards the end. Correspondingly, both accuracy curves rise quickly and plateau at a high level. This confirms the model successfully learned from the data. Validation vs. Training Performance Notice how the orange validation loss curve consistently stays below the blue training loss curve, and the orange validation accuracy curve stays above the blue training accuracy curve. This pattern, where validation metrics appear better than training metrics, is often expected when using regularization techniques like Data Augmentation and Dropout. These techniques are applied only during the training phase (model.train()), making that phase slightly harder, but are turned off during validation (model.eval()), allowing the model's full capacity to be assessed on the consistent validation data. Overfitting Check We visually inspect the gap between the training and validation curves. Signs of significant overfitting would include the validation loss (orange) clearly starting to rise while the training loss (blue) continues to fall, or the validation accuracy stalling/dropping while training accuracy keeps climbing. Based on these plots, while there are minor fluctuations, the validation loss remains low and generally trends downwards or flat near the end. 
The gap between the curves doesn't appear to be dramatically widening, suggesting significant overfitting hasn't set in within these 20 epochs. Optimal Epoch & Training Duration Looking closely at the orange validation loss curve, it appears to reach its minimum value very late in training, around epoch 19 or 20. Similarly, validation accuracy plateaus at its peak in the last few epochs. This suggests that training for the full 20 epochs was beneficial for this specific run and learning rate, and stopping much earlier might have resulted in slightly suboptimal validation performance. TL;DR The plots show stable training with good convergence over 20 epochs. They visually confirm the expected impact of our training-only regularization (Val > Train metrics) and indicate that the model reached its best validation performance near the end of this training run without showing strong signs of overfitting yet. 3. Evaluating Our Pneumonia Detection CNN After training, we need to rigorously evaluate our model to understand its strengths and weaknesses, especially for a medical application like pneumonia detection. Calculating Key Metrics For medical diagnosis tasks, accuracy alone is insufficient. We need to consider Precision Of all cases predicted as pneumonia, how many actually have pneumonia? Recall Of all actual pneumonia cases, how many did we correctly identify? F1-score The harmonic mean of precision and recall Confusion Matrix A table showing true positives, false positives, true negatives, and false negatives Let's implement a detailed evaluation def evaluate_model(model, test_loader, device, class_names) """ Evaluates the model on a given dataloader (e.g., test set). Computes confusion matrix and classification report. """ model.eval() # Set model to evaluation mode all_preds = [] all_labels = [] with torch.no_grad() # Disable gradient calculation for images, labels in test_loader images, labels = images.to(device), labels.to(device) outputs = model(images) _, predictions = torch.max(outputs, 1) all_preds.extend(predictions.cpu().numpy()) all_labels.extend(labels.cpu().numpy()) all_preds = np.array(all_preds) all_labels = np.array(all_labels) # Calculate confusion matrix cm = confusion_matrix(all_labels, all_preds) # Calculate classification report class_report = classification_report( all_labels, all_preds, target_names=class_names, digits=4, zero_division=0 ) # Calculate overall accuracy from the report accuracy = np.trace(cm) / np.sum(cm) # Simple accuracy from confusion matrix return { 'confusion_matrix' cm, 'classification_report' class_report, 'accuracy' accuracy, 'predictions' all_preds, 'true_labels' all_labels } # Evaluate the model eval_results = evaluate_model(model, test_loader, device, class_names) # Print results print("Classification Report") print(eval_results['classification_report']) print(f"Overall Accuracy {eval_results['accuracy'].4f}") You should see output similar to Classification Report precision recall f1-score support NORMAL 0.9780 0.3803 0.5477 234 PNEUMONIA 0.7280 0.9949 0.8407 390 accuracy 0.7644 624 macro avg 0.8530 0.6876 0.6942 624 weighted avg 0.8217 0.7644 0.7308 624 Overall Accuracy 0.7644 Visualizing the Confusion Matrix A confusion matrix provides a clear visual representation of our model's performance def plot_confusion_matrix(confusion_matrix, class_names) plt.figure(figsize=(8, 6)) sns.heatmap( confusion_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names ) plt.xlabel('Predicted Label') plt.ylabel('True 
Label') plt.title('Confusion Matrix') plt.tight_layout() plt.show() # Plot confusion matrix plot_confusion_matrix(eval_results['confusion_matrix'], class_names) The confusion matrix shows True Negatives (top-left) Normal X-rays correctly identified as normal False Positives (top-right) Normal X-rays incorrectly identified as pneumonia False Negatives (bottom-left) Pneumonia X-rays incorrectly identified as normal True Positives (bottom-right) Pneumonia X-rays correctly identified as pneumonia Interpreting Results in a Medical Context In medical diagnosis, different types of errors have different consequences False Negatives (missing pneumonia) These are particularly dangerous as a patient with pneumonia might not receive necessary treatment, potentially leading to serious complications. Minimizing these is often a high priority (i.e., maximizing Recall/Sensitivity for the PNEUMONIA class). False Positives (diagnosing pneumonia when it's absent) These may lead to unnecessary treatment, causing stress and potential side effects, but are generally less immediately harmful than false negatives. Minimizing these relates to maximizing Recall/Specificity for the NORMAL class. Examining our actual results from the Classification Report above, we see Pneumonia Detection (Class 1) The model achieves extremely high Recall (Sensitivity) of ~0.9949. This is excellent, meaning it correctly identifies nearly 99.5% of the actual pneumonia cases in the test set, effectively minimizing dangerous False Negatives. However, its Precision for pneumonia is ~0.7280, meaning that when it predicts pneumonia, it's correct only about 73% of the time – the other 27% are False Positives (NORMAL cases misclassified as PNEUMONIA). Normal Case Detection (Class 0) The model still has very low Recall (Specificity) of ~0.3803. This indicates it only correctly identifies about 38% of the actual normal cases; the remaining 62% are misclassified as pneumonia (contributing to the lower precision for the PNEUMONIA class). The Precision for normal cases remains high (~0.9780), meaning if it predicts normal, it's very likely correct, but this model rarely makes that prediction for normal cases. Interpretation These results indicate the model is significantly biased towards predicting PNEUMONIA. It's highly sensitive but lacks specificity. In a real medical scenario The high sensitivity (~99.5%) is valuable for ensuring potential cases aren't missed. The low specificity (~38%) remains highly problematic, likely leading to a large number of unnecessary follow-ups for healthy individuals. While prioritizing sensitivity is common for screening, this level of specificity would likely be impractical. These results strongly suggest that the class imbalance in our training data is heavily influencing the model's predictions. To create a more balanced and clinically useful model, addressing this imbalance directly (using techniques like weighted loss or resampling, as discussed in Section 5) would be the most logical next step. 4. 
Visualizing Model Predictions Let's visualize some of our model's predictions to better understand its behavior def visualize_predictions(model, dataloader, device, class_names, num_samples=8) """Displays a batch of test images with their true labels and model predictions.""" model.eval() try images, labels = next(iter(dataloader)) except StopIteration print("DataLoader is empty.") return # Ensure we don't request more samples than available in the batch actual_samples = min(num_samples, images.size(0)) if actual_samples <= 0 print("No samples in batch to display.") return images, labels = images[actual_samples], labels[actual_samples] images_device = images.to(device) # Move input data to the correct device # Get model predictions with torch.no_grad() outputs = model(images_device) _, preds = torch.max(outputs, 1) probs = F.softmax(outputs, dim=1) # Move data back to CPU for plotting preds = preds.cpu().numpy() probs = probs.cpu().numpy() images = images.cpu() # Determine subplot layout if actual_samples <= 4 ncols = actual_samples; nrows = 1; figsize = (4 * ncols, 5) else ncols = 4; nrows = 2; figsize = (16, 10) fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=figsize) # Ensure axes is iterable if nrows == 1 and ncols == 1 axes = np.array([axes]) axes = axes.flatten() fig.suptitle("Sample Test Set Predictions", fontsize=16) for i, ax in enumerate(axes) if i < actual_samples img_tensor = images[i] true_label = class_names[labels[i]] pred_label = class_names[preds[i]] confidence = probs[i][preds[i]] # Prepare image for display (C, H, W) -> (H, W, C) img_display = img_tensor.permute(1, 2, 0).numpy() # Display image ax.imshow(img_display.squeeze(), cmap='gray', vmin=0.0, vmax=1.0) # Set title with prediction info and color coding title_color = 'green' if pred_label == true_label else 'red' title = f"True {true_label}Pred {pred_label}Conf {confidence.2f}" ax.set_title(title, color=title_color) ax.axis('off') else ax.axis('off') plt.tight_layout() plt.subplots_adjust(top=0.92) # Adjust layout for suptitle plt.show() # Visualize model predictions on the test set print("Visualizing sample predictions from the test set...") # Create a TEMPORARY DataLoader with shuffling enabled JUST for visualization # This helps ensure we see a mix of classes in the first batch we grab. # The 'test_loader' used for actual evaluation remains unshuffled. temp_vis_loader = DataLoader( dataset=test_dataset, # Use the same test_dataset batch_size=batch_size, # Use the same batch size shuffle=True # Shuffle ON for this temporary loader ) visualize_predictions(model, temp_vis_loader, device, class_names) This visualization provides concrete examples of the model's behavior on the test set We can clearly see examples of both correct (green titles) and incorrect (red titles) predictions made by the model. It allows us to observe the model's confidence for each prediction. Notice in this batch that the confidence scores are generally quite high (often >0.80), even for some of the incorrect classifications. Most importantly, we can identify potential patterns in the errors. In this specific sample batch, the errors primarily consist of True NORMAL images being incorrectly classified as PNEUMONIA, sometimes with high confidence. This visually reinforces the low Specificity (low Recall for the NORMAL class) identified in our quantitative evaluation metrics and highlights the model's tendency to misclassify normal cases. 
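To complement the visual inspection, a small optional follow-up (reusing the eval_results dictionary and class_names from the evaluation step above) is to count how the errors split across the true classes, which makes the NORMAL-vs-PNEUMONIA asymmetry explicit:

# Count misclassifications per true class using the stored predictions
preds = eval_results['predictions']
labels = eval_results['true_labels']

for class_idx, name in enumerate(class_names):
    mask = labels == class_idx                  # test samples whose true class is `name`
    errors = np.sum(preds[mask] != class_idx)   # how many of them were misclassified
    print(f"{name}: {errors} of {mask.sum()} test images misclassified")

Given the imbalance we observed, expect far more NORMAL images to be misclassified than PNEUMONIA images.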
Making Predictions on Random Images Evaluating metrics like precision and recall gives us an overall sense of performance, but looking at individual predictions can provide more intuition. Let's see how our trained model performs on a randomly selected individual X-ray image from the NORMAL class in the test set. First, here's the helper function we'll use to load, preprocess, and get a prediction for a single image path def predict_image(model, image_path, transform, device, class_names) """Loads a single image, preprocesses it, and returns model prediction details.""" try # Load the image using PIL image = Image.open(image_path) except FileNotFoundError print(f"Error Image file not found at {image_path}") return None except Exception as e print(f"Error opening image {image_path} {e}") return None # Preprocess Apply validation/test transforms, add batch dimension, move to device image_tensor = transform(image).unsqueeze(0).to(device) # Make prediction model.eval() # Ensure model is in evaluation mode with torch.no_grad() # Disable gradient calculations output = model(image_tensor) # Output raw logits probabilities = F.softmax(output, dim=1) # Probabilities # Get the highest probability score and the corresponding class index confidence, predicted_class_idx = torch.max(probabilities, 1) # Extract results class_idx = predicted_class_idx.item() class_name = class_names[class_idx] # Map index to class name confidence_score = confidence.item() # Return results as a dictionary return { 'class_id' class_idx, 'class_name' class_name, 'confidence' confidence_score, 'probabilities' probabilities[0].cpu().numpy() # All class probabilities } Now, let's use this function on a random image from the test/NORMAL directory try normal_dir = os.path.join(test_dir, "NORMAL") # Target the NORMAL directory # Get only .jpeg files from the directory normal_test_files = [f for f in os.listdir(normal_dir) if f.lower().endswith('.jpeg')] if not normal_test_files print(f"No NORMAL test images found in {normal_dir}.") else # Select a random image file random_filename = random.choice(normal_test_files) test_image_path = os.path.join(normal_dir, random_filename) print(f"Predicting on random NORMAL image {random_filename}") # Get prediction using the function result = predict_image(model, test_image_path, val_test_transforms, device, class_names) if result # Display the prediction details print(f" Actual class NORMAL") # State the true class print(f" Predicted class {result['class_name']}") print(f" Confidence {result['confidence'].4f}") print(f" Class probabilities Normal={result['probabilities'][0].4f}, Pneumonia={result['probabilities'][1].4f}") # Visualize the image with prediction try img = Image.open(test_image_path) plt.figure(figsize=(6, 6)) plt.imshow(img, cmap='gray') # Include TRUE label in title for clarity plt.title(f"True NORMAL | Prediction {result['class_name']} ({result['confidence'].4f})") plt.axis('off') plt.show() except Exception as e print(f"Error displaying image {test_image_path} {e}") except FileNotFoundError print(f"Error Directory {normal_dir} not found.") except Exception as e print(f"An error occurred during prediction example {e}") Predicting on random NORMAL image IM-0011-0001-0001.jpeg Actual class NORMAL Predicted class PNEUMONIA Confidence 0.7053 Class probabilities Normal=0.2947, Pneumonia=0.7053 Here, we took a random test X-ray image (IM-0011-0001-0001.jpeg) known to be NORMAL. Our model, however, incorrectly predicted it as PNEUMONIA with moderate confidence (approx. 70.5%). 
This specific misclassification provides a clear example of the main weakness identified in our evaluation metrics the model's difficulty in correctly recognizing NORMAL cases (achieving only ~38.0% recall/specificity according to the Classification Report). Errors like this, where NORMAL images are falsely predicted as PNEUMONIA (False Positives), are why the overall Precision for the PNEUMONIA class was limited to ~72.8%. ****When the model predicts PNEUMONIA, roughly 27% of those predictions are actually NORMAL cases being misclassified. While the model remains excellent at catching actual PNEUMONIA (with ~99.5% recall/sensitivity), this tendency to misclassify NORMAL images highlights the impact of the class imbalance. Looking at the specific image, we can see prominent normal structures (bronchial/vascular markings); it's plausible that the model, biased by the imbalance, struggles to differentiate these complex normal patterns from potential abnormalities. Addressing this bias to improve specificity would clearly improve the model's clinical utility. This leads us nicely into exploring common training issues and techniques to mitigate them. 5. Addressing Common CNN Training Issues Now that we've trained and evaluated our model, we've seen some promising results but also potential areas for improvement (like the low specificity driven by class imbalance). Let's explore common issues encountered during CNN development and strategies to address them, considering our specific pneumonia detection task. Diagnosing and Addressing Overfitting Overfitting occurs when a model learns the training data too well, including its noise and specific quirks, rather than the underlying general patterns. This leads to poor performance on new, unseen data. Signs of overfitting in our training plots would include Training accuracy becoming much higher than validation accuracy. Training loss continuing to decrease significantly while validation loss plateaus or starts increasing. Strategy Early Stopping If you observe validation loss starting to increase (a clear sign of overfitting), one effective strategy is early stopping. Concept Monitor the validation loss after each epoch. Save the model's state whenever the validation loss reaches a new minimum. If the validation loss fails to improve for a predefined number of epochs (e.g., 5 or 10, known as "patience"), stop the training process. Finally, load the saved model state that achieved the best validation loss. Example Application For our project, this would involve modifying the training loop to keep track of the best epoch_val_loss seen so far, saving model.state_dict() at that point, and halting if the loss doesn't improve for the specified patience period. Handling Underfitting Underfitting is the opposite problem the model fails to learn the training data well enough, resulting in poor performance on both the training and validation/test sets. This often suggests the model is too simple or hasn't trained sufficiently. Potential Strategies Increase Model Complexity Make the model more powerful so it can capture more complex patterns. Example Application We could add a fourth convolutional block to our PneumoniaCNN definition or increase the number of output channels in the existing nn.Conv2d layers (e.g., going from 32 -> 64 -> 128 to perhaps 64 -> 128 -> 256). Train Longer Give the model more time to learn by increasing the number of training epochs. 
Example Application Simply call our train_model function with a larger value, like num_epochs=30 or num_epochs=50, while carefully monitoring for signs of overfitting using the validation metrics. Reduce Regularization Techniques like dropout prevent overfitting but can hinder learning if applied too aggressively when the model is underfitting. Example Application We could try lowering the dropout probability in our fully connected layers, for instance, changing nn.Dropout(p=0.5) to nn.Dropout(p=0.3). If using weight decay in the optimizer, we might reduce its strength. Learning Rate Adjustment The learning rate might be too low (slow learning) or too high (preventing convergence). Experimenting or using a scheduler can help. Example Application We could try initializing the Adam optimizer with a slightly different learning rate, like lr=0.001 or lr=0.005. Alternatively, we could implement a learning rate scheduler (e.g., torch.optim.lr_scheduler.ReduceLROnPlateau) that automatically reduces the learning rate if the validation loss stagnates. Addressing Class Imbalance As our verification step showed, the training set has a roughly 31 ratio of PNEUMONIA to NORMAL samples. This imbalance likely contributed to our model's bias towards predicting PNEUMONIA (high sensitivity, low specificity). Common strategies include Weighted Loss Function Modify the loss calculation to penalize errors on the minority class (NORMAL) more heavily. Example Application Calculate weights inversely proportional to class frequency (e.g., assign a weight of ~3 to the NORMAL class and ~1 to the PNEUMONIA class) and pass these weights to the weight parameter of nn.CrossEntropyLoss when defining our criterion. Resampling Adjust the sampling process during training to create more balanced batches. Example Application Oversampling the minority class involves drawing more samples (with replacement) from the NORMAL images during each epoch, perhaps using PyTorch's WeightedRandomSampler with the DataLoader. Undersampling the majority class involves randomly discarding some PNEUMONIA samples to match the number of NORMAL samples, though this risks losing potentially useful information. Generate Synthetic Data Create artificial examples of the minority class. Example Application This often involves more advanced techniques like SMOTE (Synthetic Minority Over-sampling Technique) or using Generative Adversarial Networks (GANs) to create new, realistic-looking NORMAL X-ray images, though implementing these is beyond the scope of this tutorial. Choosing the right strategy often involves experimentation. For class imbalance, using a weighted loss or resampling via WeightedRandomSampler are often effective starting points. Review and Next Steps Congratulations! You've successfully navigated this two-part tutorial series, journeying from the fundamentals of Convolutional Neural Networks all the way to building, training, and evaluating a practical pneumonia detection model using PyTorch. You've seen the entire workflow, from defining the architecture in Part 1 to preparing data, implementing training loops, interpreting results, and considering common challenges in Part 2. 
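Before the final recap, here is a minimal sketch of the two class-imbalance remedies suggested above. The class counts and the train_dataset.targets attribute are assumptions (the counts are placeholders, and targets is available when the training set is an ImageFolder-style dataset), so adapt them to your own data pipeline:

from torch.utils.data import DataLoader, WeightedRandomSampler

# Option 1: a weighted loss that penalizes mistakes on the minority class (NORMAL) more heavily
class_counts = torch.tensor([1000.0, 3000.0])               # placeholder [NORMAL, PNEUMONIA] counts; use your real ones
class_weights = class_counts.sum() / (2.0 * class_counts)   # inverse-frequency style weights
criterion = nn.CrossEntropyLoss(weight=class_weights.to(device))

# Option 2: oversample the minority class so each batch is roughly balanced
targets = torch.tensor(train_dataset.targets)                # per-sample labels (assumes an ImageFolder-style dataset)
sample_weights = class_weights[targets]                      # weight each sample by its class weight
sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)
balanced_loader = DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)

Either change slots into the existing train_model call: pass the new criterion, or swap train_loader for balanced_loader. Trying one at a time makes it easier to tell which intervention actually helped.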
What You've Learned As we wrap up, let's distill the most important concepts and practices covered across both tutorials End-to-End Workflow Building an effective computer vision solution involves a complete pipeline careful data preparation (verification, splitting, augmentation, transformation), thoughtful model architecture definition (often using PyTorch's object-oriented nn.Module structure), implementing robust training loops (managing device placement and model modes), and performing rigorous evaluation tailored to the problem. Data is Foundational The quality and handling of your data are paramount. Accurate verification, appropriate splitting (train/validation/test), deliberate preprocessing choices (like resizing or grayscale conversion), and techniques like data augmentation significantly impact model performance and reliability. Evaluate Beyond Accuracy Especially for real-world applications like medical diagnosis, relying solely on accuracy can be misleading, particularly with imbalanced datasets. Metrics like precision, recall (sensitivity/specificity), F1-score, and confusion matrices provide a much deeper understanding of model strengths and weaknesses for each class. Practical Training Details Matter Correctly switching between model.train() and model.eval() is essential for layers like Dropout and BatchNorm to function properly. Being aware of potential issues like overfitting or class imbalance and knowing strategies to address them (e.g., early stopping, learning rate scheduling, weighted loss, resampling) are key practical skills for refining models. What You Can Try Next Your journey into computer vision with PyTorch doesn't have to end here! To deepen your skills, consider exploring these areas Transfer Learning Instead of training from scratch, leverage powerful models (like ResNet, VGG, DenseNet) pre-trained on large datasets (like ImageNet) and fine-tune them for your specific task. This often leads to better performance with less data and faster training. Cross-Validation Implement k-fold cross-validation for a more robust evaluation of your model's performance, reducing the dependency on a single train-validation split. Hyperparameter Tuning Systematically experiment with different learning rates, batch sizes, optimizer choices, network architectures, or augmentation strategies. Explainability Use techniques like Grad-CAM, SHAP, or LIME to understand why your model makes certain predictions. Visualize the image regions that most influence its decision. This is important for building trust, especially in medical AI. Remember that deep learning is as much an art as it is a science—experimentation, careful analysis, and domain knowledge all play important roles in creating effective solutions. Keep practicing these skills, and you'll be well-equipped to solve real-world problems with computer vision in PyTorch. Additional Resources To continue your learning journey PyTorch Documentation - Comprehensive reference for all PyTorch functions Sequence Models in PyTorch - If you're interested in extending your skills to sequential data Natural Language Processing with PyTorch - For processing text data with PyTorch Medical Image Analysis Papers - Recent research in medical image classification
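As a concrete starting point for the transfer-learning idea mentioned under "What You Can Try Next", here is a minimal sketch using torchvision's pretrained ResNet-18. It is not part of the tutorial pipeline above: the pretrained backbone expects 3-channel, ImageNet-style input, so the grayscale X-ray transforms would need to be adapted (for example with transforms.Grayscale(num_output_channels=3)):

from torchvision import models

# Load an ImageNet-pretrained backbone and freeze its feature extractor
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final layer with a new 2-class head (NORMAL vs. PNEUMONIA)
backbone.fc = nn.Linear(backbone.fc.in_features, 2)
backbone = backbone.to(device)

# Train only the new head to start; unfreeze deeper layers later if needed
optimizer = optim.Adam(backbone.fc.parameters(), lr=0.0001)

Fine-tuning a pretrained model this way often reaches strong validation performance in far fewer epochs than training from scratch.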


Put Database Offline sql Server (Move DB)
Category: SQL

This tutorial will show you how to safely move ...


Views: 236 Likes: 76
User Control (ASP.Net)
Category: .Net 7

User Controls ...


Views: 388 Likes: 109
Learn Python in 2018
Category: Technology

Python is used widely to develop ...


Views: 294 Likes: 79
InkScape Tutorial
Category: Research

1. Curve the tracer and hit ENTER to create an outline.<iframe src="//www.youtube.com/embed ...


Views: 0 Likes: 31
How to get a User ID in Asp.Net Core 3.1 (ClaimPri ...
Category: .Net 7

Question How do you get a user id in .net core MVC from a service repository?</ ...


Views: 1623 Likes: 100
Asp.Net Core 3.1 Error: FromSqlRaw or FromSqlInter ...
Category: SQL

Asp.Net Core 3.1 Error FromSqlRaw or FromSqlInterpolated was called with non-composable SQL and ...


Views: 1806 Likes: 114
ASP.NET Core 3.1 Get the Path to the Images Folder ...
Category: .Net 7

Problem How do you get the path to the wwwroot folder to show an image without ...


Views: 775 Likes: 112
ASP.Net Core 2.2 Razor View Routing for Partial Vi ...
Category: .Net 7

Question ASP.Net Core 2.2 Razor View Routing for partial view not working Answer Look c ...


Views: 340 Likes: 79
Profile Common Error Asp.Net
Category: .Net 7

<pre style="margin-top 18px; margin-bottom 30px; padding 5px; border-color rgb(225, 226, 226); f ...


Views: 693 Likes: 103
Application startup exception: System.InvalidOpera ...
Category: MVC

Error Application startup exception System.InvalidOperationException Cannot r ...


Views: 865 Likes: 84
AWS Inferentia2 builds on AWS Inferentia1 by delivering 4x higher throughput and 10x lower latency
AWS Inferentia2 builds on AWS Inferentia1 by deliv ...

The size of the machine learning (ML) models––large language models (LLMs) and foundation models (FMs)––is growing fast year-over-year, and these models need faster and more powerful accelerators, especially for generative AI. AWS Inferentia2 was designed from the ground up to deliver higher performance while lowering the cost of LLMs and generative AI inference. In this post, we show how the second generation of AWS Inferentia builds on the capabilities introduced with AWS Inferentia1 and meets the unique demands of deploying and running LLMs and FMs. The first generation of AWS Inferentia, a purpose-built accelerator launched in 2019, is optimized to accelerate deep learning inference. AWS Inferentia helped ML users reduce their inference costs and improve their prediction throughput and latency. With AWS Inferentia1, customers saw up to 2.3x higher throughput and up to 70% lower cost per inference than comparable inference-optimized Amazon Elastic Compute Cloud (Amazon EC2) instances. AWS Inferentia2, featured in the new Amazon EC2 Inf2 instances and supported in Amazon SageMaker, is optimized for large-scale generative AI inference and is the first inference focused instance from AWS that is optimized for distributed inference, with high-speed, low-latency connectivity between accelerators. You can now efficiently deploy a 175-billion-parameter model for inference across multiple accelerators on a single Inf2 instance without requiring expensive training instances. Until now, customers who had large models could only use instances that were built for training, but this is a waste of resources––given that they’re more expensive, consume more energy, and their workload doesn’t make use of all the available resources (such as faster networking and storage). With AWS Inferentia2, you can achieve 4 times higher throughput and up to 10 times lower latency compared to AWS Inferentia1. Also, the second generation of AWS Inferentia adds enhanced support for more data types, custom operators, dynamic tensors, and more. AWS Inferentia2 has 4 times more memory capacity, 16.4 times higher memory bandwidth than AWS Inferentia1, and native support for sharding large models across multiple accelerators. The accelerators use NeuronLink and Neuron Collective Communication to maximize the speed of data transfer between them or between an accelerator and the network adapter. AWS Inferentia2 is better suited for larger models, which require sharding across multiple accelerators, although AWS Inferentia1 is still a great option for smaller models because it provides better price-performance compared to alternatives. Architecture evolution To compare both generations of AWS Inferentia, let’s review the architecture of AWS Inferentia1. It has four NeuronCores v1 per chip, shown in the following diagram. Specifications per chip Compute – Four cores delivering in total 128 INT8 TOPS and 64FP16/BF16 TFLOPS Memory – 8 GB of DRAM (50 GB/sec of bandwidth), shared by all four cores NeuronLink – Link between cores for sharding models across two or more cores Let’s look at how AWS Inferentia2 is organized. Each AWS Inferentia2 chip has two upgraded cores based on the NeuronCore-v2 architecture. Like AWS Inferentia1, you can run different models on each NeuronCore or combine multiple cores to shard big models. 
Specifications per chip Compute – Two cores delivering in total 380 INT8 TOPS, 190 FP16/BF16/cFP8/TF32 TFLOPS, and 47.5 FP32 TFLOPS Memory – 32 GB of HBM, shared by both cores NeuronLink – Link between chips (384 GB/sec per device) for sharding models across two or more cores NeuronCore-v2 has a modular design with four independent engines ScalarEngine (3 times faster than v1) – Operates on floating point numbers––1600 (BF16/FP16) FLOPS VectorEngine (10 times faster than v1) – Operates on vectors of numbers with single operation for computations such as normalization, pooling, and others. TensorEngine (6 times faster than v1) – Performs tensor computations such as Conv, Reshape, Transpose, and others. GPSIMD-Engine – Has eight fully programmable 512-bit wide general-purpose processors for you to create your custom operators with standard PyTorch custom C++ operators API. This is a new feature, introduced in NeuronCore-v2. AWS Inferentia2 NeuronCore-v2 is faster and more optimized. Also, it’s capable of accelerating different types and sizes of models, ranging from simple models such as ResNet 50 to large language models or foundation models with billions of parameters such as GPT-3 (175 billion parameters). AWS Inferentia2 also has a larger and faster internal memory, when compared to AWS Inferentia1, as shown in the following table. Chip Neuron Cores Memory Type Memory Size Memory Bandwidth AWS Inferentia x4 (v1) DDR4 8GB 50GB/S AWS Inferentia 2 x2 (v2) HBM 32GB 820GB/S The memory you find in AWS Inferentia2 is the type High-Bandwidth Memory (HBM) type. Each AWS Inferentia2 chip has 32 GB and that can be combined with other chips to distribute very large models using NeuronLink (device-to-device interconnect). An inf2.48xlarge, for instance, has 12 AWS Inferentia2 accelerators with a total of 384 GB of accelerated memory. The speed of AWS Inferentia2 memory is 16.4 times faster than AWS Inferentia1, as shown in the previous table. Other features AWS Inferentia2 offers the following additional features Hardware supported – cFP8 (new, configurable FP8), FP16, BF16, TF32, FP32, INT8, INT16 and INT32. For more information, refer to Data Types. Lazy Tensor inference – We discuss Lazy Tensor inference later in this post. Custom operators – Developers can use standard PyTorch custom operators programming interfaces to use the Custom C++ Operators feature. A custom operator is composed of low-level primitives available in the Tensor Factory Functions and accelerated by GPSIMD-Engine. Control-flow (coming soon) – This is for native programming language control flow inside the model to eventually preprocess and postprocess data from one layer to another. Dynamic-shapes (coming soon) – This is useful when your model changes the shape of the output of any internal layer dynamically. For instance a filter which reduces the output tensor size or shape inside the model, based on the input data. Accelerating models on AWS Inferentia1 and AWS Inferentia2 The AWS Neuron SDK is used for compiling and running your model. It is natively integrated with PyTorch and TensorFlow. That way, you don’t need to run an additional tool. Use your original code, written in one of these ML frameworks, and with a few lines of code changes, you’re good to go with AWS Inferentia. Let’s look at how to compile and run a model on AWS Inferentia1 and AWS Inferentia2 using PyTorch. 
Load a pre-trained model (ResNet 50) from torchvision Load a pre-trained model and run it one time to warm it up import torch import torchvision model = torchvision.models.resnet50(weights='IMAGENET1K_V1').eval().cpu() x = torch.rand(1,3,224,224).float().cpu() # dummy input y = model(x) # warmup model Trace and deploy the accelerated model on Inferentia1 To trace the model to AWS Inferentia, import torch_neuron and invoke the tracing function. Keep in mind that the model needs to be PyTorch Jit traceable to work. At the end of the tracing process, save the model as a normal PyTorch model. Compile the model one time and load it back as many times as you need. The Neuron SDK runtime is already integrated to PyTorch and is responsible for sending the operators to the AWS Inferentia1 chip automatically to accelerate your model. In your inference code, you always need to import torch_neuron to activate the integrated runtime. You can pass additional parameters to the compiler to customize the way it optimizes the model or to enable special features such as neuron-pipeline-cores. Shard your model across multiple cores to increase throughput. import torch_neuron # Tracing the model using AWS NeuronSDK neuron_model = torch_neuron.trace(model,x) # trace model to Inferentia # Saving for future use neuron_model.save('neuron_resnet50.pt') # Next time you don't need to trace the model again # Just load it and AWS NeuronSDK will send it to Inferentia automatically neuron_model = torch.jit.load('neuron_resnet50.pt') # accelerated inference on Inferentia y = neuron_model(x) Tracing and deploying the accelerated model on Inferentia2 For AWS Inferentia2, the process is similar. The only difference is the package you import ends with x torch_neuronx. The Neuron SDK takes care of the compilation and running of the model for you transparently. You can also pass additional parameters to the compiler to fine-tune the operation or activate specific functionalities. import torch_neuronx # Tracing the model using NeuronSDK neuron_model = torch_neuronx.trace(model,x) # trace model to Inferentia # Saving for future use neuron_model.save('neuron_resnet50.pt') # Next time you don't need to trace the model again # Just load it and NeuronSDK will send it to Inferentia automatically neuron_model = torch.jit.load('neuron_resnet50.pt') # accelerated inference on Inferentia y = neuron_model(x) AWS Inferentia2 also offers a second approach for running a model called Lazy Tensor inference. In this mode, you don’t trace or compile the model previously; instead, the compiler runs on the fly every time you run your code. It isn’t recommended for production, given that traced mode has many advantages over Lazy Tensor inference. However, if you’re still developing your model and need to test it faster, Lazy Tensor inference can be a good alternative. Here’s how to compile and run a model using Lazy Tensor import torch import torchvision import torch_neuronx import torch_xla.core.xla_model as xm device = xm.xla_device() # Create XLA device model = torchvision.models.resnet50(weights='IMAGENET1K_V1').eval().cpu() model.to(device) x = torch.rand((1,3,224,224), device=device) # dummy input with torch.no_grad() y = model(x) xm.mark_step() # Compilation occurs here Now that you’re familiar with AWS Inferentia2, a good next step is to get started with PyTorch or Tensorflow and learn how to set up a dev environment and run tutorials and examples. 
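If you want a rough sense of the speedup on your own instance, a simple sketch (not part of the AWS example above, and assuming the traced neuron_model and dummy input x from the previous snippets) is to time repeated invocations after a warm-up call, mirroring the warm-up-then-average methodology used in the benchmark later in this post:

import time

y = neuron_model(x)                      # warm-up invocation (initializes the runtime context)
start = time.perf_counter()
for _ in range(10):
    y = neuron_model(x)
avg_latency_ms = (time.perf_counter() - start) / 10 * 1000
print(f"Average latency over 10 runs: {avg_latency_ms:.2f} ms")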
Also, check the AWS Neuron Samples GitHub repo, where you can find multiple examples of how to prepare models to run on Inf2, Inf1, and Trn1.

Summary of feature comparison between AWS Inferentia1 and AWS Inferentia2

The AWS Inferentia2 compiler is XLA-based, and AWS is part of the OpenXLA initiative. This is the biggest difference over AWS Inferentia1, and that's relevant because PyTorch, TensorFlow, and JAX have native XLA integrations. XLA brings many performance improvements, given that it optimizes the graph to compute the results in a single kernel launch. It fuses together successive tensor operations and outputs optimal machine code for accelerating model runs on AWS Inferentia2. Other parts of the Neuron SDK were also improved in AWS Inferentia2, while keeping the user experience as simple as possible when tracing and running models. The following table shows the features available in both versions of the compiler and runtime.

Feature | torch-neuron | torch-neuronx
Tensorboard | Yes | Yes
Supported Instances | Inf1 | Inf2 & Trn1
Inference Support | Yes | Yes
Training Support | No | Yes
Architecture | NeuronCore-v1 | NeuronCore-v2
Trace API | torch_neuron.trace() | torch_neuronx.trace()
Distributed Inference | NeuronCore Pipeline | Collective Communications
IR | GraphDef | HLO
Compiler | neuron-cc | neuronx-cc
Monitoring | neuron-monitor / monitor-top | neuron-monitor / monitor-top

For a more detailed comparison between torch-neuron (Inf1) and torch-neuronx (Inf2), refer to Comparison of torch-neuron (Inf1) versus torch-neuronx (Inf2 & Trn1) for Inference.

Model Serving

After tracing a model to deploy to Inf2, you have many deployment options. You can run real-time predictions or batch predictions in different ways. Because Inf2 is available as EC2 instances, it is natively integrated with other AWS services that make use of Deep Learning Containers (DLCs), such as Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and SageMaker. AWS Inferentia2 is compatible with the most popular deployment technologies. Here is a list of some of the options you have for deploying models using AWS Inferentia2:

SageMaker – Fully managed service to prepare data and build, train, and deploy ML models
TorchServe – PyTorch integrated deployment mechanism
TensorFlow Serving – TensorFlow integrated deployment mechanism
Deep Java Library – Open-source Java mechanism for model deployment and training
Triton – NVIDIA open-source service for model deployment

Benchmark

The following table highlights the improvements AWS Inferentia2 brings over AWS Inferentia1. Specifically, we measure latency (how fast the model can make a prediction using each accelerator), throughput (how many inferences per second), and cost per inference (how much each inference costs in US dollars). The lower the latency in milliseconds and the cost in US dollars, the better; the higher the throughput, the better. Two models were used in this process––both large language models: ELECTRA large discriminator and BERT large uncased. PyTorch (1.13.1) and Hugging Face transformers (v4.7.0), the main libraries used in this experiment, ran on Python 3.8. After compiling the models for batch size = 1 and 10 (using the code from the previous section as a reference), each model was warmed up (invoked one time to initialize the context) and then invoked 10 times in a row. The following table shows average numbers collected in this simple benchmark.
Model sizes used in the benchmark:

ELECTRA large discriminator – 334,092,288 parameters (~593 MB)
BERT large uncased – 335,143,938 parameters (~580 MB)
OPT-66B – 66 billion parameters (~124 GB)

Model Name | Batch Size | Sentence Length | Latency Inf1 (ms) | Latency Inf2 (ms) | Improvement Inf2 over Inf1 (x Times) | Throughput Inf1 (inferences/sec) | Throughput Inf2 (inferences/sec) | Cost per Inference Inf1 (EC2 us-east-1) ** | Cost per Inference Inf2 (EC2 us-east-1) **
ElectraLargeDiscriminator | 1 | 256 | 35.7 | 8.31 | 4.30 | 28.01 | 120.34 | $0.0000023 | $0.0000018
ElectraLargeDiscriminator | 10 | 256 | 343.7 | 72.9 | 4.71 | 2.91 | 13.72 | $0.0000022 | $0.0000015
BertLargeUncased | 1 | 128 | 28.2 | 3.1 | 9.10 | 35.46 | 322.58 | $0.0000018 | $0.0000007
BertLargeUncased | 10 | 128 | 121.1 | 23.6 | 5.13 | 8.26 | 42.37 | $0.0000008 | $0.0000005

* c6a.8xlarge with 32 AMD Epyc 7313 CPUs was used in this benchmark.
** EC2 public pricing in us-east-1 on April 20: inf2.xlarge $0.7582/hr; inf1.xlarge $0.228/hr. Cost per inference considers the cost per element in a batch. (Cost per inference equals the total cost of model invocation / batch size.)

For additional information about training and inference performance, refer to Trn1/Trn1n Performance.

Conclusion

AWS Inferentia2 is a powerful technology designed for improving performance and reducing the cost of deep learning model inference. More performant than AWS Inferentia1, it offers up to 4 times higher throughput, up to 10 times lower latency, and up to 50% better performance/watt than other comparable inference-optimized EC2 instances. In the end, you pay less, have a faster application, and meet your sustainability goals. It's simple and straightforward to migrate your inference code to AWS Inferentia2, which also supports a broader variety of models, including large language models and foundation models for generative AI. You can get started by following the AWS Neuron SDK documentation to set up a development environment and start your accelerated deep learning project. To help you get started, Hugging Face has added Neuron support to their Optimum library, which optimizes models for faster training and inference, and they have many example tasks ready to run on Inf2. Also, check our Deploy large language models on AWS Inferentia2 using large model inference containers post to learn about deploying LLMs to AWS Inferentia2 using model inference containers. For additional examples, see the AWS Neuron Samples GitHub repo.

About the authors

Samir Araújo is an AI/ML Solutions Architect at AWS. He helps customers create AI/ML solutions that solve their business challenges using AWS. He has been working on several AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. He likes playing with hardware and automation projects in his free time, and he has a particular interest in robotics.


ASP.Net Core 2.2 MVC Error g.cshtml.cs: The type o ...
Category: HTML5

ASP.Net Core 2.2 MVC Err ...


Views: 5443 Likes: 138
Asp.Net Core MVC Razor View Form not posting data ...
Category: .Net 7

Problem&nbsp;Asp.Net Core MVC Razor View Form not posting data to the Server-SideS ...


Views: 250 Likes: 83
Debugging ASP.NET Core Applications: Accessing Kes ...
Category: .Net 7

How to Access and Debug an ASP.NET Core Application Running in Kestrel from Another Computer on t ...


Views: 2061 Likes: 110
Asp.Net MVC Development Notes
Category: .Net 7

<a href="https//www.freecodecamp.org/news/an-awesome-guide-on-how-to-build-restful-apis-w ...


Views: 752 Likes: 79
[Solved]: Asp.Net Core Error 'byte[]' does not co ...
Category: .Net 7

Problem 'byte[]' does not contain a definition for 'CopyToAsync' and no accessible extension met ...


Views: 734 Likes: 77
Asp.Net Core Error: InvalidOperationException: The ...
Category: .Net 7

<span style="color #222222; font-family 'Segoe UI', Tahoma, Arial, Helvetica, sans-serif; font- ...


Views: 1640 Likes: 85
File Archiving and Compression Commands on Linux
File Archiving and Compression Commands on Linux

In this tutorial, we will show you how to archive and compress files on Linux. Archives are a collection of files and directories that are stored in a single file. You should create archives when you want to store and preserve files that you rarely use or when you want to create portable backups. Moreover, it is common to compress the archives in order to reduce the file size, preserve storage space and speed up the data transfer and reduce the bandwidth load. Let’s dive into some examples. Assume that you have the following structure of folders and subfolders that contain files. tree . +-- myfolder +-- mysubfolder1 ¦ +-- mytxt1.txt +-- mysubfolder2 +-- mytxt1.txt We can archive the myfolder by running the following command tar -cf myfolder.tar myfolder Note that the “c” flag means to create a new archive and the “f” flag means to interpret the input from the file rather than from the default, which is standard input. Now, if we run the command ls -ltr total 16 drwxr-sr-x 4 theia users 4096 Feb 9 0607 myfolder -rw-r--r-- 1 theia users 10240 Feb 9 0643 myfolder.tar We will see that the myfolder.tar has been created. If we want to archive the folder as well as compress it, we can add the z option like tar -czf myfolder.tar.gz myfolder Again, we can see that the myfolder.tar.gz has been created. ls -ltr total 20 drwxr-sr-x 4 theia users 4096 Feb 9 0607 myfolder -rw-r--r-- 1 theia users 10240 Feb 9 0643 myfolder.tar -rw-r--r-- 1 theia users 206 Feb 9 0646 myfolder.tar.gz We have also the option to see the content of an archive by running tar -tf myfolder.tar myfolder/ myfolder/mysubfolder1/ myfolder/mysubfolder1/mytxt1.txt myfolder/mysubfolder2/ myfolder/mysubfolder2/mytxt1.txt We can unarchive an archive by running the command tar -xf myfolder.tar Similarly, we can decompress a .tar.gz file by running the same command tar -xf myfolder.tar.gz We can compress a directory using the zip command as follows zip -r myfolder.zip myfolder total 28 drwxr-sr-x 4 theia users 4096 Feb 9 0607 myfolder -rw-r--r-- 1 theia users 10240 Feb 9 0643 myfolder.tar -rw-r--r-- 1 theia users 206 Feb 9 0646 myfolder.tar.gz -rw-r--r-- 1 theia users 168 Feb 9 0707 myfolder.zip And we can decompress a zip file by running the unzip command unzip myfolder.zip
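If you want to restore an archive somewhere other than the current directory, or just peek inside a compressed file before extracting it, here are a few optional variations of the commands above (the /tmp/restore path is only an example):

# List the contents of a compressed archive without extracting it
tar -tzf myfolder.tar.gz

# Extract into a specific directory (it must exist first)
mkdir -p /tmp/restore
tar -xf myfolder.tar.gz -C /tmp/restore

# List the contents of a zip file without extracting it
unzip -l myfolder.zip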


Understanding ASP.Net Core 2.2 Database Context C ...
Category: .Net 7

When working in asp.net core 2.2, you will come across the class in the Model folder called Datab ...


Views: 418 Likes: 86
DNS Host File Not Working (ASP.Net Core 3.1)
Category: .Net 7

Problem When developing for the Web, you might be in a situation where you need to ...


Views: 579 Likes: 89
Microsoft.Common.CurrentVersion.targets(4678,5): e ...
Category: .Net 7

Question How do you resolve the error "Severity Code De ...


Views: 0 Likes: 34
What’s new with identity in .NET 8
What’s new with identity in .NET 8

In April 2023, I wrote about the commitment by the ASP.NET Core team to improve authentication, authorization, and identity management in .NET 8. The plan we presented included three key deliverables New APIs to simplify login and identity management for client apps like Single Page Apps (SPA) and Blazor WebAssembly. Enablement of token-based authentication and authorization in ASP.NET Core Identity for clients that can’t use cookies. Improvements to documentation. All three deliverables will ship with .NET 8. In addition, we were able to add a new identity UI for Blazor web apps that works with both of the new rendering modes, server and WebAssembly. Let’s look at a few scenarios that are enabled by the new changes in .NET 8. In this blog post we’ll cover Securing a simple web API backend Using the new Blazor identity UI Adding an external login like Google or Facebook Securing Blazor WebAssembly apps using built-in features and components Using tokens for clients that can’t use cookies Let’s look at the simplest scenario for using the new identity features. Basic Web API backend An easy way to use the new authorization is to enable it in a basic Web API app. The same app may also be used as the backend for Blazor WebAssembly, Angular, React, and other Single Page Web apps (SPA). Assuming you’re starting with an ASP.NET Core Web API project in .NET 8 that includes OpenAPI, you can add authentication with a few steps. Identity is “opt-in,” so there are a few packages to add Microsoft.AspNetCore.Identity.EntityFrameworkCore – the package that enables EF Core integration A package for the database you wish to use, such as Microsoft.EntityFrameworkCore.SqlServer (we’ll use the in-memory database for this example) You can add these packages using the NuGet package manager or the command line. For example, to add the packages using the command line, navigate to the project folder and run the following dotnet commands dotnet add package Microsoft.AspNetCore.Identity.EntityFrameworkCore dotnet add package Microsoft.EntityFrameworkCore.InMemory Identity allows you to customize both the user information and the user database in case you have requirements beyond what is provided in the .NET Core framework. For our basic example, we’ll just use the default user information and database. To do that, we’ll add a new class to the project called MyUser that inherits from IdentityUser class MyUser IdentityUser {} Add a new class called AppDbContext that inherits from IdentityDbContext<MyUser> class AppDbContext(DbContextOptions<AppDbContext> options) IdentityDbContext<MyUser>(options) { } Providing the special constructor makes it possible to configure the database for different environments. To set up identity for an app, open the Program.cs file. Configure identity to use cookie-based authentication and to enable authorization checks by adding the following code after the call to WebApplication.CreateBuilder(args) builder.Services.AddAuthentication(IdentityConstants.ApplicationScheme) .AddIdentityCookies(); builder.Services.AddAuthorizationBuilder(); Configure the EF Core database. Here we’ll use the in-memory database and name it “AppDb.” It’s usedhere for the demo so it is easy to restart the application and test the flow to register and login (each run will start with a fresh database). Changing to SQLite will save users between sessions but requires the database to be properly created through migrations as shown in this EF Core getting started tutorial. 
You can use other relational providers such as SQL Server for your production code. builder.Services.AddDbContext<AppDbContext>( options => options.UseInMemoryDatabase("AppDb")); Configure identity to use the EF Core database and expose the identity endpoints builder.Services.AddIdentityCore<MyUser>() .AddEntityFrameworkStores<AppDbContext>() .AddApiEndpoints(); Map the routes for the identity endpoints. This code should be placed after the call to builder.Build() app.MapIdentityApi<MyUser>(); The app is now ready for authentication and authorization! To secure an endpoint, use the .RequireAuthentication() extension method where you define the auth route. If you are using a controller-based solution, you can add the [Authorize] attribute to the controller or action. To test the app, run it and navigate to the Swagger UI. Expand the secured endpoint, select try it out, and select execute. The endpoint is reported as 404 - not found, which is arguably more secure than reporting a 401 - not authorized because it doesn’t reveal that the endpoint exists. Now expand /register and fill in your credentials. If you enter an invalid email address or a bad password, the result includes the validation errors. The errors in this example are returned in the ProblemDetails format so your client can easily parse them and display validation errors as needed. I’ll show an example of that in the standalone Blazor WebAssembly app. A successful registration results in a 200 - OK response. You can now expand /login and enter the same credentials. Note, there are additional parameters that aren’t needed for this example and can be deleted. Be sure to set useCookies to true. A successful login results in a 200 - OK response with a cookie in the response header. Now you can rerun the secured endpoint and it should return a valid result. This is because cookie-based authentication is securely built-in to your browser and “just works.” You’ve just secured your first endpoint with identity! Some web clients may not include cookies in the header by default. If you are using a tool for testing APIs, you may need to enable cookies in the settings. The JavaScript fetch API does not include cookies by default. You can enable them by setting credentials to the value include in the options. Similarly, an HttpClient running in a Blazor WebAssembly app needs the HttpRequestMessage to include credentials, like the following request.SetBrowserRequestCredential(BrowserRequestCredentials.Include); Next, let’s jump into a Blazor web app. The Blazor identity UI A stretch goal of our team that we were able to achieve was to implement the identity UI, which includes options to register, log in, and configure multi-factor authentication, in Blazor. The UI is built into the template when you select the “Individual accounts” option for authentication. Unlike the previous version of the identity UI, which was hidden unless you wanted to customize it, the template generates all of the source code so you can modify it as needed. The new version is built with Razor components and works with both server-side and WebAssembly Blazor apps. The new Blazor web model allows you to configure whether the UI is rendered server-side or from a client running in WebAssembly. When you choose the WebAssembly mode, the server will still handle all authentication and authorization requests. It will also generate the code for a custom implementation of AuthenticationStateProvider that tracks the authentication state. 
The provider uses the PersistentComponentState class to pre-render the authentication state and persist it to the page. The PersistentAuthenticationStateProvider in the client WebAssembly app uses the component to synchronize the authentication state between the server and browser. The state provider might also be named PersistingRevalidatingAuthenticationStateProvider when running with auto interactivity or IdentityRevalidatingAuthenticationStateProvider for server interactivity. Although the examples in this blog post are focused on a simple username and password login scenario, ASP.NET Identity has support for email-based interactions like account confirmation and password recovery. It is also possible to configure multifactor authentication. The components for all of these features are included in the UI. Add an external login A common question we are asked is how to integrate external logins through social websites with ASP.NET Core Identity. Starting from the Blazor web app default project, you can add an external login with a few steps. First, you’ll need to register your app with the social website. For example, to add a Twitter login, go to the Twitter developer portal and create a new app. You’ll need to provide some basic information to obtain your client credentials. After creating your app, navigate to the app settings and click “edit” on authentication. Specify “native app” for the application type for the flow to work correctly and turn on “request email from users.” You’ll need to provide a callback URL. For this example, we’ll use https//localhost5001/signin-twitter which is the default callback URL for the Blazor web app template. You can change this to match your app’s URL (i.e. replace 5001 with your own port). Also note the API key and secret. Next, add the appropriate authentication package to your app. There is a community-maintained list of OAuth 2.0 social authentication providers for ASP.NET Core with many options to choose from. You can mix multiple external logins as needed. For Twitter, I’ll add the AspNet.Security.OAuth.Twitter package. From a command prompt in the root directory of the server project, run this command to store your API Key (client ID) and secret. dotnet user-secrets set "TwitterApiKey" "<your-api-key>" dotnet user-secrets set "TWitterApiSecret" "<your-api-secret>" Finally, configure the login in Program.cs by replacing this code builder.Services.AddAuthentication(IdentityConstants.ApplicationScheme) .AddIdentityCookies(); with this code builder.Services.AddAuthentication(IdentityConstants.ApplicationScheme) .AddTwitter(opt => { opt.ClientId = builder.Configuration["TwitterApiKey"]!; opt.ClientSecret = builder.Configuration["TwitterApiSecret"]!; }) .AddIdentityCookies(); Cookies are the preferred and most secure approach for implementing ASP.NET Core Identity. Tokens are supported if needed and require the IdentityConstants.BearerScheme to be configured. The tokens are proprietary and the token-based flow is intended for simple scenarios so it does not implement the OAuth 2.0 or OIDC standards. What’s next? Believe it or not, you’re done. This time when you run the app, the login page will automatically detect the external login and provide a button to use it. When you log in and authorize the app, you will be redirected back and authenticated. 
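To confirm the round-trip worked end to end, one option (a sketch, not part of the original walkthrough; the /me route name is arbitrary) is to expose a small authorized endpoint in the server project's Program.cs that echoes the signed-in user:

// Requires: using System.Security.Claims;
// Unauthenticated requests are rejected or redirected, depending on the configured scheme.
app.MapGet("/me", (ClaimsPrincipal user) => new { name = user.Identity?.Name })
   .RequireAuthorization();

Browsing to /me after completing the external login should return the signed-in user's name.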
Securing Blazor WebAssembly apps

A major motivation for adding the new identity APIs was to make it easier for developers to secure their browser-based apps, including Single Page Apps (SPAs) and Blazor WebAssembly. It doesn't matter if you use the built-in identity provider, a custom login, or a cloud-based service like Microsoft Entra; the end result is an identity that is either authenticated with claims and roles, or not authenticated.

In Blazor, you can secure a Razor component by adding the [Authorize] attribute to the component or to the page that hosts the component. You can also secure a route by adding the .RequireAuthorization() extension method to the route definition. The full source code for this example is available in the Blazor samples repo.

The AuthorizeView tag provides a simple way to control which content the user has access to. The authentication state can be accessed via the context property. Consider the following:

<p>Welcome to my page!</p>
<AuthorizeView>
    <Authorizing>
        <div class="alert alert-info">We're checking your credentials...</div>
    </Authorizing>
    <Authorized>
        <div class="alert alert-success">You are authenticated: @context.User.Identity?.Name</div>
    </Authorized>
    <NotAuthorized>
        <div class="alert alert-warning">You are not authenticated!</div>
    </NotAuthorized>
</AuthorizeView>

The greeting will be shown to everyone. In the case of Blazor WebAssembly, where the client might need to authenticate asynchronously over API calls, the Authorizing content will be shown while the authentication state is queried and resolved. Then, based on whether or not you've authenticated, you'll either see your name or a message that you're not authenticated.

How exactly does the client know if you're authenticated? That's where the AuthenticationStateProvider comes in. The App.razor page is wrapped in a CascadingAuthenticationState provider. This provider is responsible for tracking the authentication state and making it available to the rest of the app. The AuthenticationStateProvider is injected into the provider and used to track the state. The AuthenticationStateProvider is also injected into the AuthorizeView component. When the authentication state changes, the provider notifies the AuthorizeView component and the content is updated accordingly.

First, we want to make sure that API calls are persisting credentials accordingly. To do that, I created a handler named CookieHandler:

public class CookieHandler : DelegatingHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        request.SetBrowserRequestCredentials(BrowserRequestCredentials.Include);
        return base.SendAsync(request, cancellationToken);
    }
}

In Program.cs I added the handler to the HttpClient and used the client factory to configure a special client for authentication purposes:

builder.Services.AddTransient<CookieHandler>();
builder.Services.AddHttpClient(
    "Auth",
    opt => opt.BaseAddress = new Uri(builder.Configuration["AuthUrl"]!))
    .AddHttpMessageHandler<CookieHandler>();

Note the authentication components are opt-in and available via the Microsoft.AspNetCore.Components.WebAssembly.Authentication package. The client factory and extension methods come from Microsoft.Extensions.Http. The AuthUrl is the URL of the ASP.NET Core server that exposes the identity APIs.

Next, I created a CookieAuthenticationStateProvider that inherits from AuthenticationStateProvider and overrides the GetAuthenticationStateAsync method.
The main logic looks like this:

var user = new ClaimsPrincipal(new ClaimsIdentity());
var userResponse = await _httpClient.GetAsync("manage/info");
if (userResponse.IsSuccessStatusCode)
{
    var userJson = await userResponse.Content.ReadAsStringAsync();
    var userInfo = JsonSerializer.Deserialize<UserInfo>(userJson, jsonSerializerOptions);
    if (userInfo != null)
    {
        var claims = new List<Claim>
        {
            new(ClaimTypes.Name, userInfo.Email),
            new(ClaimTypes.Email, userInfo.Email)
        };
        var id = new ClaimsIdentity(claims, nameof(CookieAuthenticationStateProvider));
        user = new ClaimsPrincipal(id);
    }
}
return new AuthenticationState(user);

The user info endpoint is secure, so if the user is not authenticated the request will fail and the method will return an unauthenticated state (the default ClaimsPrincipal created above). Otherwise, it builds the appropriate identity and claims and returns the authenticated state.

How does the app know when the state has changed? Here is what a login looks like from Blazor WebAssembly using the identity API:

async Task<AuthenticationState> LoginAndGetAuthenticationState()
{
    var result = await _httpClient.PostAsJsonAsync(
        "login?useCookies=true", new { email, password });
    return await GetAuthenticationStateAsync();
}

NotifyAuthenticationStateChanged(LoginAndGetAuthenticationState());

When the login is successful, the NotifyAuthenticationStateChanged method on the base AuthenticationStateProvider class is called to notify the provider that the state has changed. It is passed the result of the request for a new authentication state, so that it can verify the cookie is present. The provider will then update the AuthorizeView component and the user will see the authenticated content.

Tokens

In the rare event your client doesn't support cookies, the login API provides a parameter to request tokens. A custom token (one that is proprietary to the ASP.NET Core identity platform) is issued that can be used to authenticate subsequent requests. The token is passed in the Authorization header as a bearer token. A refresh token is also provided. This allows your application to request a new token when the old one expires without forcing the user to log in again.

The tokens are not standard JSON Web Tokens (JWT). This decision was intentional, as the built-in identity is meant primarily for simple scenarios. The token option is not intended to be a fully-featured identity service provider or token server, but instead an alternative to the cookie option for clients that can't use cookies.

Not sure whether you need a token server or not? Read this document to help you choose the right ASP.NET Core identity solution. Looking for a more advanced identity solution? Read our list of identity management solutions for ASP.NET Core.

Docs and samples

The third deliverable is documentation and samples. We have already introduced new documentation and will be adding new articles and samples as we approach the release of .NET 8. Follow Issue #29452 – documentation and samples for identity in .NET 8 to track the progress. Please use the issue to communicate additional documentation or samples you are looking for. You can also link to the specific issues for various documents and provide your feedback there.

Conclusion

The new identity features in .NET 8 make it easier than ever to secure your applications. If your requirements are simple, you can now add authentication and authorization to your app with a few lines of code.
The new APIs make it possible to secure your Web API endpoints with cookie-based authentication and authorization. There is also a token-based option for clients that can’t use cookies. Learn more about the new identity features in the ASP.NET Core documentation. The post What’s new with identity in .NET 8 appeared first on .NET Blog.


ASP.NET Tracing
Category: .Net 7

How to Enable Tracing when Developing ...


Views: 323 Likes: 110
.NET 8 Performance Improvements in .NET MAUI

The major focus for .NET MAUI in the .NET 8 release is quality. As such, alot of our focus has been fixing bugs instead of chasing lofty performance goals. In .NET 8, we merged 1,559 pull requests that closed 596 total issues. These include changes from the .NET MAUI team as well as the .NET MAUI community. We are optimistic that this should result in a significant increase in quality in .NET 8. However! We still have plenty of performance changes to showcase. Building upon the fundamental performance improvements in .NET 8 we discover “low-hanging” fruit constantly, and there were high-voted performance issues on GitHub we tried to tackle. Our goal is to continue to make .NET MAUI faster in each release, read on for details! For a review of the performance improvements in past releases, see our posts for .NET 6 and 7. This also gives you an idea of the improvements you would see migrating from Xamarin.Forms to .NET MAUI .NET 7 Performance Improvements in .NET MAUI .NET 6 Performance Improvements in .NET MAUI Table Of Contents New features AndroidStripILAfterAOT AndroidEnableMarshalMethods NativeAOT on iOS Build & Inner Loop Performance Filter Android ps -A output with grep Port WindowsAppSDK usage of vcmeta.dll to C# Improvements to remote iOS builds on Windows Improvements to Android inner-loop XAML Compilation no longer uses LoadInSeparateAppDomain Performance or App Size Improvements Structs and IEquatable in .NET MAUI Fix performance issue in {AppThemeBinding} Address CA1307 and CA1309 for performance Address CA1311 for performance Remove unused ViewAttachedToWindow event on Android Remove unneeded System.Reflection for {Binding} Use StringComparer.Ordinal for Dictionary and HashSet Reduce Java interop in MauiDrawable on Android Improve layout performance of Label on Android Reduce Java interop calls for controls in .NET MAUI Improve performance of Entry.MaxLength on Android Improve memory usage of CollectionView on Windows Use UnmanagedCallersOnlyAttribute on Apple platforms Faster Java interop for strings on Android Faster Java interop for C# events on Android Use Function Pointers for JNI Removed Xamarin.AndroidX.Legacy.Support.V4 Deduplication of generics on iOS and macOS Fix System.Linq.Expressions implementation on iOS-like platforms Set DynamicCodeSupport=false for iOS and Catalyst Memory Leaks Memory Leaks and Quality Diagnosing leaks in .NET MAUI Patterns that cause leaks C# events Circular references on Apple platforms Roslyn analyzer for Apple platforms Tooling and Documentation Simplified dotnet-trace and dotnet-dsrouter dotnet-gcdump Support for Mobile New Features AndroidStripILAfterAOT Once Upon A Time we had a brilliant thought if AOT pre-compiles C# methods, do we need the managed method anymore? Removing the C# method body would allow assemblies to be smaller. .NET iOS applications already do this, so why not Android as well? While the idea is straightforward, implementation was not iOS uses “Full” AOT, which AOT’s all methods into a form that doesn’t require a runtime JIT. This allowed iOS to run cil-strip, removing all method bodies from all managed types. At the time, Xamarin.Android only supported “normal” AOT, and normal AOT requires a JIT for certain constructs such as generic types and generic methods. This meant that attempting to run cil-strip would result in runtime errors if a method body was removed that was actually required at runtime. This was particularly bad because cil-strip could only remove all method bodies! 
We are re-introducing IL stripping for .NET 8 via a new $(AndroidStripILAfterAOT) MSBuild property. When true, the <MonoAOTCompiler/> task will track which method bodies were actually AOT'd, storing this information into %(_MonoAOTCompiledAssemblies.MethodTokenFile), and the new <ILStrip/> task will update the input assemblies, removing all method bodies that can be removed.

By default, enabling $(AndroidStripILAfterAOT) will override the default $(AndroidEnableProfiledAot) setting, allowing all trimmable AOT'd methods to be removed. This choice was made because $(AndroidStripILAfterAOT) is most useful when AOT-compiling your entire application. Profiled AOT and IL stripping can be used together by explicitly setting both within the .csproj, but with the only benefit being a small .apk size improvement:

<PropertyGroup Condition=" '$(Configuration)' == 'Release' ">
    <AndroidStripILAfterAOT>true</AndroidStripILAfterAOT>
    <AndroidEnableProfiledAot>true</AndroidEnableProfiledAot>
</PropertyGroup>

.apk size results for a dotnet new android app:

$(AndroidStripILAfterAOT)  $(AndroidEnableProfiledAot)  .apk size
true                       true                         7.7MB
true                       false                        8.1MB
false                      true                         7.7MB
false                      false                        8.4MB

Note that AndroidStripILAfterAOT=false and AndroidEnableProfiledAot=true is the default Release configuration, at 7.7MB. A project that only sets AndroidStripILAfterAOT=true implicitly sets AndroidEnableProfiledAot=false, resulting in an 8.1MB app. See xamarin-android#8172 and dotnet/runtime#86722 for details about this feature.

AndroidEnableMarshalMethods

.NET 8 introduces a new experimental setting for Release configurations:

<PropertyGroup Condition=" '$(Configuration)' == 'Release' ">
    <AndroidEnableMarshalMethods>true</AndroidEnableMarshalMethods>
    <!-- Note that single-architecture apps will be most successful -->
    <RuntimeIdentifier>android-arm64</RuntimeIdentifier>
</PropertyGroup>

We hope to enable this feature by default in .NET 9, but for now we are providing the setting as an opt-in, experimental feature. Applications that only target one architecture, such as RuntimeIdentifier=android-arm64, will likely be able to enable this feature without issue.

Background on Marshal Methods

A JNI marshal method is a JNI-callable function pointer provided to JNIEnv::RegisterNatives(). Currently, JNI marshal methods are provided via the interaction between code we generate and JNINativeWrapper.CreateDelegate(): our code generator emits the "actual" JNI-callable method, and JNINativeWrapper.CreateDelegate() uses System.Reflection.Emit to wrap the method for exception marshaling. JNI marshal methods are needed for all Java-to-C# transitions.

Consider the virtual Activity.OnCreate() method:

partial class Activity {
    static Delegate?
cb_onCreate_Landroid_os_Bundle_; static Delegate GetOnCreate_Landroid_os_Bundle_Handler () { if (cb_onCreate_Landroid_os_Bundle_ == null) cb_onCreate_Landroid_os_Bundle_ = JNINativeWrapper.CreateDelegate ((_JniMarshal_PPL_V) n_OnCreate_Landroid_os_Bundle_); return cb_onCreate_Landroid_os_Bundle_; } static void n_OnCreate_Landroid_os_Bundle_ (IntPtr jnienv, IntPtr native__this, IntPtr native_savedInstanceState) { var __this = globalJava.Lang.Object.GetObject<Android.App.Activity> (jnienv, native__this, JniHandleOwnership.DoNotTransfer)!; var savedInstanceState = globalJava.Lang.Object.GetObject<Android.OS.Bundle> (native_savedInstanceState, JniHandleOwnership.DoNotTransfer); __this.OnCreate (savedInstanceState); } // Metadata.xml XPath method reference path="/api/package[@name='android.app']/class[@name='Activity']/method[@name='onCreate' and count(parameter)=1 and parameter[1][@type='android.os.Bundle']]" [Register ("onCreate", "(Landroid/os/Bundle;)V", "GetOnCreate_Landroid_os_Bundle_Handler")] protected virtual unsafe void OnCreate (Android.OS.Bundle? savedInstanceState) => ... } Activity.n_OnCreate_Landroid_os_Bundle_() is the JNI marshal method, responsible for marshaling parameters from JNI values into C# types, forwarding the method invocation to Activity.OnCreate(), and (if necessary) marshaling the return value back to JNI. Activity.GetOnCreate_Landroid_os_Bundle_Handler() is part of the type registration infrastructure, providing a Delegate instance to RegisterNativeMembers .RegisterNativeMembers(), which is eventually passed to JNIEnvRegisterNatives(). While this works, it’s not incredibly performant unless using one of the optimized delegate types added in xamarin-android#6657, System.Reflection.Emit is used to create a wrapper around the marshal method, which is something we’ve wanted to avoid doing for years. Thus, the idea since we’re already bundling a native toolchain and using LLVM-IR to produce libxamarin-app.so, what if we emitted Java native method names and skipped all the done as part of Runtime.register() and JNIEnv.RegisterJniNatives()? Given class MyActivity Activity { protected override void OnCreate(Bundle? state) => ... } During the build, libxamarin-app.so would contain the function JNIEXPORT void JNICALL Java_crc..._MyActivity_n_1onCreate (JNIEnv *env, jobject self, jobject state); During App runtime, the Runtime.register() invocation present in Java Callable Wrappers would either be omitted or would be a no-op, and Android/JNI would instead resolve MyActivity.n_onCreate() as Java_crc..._MyActivity_n_1onCreate(). We call this effort “LLVM Marshal Methods”, which is currently experimental in .NET 8. Many of the specifics are still being investigated, and this feature will be spread across various areas. See xamarin-android#7351 for details about this experimental feature. NativeAOT on iOS In .NET 7, we started an experiment to see what it would take to support NativeAOT on iOS. Going from prototype to an initial implementation .NET 8 Preview 6 included NativeAOT as an experimental feature for iOS. 
To opt into NativeAOT in a MAUI iOS project, use the following settings in your project file <PropertyGroup Condition="$([MSBuild]GetTargetPlatformIdentifier('$(TargetFramework)')) == 'ios' and '$(Configuration)' == 'Release'"> <!-- PublishAot=true indicates NativeAOT, while omitting this property would use Mono's AOT --> <PublishAot>true</PublishAot> </PropertyGroup> Then to build the application for an iOS device $ dotnet publish -f net8.0-ios -r ios-arm64 MSBuild version 17.8.0+6cdef4241 for .NET ... Build succeeded. 0 Error(s) Note We may consider unifying and improving MSBuild property names for this feature in future .NET releases. To do a one-off build at the command-line you may also need to specify -pPublishAotUsingRuntimePack=true in addition to -pPublishAot=true. One of the main culprits for the first release was how the iOS workload supports Objective-C interoperability. The problem was mainly related to the type registration system which is the key component for efficiently supporting iOS-like platforms (see docs for details). In its implementation, the type registration system depends on type metadata tokens which are not available with NativeAOT. Therefore, in order to leverage the benefits of highly efficient NativeAOT runtime, we had to adapt. dotnet/runtime#80912 includes the discussion around how to tackle this problem, and finally in xamarin-macios#18268 we implemented a new managed static registrar that works with NativeAOT. The new managed static registrar does not just benefit us with being compatible with NativeAOT, but is also much faster than the default one, and is available for all supported runtimes (see docs for details). Along the way, we had a great help from our GH community and their contribution (code reviews, PRs) was essential to helps us move forward quickly and deliver this feature on time. A few from many PR’s that helped and unblocked us on our journey were dotnet/runtime#77956 dotnet/runtime#78280 dotnet/runtime#82317 dotnet/runtime#85996 and the list goes on… As .NET 8 Preview 6 came along, we finally managed to release our first version of the NativeAOT on iOS which also supports MAUI. See the blog post on .NET 8 Preview 6 for details about what we were able to accomplish in the initial release. In subsequent .NET 8 releases, results improved quite a bit, as we were identifying and resolving issues along the way. The graph below shows the .NET MAUI iOS template app size comparison throughout the preview releases We had steady progress and estimated size savings reported, due to fixing the following issues dotnet/runtime#87924 – fixed major NativeAOT size issue with AOT-incompatible code paths in System.Linq.Expressions and also made fully NativeAOT compatible when targeting iOS xamarin-macios#18332 – reduced the size of __LINKEDIT Export Info section in stripped binaries Furthermore, in the latest RC 1 release the app size went even further down reaching -50% smaller apps for the template .NET MAUI iOS applications compared to Mono. Most impactful issues/PRs that contributed to this xamarin-macios#18734 – Make Full the default link mode for NativeAOT xamarin-macios#18584 – Make the codebase trimming compatible through a series of PRs. Even though app size was our primary metric to focus on, for the RC 1 release, we also measured startup time performance comparisons for a .NET MAUI iOS template app comparing NativeAOT and Mono where NativeAOT results with almost 2x faster startup time. 
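Since NativeAOT cannot generate code at runtime, reflection-heavy code paths (such as the System.Linq.Expressions issue mentioned above) are the usual source of AOT and trimming incompatibilities. As a small, hedged illustration unrelated to the workload code itself, annotating reflection with the trimmer attributes is one way to keep such code working when the trimmer is aggressive; the type and member names here are hypothetical.

using System.Diagnostics.CodeAnalysis;
using System.Reflection;

static class SettingsCopier
{
    // The attribute tells the trimmer (and NativeAOT) to preserve T's public properties,
    // so the reflection below still finds them after trimming, without build-time warnings.
    public static void CopyProperties<[DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicProperties)] T>(
        T source, T destination)
    {
        foreach (PropertyInfo property in typeof(T).GetProperties())
        {
            if (property.CanRead && property.CanWrite)
                property.SetValue(destination, property.GetValue(source));
        }
    }
}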
Key Takeaways For NativeAOT scenarios on iOS, changing the default link mode to Full (xamarin-macios#18734) is probably the biggest improvement for application size. But at the same time, this change can also break applications which are not fully AOT and trim-compatible. In Full link mode, the trimmer might trim away AOT incompatible code paths (think about reflection usage) which are accessed dynamically at runtime. Full link mode is not the default configuration when using the Mono runtime, so it is possible that some applications are not fully AOT-compatible. Supporting NativeAOT on iOS is an experimental feature and still a work-in-progress, and our plan is to address the potential issues with Full link mode incrementally As a first step, we enabled trim, AOT, and single-file warnings by default in xamarin-macios#18571. The enabled warnings should make our customers aware at build-time, whether a use of a certain framework or a library, or some C# constructs in their code, is incompatible with NativeAOT – and could crash at runtime. This information should guide our customers to write AOT-compatible code, but also to help us improve our frameworks and libraries with the same goal of fully utilising the benefits of AOT compilation. The second step, was clearing up all the warnings coming from Microsoft.iOS and System.Private.CoreLib assemblies reported for a template iOS application with xamarin-macios#18629 and dotnet/runtime#91520. In future releases, we plan to address the warnings coming from the MAUI framework and further improve the overall user-experience. Our goal is to have fully AOT and trim-compatible frameworks. .NET 8 will support targeting iOS platforms with NativeAOT as an opt-in feature and shows great potential by generating up to 50% smaller and 50% faster startup compared to Mono. Considering the great performance that NativeAOT promises, please help us on this journey and try out your applications with NativeAOT and report any potential issues. At the same time, let us know when NativeAOT “just works” out-of-the-box. To follow future progress, see dotnet/runtime#80905. Last but not least, we would like to thank our GH contributors, who are helping us make NativeAOT on iOS possible. Build & Inner Loop Performance Filter Android ps -A output with grep When profiling the Android inner loop for a .NET MAUI project with PerfView we found around 1.2% of CPU time was spent just trying to get the process ID of the running Android application. When changing Tools > Options > Xamarin > Xamarin Diagnostics output verbosity to be Diagnostics, you could see -- Start GetProcessId - 12/02/2022 110557 (96.9929ms) -- [INPUT] ps -A [OUTPUT] USER PID PPID VSZ RSS WCHAN ADDR S NAME root 1 0 10943736 4288 0 0 S init root 2 0 0 0 0 0 S [kthreadd] ... Hundreds of more lines! u0_a993 14500 1340 14910808 250404 0 0 R com.companyname.mauiapp42 -- End GetProcessId -- The Xamarin/.NET MAUI extension in Visual Studio polls every second to see if the application has exited. This is useful for changing the play/stop button state if you force close the app, etc. Testing on a Pixel 5, we could see the command is actually 762 lines of output! > (adb shell ps -A).Count 762 What we could do instead is something like > adb shell "ps -A | grep -w -E 'PID|com.companyname.mauiapp42'" Where we pipe the output of ps -A to the grep command on the Android device. Yes, Android has a subset of unix commands available! We filter on either a line containing PID or your application’s package name. 
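A hypothetical C# sketch of what invoking that filtered command and extracting the process ID might look like; this is illustrative only, not the actual Visual Studio extension code, and it assumes adb is on the PATH.

using System;
using System.Diagnostics;

static int GetAndroidProcessId(string packageName)
{
    var psi = new ProcessStartInfo("adb")
    {
        // Same filtered command as above: keep the header row plus the app's row.
        Arguments = $"shell \"ps -A | grep -w -E 'PID|{packageName}'\"",
        RedirectStandardOutput = true,
        UseShellExecute = false,
    };

    using var adb = Process.Start(psi)!;
    string output = adb.StandardOutput.ReadToEnd();
    adb.WaitForExit();

    foreach (string line in output.Split('\n', StringSplitOptions.RemoveEmptyEntries))
    {
        string[] columns = line.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        // Skip the header row; the PID is the second column of the app's row.
        if (columns.Length > 1 && columns[0] != "USER" && line.Contains(packageName))
            return int.Parse(columns[1]);
    }

    return -1; // the app is not running
}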
The result is now the IDE is only parsing 4 lines [INPUT] ps -A | grep -w -E 'PID|com.companyname.mauiapp42' [OUTPUT] USER PID PPID VSZ RSS WCHAN ADDR S NAME u0_a993 12856 1340 15020476 272724 0 0 S com.companyname.mauiapp42 This not only improves memory used to split and parse this information in C#, but adb is also transmitting way less bytes across your USB cable or virtually from an emulator. This feature shipped in recent versions of Visual Studio 2022, improving this scenario for all Xamarin and .NET MAUI customers. Port WindowsAppSDK usage of vcmeta.dll to C# We found that every incremental build of a .NET MAUI project running on Windows spent time in Top 10 most expensive tasks CompileXaml = 3.972 s ... various tasks ... This is the XAML compiler for WindowsAppSDK, that compiles the WinUI3 flavor of XAML (not .NET MAUI XAML). There is very little XAML of this type in .NET MAUI projects, in fact, the only file is Platforms/Windows/App.xaml in the project template. Interestingly, if you installed the Desktop development with C++ workload in the Visual Studio installer, this time just completely went away! Top 10 most expensive tasks ... various tasks ... CompileXaml = 9 ms The WindowsAppSDK XAML compiler p/invokes into a native library from the C++ workload, vcmeta.dll, to calculate a hash for .NET assembly files. This is used to make incremental builds fast — if the hash changes, compile the XAML again. If vcmeta.dll was not found on disk, the XAML compiler was effectively “recompiling everything” on every incremental build. For an initial fix, we simply included a small part of the C++ workload as a dependency of .NET MAUI in Visual Studio. The slightly larger install size was a good tradeoff for saving upwards of 4 seconds in incremental build time. Next, we implemented vcmeta.dll‘s hashing functionality in plain C# with System.Reflection.Metadata to compute indentical hash values as before. Not only was this implementation better, in that we could drop a dependency on the C++ workload, but it was also faster! The time to compute a single hash Method Mean Error StdDev Native 217.31 us 1.704 us 1.594 us Managed 86.43 us 1.700 us 2.210 us Some of the reasons this was faster No p/invoke or COM-interfaces involved. System.Reflection.Metadata has a fast struct-based API, perfect for iterating over types in a .NET assembly and computing a hash value. The end result being that CompileXaml might actually be even faster than 9ms in incremental builds. This feature shipped in WindowsAppSDK 1.3, which is now used by .NET MAUI in .NET 8. See WindowsAppSDK#3128 for details about this improvement. Improvements to remote iOS builds on Windows Comparing inner loop performance for iOS, there was a considerable gap between doing “remote iOS” development on Windows versus doing everything locally on macOS. Many small improvements were made, based on comparing inner-loop .binlog files recorded on macOS versus one recorded inside Visual Studio on Windows. Some examples include maui#12747 don’t explicitly copy files to the build server xamarin-macios#16752 do not copy files to build server for a Delete operation xamarin-macios#16929 batch file deletion via DeleteFilesAsync xamarin-macios#17033 cache AOT compiler path Xamarin/MAUI Visual Studio extension when running dotnet-install.sh on remote build hosts, set the explicit processor flag for M1 Macs. 
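Returning to the vcmeta.dll port above: the general technique of hashing assembly metadata with System.Reflection.Metadata looks roughly like the following. This is a minimal sketch of the idea, not the WindowsAppSDK implementation, and the hashing scheme here is illustrative.

using System;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

static int HashAssemblyTypes(string assemblyPath)
{
    using FileStream stream = File.OpenRead(assemblyPath);
    using var peReader = new PEReader(stream);
    MetadataReader metadata = peReader.GetMetadataReader();

    // Struct-based enumeration over the type definitions; no System.Reflection assembly loading involved.
    var hash = new HashCode();
    foreach (TypeDefinitionHandle handle in metadata.TypeDefinitions)
    {
        TypeDefinition type = metadata.GetTypeDefinition(handle);
        hash.Add(metadata.GetString(type.Namespace), StringComparer.Ordinal);
        hash.Add(metadata.GetString(type.Name), StringComparer.Ordinal);
    }
    return hash.ToHashCode();
}

If the hash for an assembly changes between builds, the XAML needs recompiling; otherwise the task can skip the work, which is the incremental-build behavior described above.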
We also made some improvements for all iOS & MacCatalyst projects, such as xamarin-macios#16416 don’t process assemblies over and over again Improvements to Android inner-loop We also made many small improvements to the “inner-loop” on Android — most of which were focused in a specific area. Previously, Xamarin.Forms projects had the luxury of being organized into multiple projects, such as YourApp.Android.csproj Xamarin.Android application project YourApp.iOS.csproj Xamarin.iOS application project YourApp.csproj netstandard2.0 class library Where almost all of the logic for a Xamarin.Forms app was contained in the netstandard2.0 project. Nearly all the incremental builds would be changes to XAML or C# in the class library. This structure enabled the Xamarin.Android MSBuild targets to completely skip many Android-specific MSBuild steps. In .NET MAUI, the “single project” feature means that every incremental build has to run these Android-specific build steps. In focusing specifically improving this area, we made many small changes, such as java-interop#1061 avoid string.Format() java-interop#1064 improve ToJniNameFromAttributesForAndroid java-interop#1065 avoid File.Exists() checks java-interop#1069 fix more places to use TypeDefinitionCache java-interop#1072 use less System.Linq for custom attributes java-interop#1103 use MemoryMappedFile when using Mono.Cecil xamarin-android#7621 avoid File.Exists() checks xamarin-android#7626 perf improvements for LlvmIrGenerator xamarin-android#7652 fast path for <CheckClientHandlerType/> xamarin-android#7653 delay ToJniName when generating AndroidManifest.xml xamarin-android#7686 lazily populate Resource lookup These changes should improve incremental builds in all .NET 8 Android project types. XAML Compilation no longer uses LoadInSeparateAppDomain Looking at the JITStats report in PerfView (for MSBuild.exe) Name JitTime (ms) Microsoft.Maui.Controls.Build.Tasks.dll 214.0 Mono.Cecil 119.0 It appears that Microsoft.Maui.Controls.Build.Tasks.dll was spending a lot of time in the JIT. What was confusing, is this was an incremental build where everything should already be loaded. The JIT’s work should be done already? The cause appears to be usage of the [LoadInSeparateAppDomain] attribute defined by the <XamlCTask/> in .NET MAUI. This is an MSBuild feature that gives MSBuild tasks to run in an isolated AppDomain — with an obvious performance drawback. However, we couldn’t just remove it as there would be complications… [LoadInSeparateAppDomain] also conveniently resets all static state when <XamlCTask/> runs again. Meaning that future incremental builds would potentially use old (garbage) values. There are several places that cache Mono.Cecil objects for performance reasons. Really weird bugs would result if we didn’t address this. So, to actually make this change, we reworked all static state in the XAML compiler to be stored in instance fields & properties instead. This is a general software design improvement, in addition to giving us the ability to safely remove [LoadInSeparateAppDomain]. The results of this change, for an incremental build on a Windows PC Before XamlCTask = 743 ms XamlCTask = 706 ms XamlCTask = 692 ms After XamlCTask = 128 ms XamlCTask = 134 ms XamlCTask = 117 ms This saved about ~587ms on incremental builds on all platforms, an 82% improvement. This will help even more on large solutions with multiple .NET MAUI projects, where <XamlCTask/> runs multiple times. See maui#11982 for further details about this improvement. 
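The static-to-instance refactoring described above follows a general pattern for MSBuild tasks that is worth illustrating. This is a hypothetical task, not the XamlCTask source: once [LoadInSeparateAppDomain] is removed, any static cache survives across builds inside the long-lived MSBuild process, so per-build state has to live on the task instance.

using System;
using System.Collections.Generic;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

public class CompileMarkupTask : Task
{
    // Before: a static cache like this would outlive a single build once the
    // isolated AppDomain is gone, so later incremental builds could observe stale entries.
    // static readonly Dictionary<string, string> s_resolvedTypes = new();

    // After: instance state is created fresh for every task invocation.
    readonly Dictionary<string, string> _resolvedTypes = new(StringComparer.Ordinal);

    [Required]
    public ITaskItem[] InputFiles { get; set; } = Array.Empty<ITaskItem>();

    public override bool Execute()
    {
        foreach (ITaskItem item in InputFiles)
            _resolvedTypes[item.ItemSpec] = item.GetMetadata("FullPath");

        Log.LogMessage(MessageImportance.Low, "Cached {0} entries for this invocation only.", _resolvedTypes.Count);
        return !Log.HasLoggedErrors;
    }
}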
Performance or App Size Improvements Structs and IEquatable in .NET MAUI Using the Visual Studio’s .NET Object Allocation Tracking profiler on a customer .NET MAUI sample application, we saw Microsoft.Maui.WeakEventManager+Subscription Allocations 686,114 Bytes 21,955,648 This seemed like an exorbitant amount of memory to be used in a sample application’s startup! Drilling in to see where these struct‘s were being created System.Collections.Generic.ObjectEqualityComparer<Microsoft.Maui.WeakEventManager+Subscription>.IndexOf() The underlying problem was this struct didn’t implement IEquatable<T> and was being used as the key for a dictionary. The CA1815 code analysis rule was designed to catch this problem. This is not a rule that is enabled by default, so projects must opt into it. To solve this Subscription is internal to .NET MAUI, and its usage made it possible to be a readonly struct. This was just an extra improvement. We made CA1815 a build error across the entire dotnet/maui repository. We implemented IEquatable<T> for all struct types. After these changes, we could no longer found Microsoft.Maui.WeakEventManager+Subscription in memory snapshots at all. Which saved ~21 MB of allocations in this sample application. If your own projects have usage of struct, it seems quite worthwhile to make CA1815 a build error. A smaller, targeted version of this change was backported to MAUI in .NET 7. See maui#13232 for details about this improvement. Fix performance issue in {AppThemeBinding} Profiling a .NET MAUI sample application from a customer, we noticed a lot of time spent in {AppThemeBinding} and WeakEventManager while scrolling 2.08s (17%) microsoft.maui.controls!Microsoft.Maui.Controls.AppThemeBinding.Apply(object,Microsoft.Maui.Controls.BindableObject,Micr... 2.05s (16%) microsoft.maui.controls!Microsoft.Maui.Controls.AppThemeBinding.AttachEvents() 2.04s (16%) microsoft.maui!Microsoft.Maui.WeakEventManager.RemoveEventHandler(System.EventHandler`1<TEventArgs_REF>,string) The following was happening in this application The standard .NET MAUI project template has lots of {AppThemeBinding} in the default Styles.xaml. This supports Light vs Dark theming. {AppThemeBinding} subscribes to Application.RequestedThemeChanged So, every MAUI view subscribe to this event — potentially multiple times. Subscribers are a Dictionary<string, List<Subscriber>>, where there is a dictionary lookup followed by a O(N) search for unsubscribe operations. There is potentially a usecase here to come up with a generalized “weak event” pattern for .NET. The implementation currently in .NET MAUI came over from Xamarin.Forms, but a generalized pattern could be useful for .NET developers using other UI frameworks. To make this scenario fast, for now, in .NET 8 Before For any {AppThemeBinding}, it calls both RequestedThemeChanged -= OnRequestedThemeChanged O(N) time RequestedThemeChanged += OnRequestedThemeChanged constant time Where the -= is notably slower, due to possibly 100s of subscribers. After Create an _attached boolean, so we know know the “state” if it is attached or not. New bindings only call +=, where -= will now only be called by {AppThemeBinding} in rare cases. Most .NET MAUI apps do not “unapply” bindings, but -= would only be used in that case. See the full details about this fix in maui#14625. See dotnet/runtime#61517 for how we could implement “weak events” in .NET in the future. 
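Returning to the CA1815 guidance above, the fix boils down to a pattern you can apply to any struct used as a dictionary or hash-set key. A hedged sketch with a hypothetical key type (not MAUI's internal Subscription type):

using System;

readonly struct EventSubscription : IEquatable<EventSubscription>
{
    public EventSubscription(Type subscriberType, string eventName)
        => (SubscriberType, EventName) = (subscriberType, eventName);

    public Type SubscriberType { get; }
    public string EventName { get; }

    // Implementing IEquatable<T> lets EqualityComparer<T>.Default compare values directly,
    // without boxing or falling back to ValueType.Equals.
    public bool Equals(EventSubscription other)
        => SubscriberType == other.SubscriberType
        && string.Equals(EventName, other.EventName, StringComparison.Ordinal);

    public override bool Equals(object? obj) => obj is EventSubscription other && Equals(other);

    public override int GetHashCode() => HashCode.Combine(SubscriberType, EventName);
}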
Address CA1307 and CA1309 for performance Profiling a .NET MAUI sample application from a customer, we noticed time spent during “culture-aware” string operations 77.22ms microsoft.maui!Microsoft.Maui.Graphics.MauiDrawable.SetDefaultBackgroundColor() 42.55ms System.Private.CoreLib!System.String.ToLower() This case, we can improve by simply calling ToLowerInvariant() instead. In some cases you might even consider using string.Equals() with StringComparer.Ordinal. In this case, our code was further reviewed and optimized in Reduce Java interop in MauiDrawable on Android. In .NET 7, we added CA1307 and CA1309 code analysis rules to catch cases like this, but it appears we missed some in Microsoft.Maui.Graphics.dll. These are likely useful rules to enable in your own .NET MAUI applications, as avoiding all culture-aware string operations can be quite impactful on mobile. See maui#14627 for details about this improvement. Address CA1311 for performance After addressing the CA1307 and CA1309 code analysis rules, we took things further and addressed CA1311. As mentioned in the turkish example, doing something like string text = something.ToUpper(); switch (text) { ... } Can actually cause unexpected behavior in Turkish locales, because in Turkish, the character I (Unicode 0049) is considered the upper case version of a different character ý (Unicode 0131), and i (Unicode 0069) is considered the lower case version of yet another character Ý (Unicode 0130). ToLowerInvariant() and ToUpperInvariant() are also better for performance as an invariant ToLower / ToUpper operation is slightly faster. Doing this also avoids loading the current culture, improving startup performance. There are cases where you would want the current culture, such as in a CaseConverter type in .NET MAUI. To do this, you simply have to be explicit in which culture you want to use return ConvertToUpper ? v.ToUpper(CultureInfo.CurrentCulture) v.ToLower(CultureInfo.CurrentCulture); The goal of this CaseConverter is to display upper or lowercase text to a user. So it makes sense to use the CurrentCulture for this. See maui#14773 for details about this improvement. Remove unused ViewAttachedToWindow event on Android Every Label in .NET MAUI was subscribing to public class MauiTextView AppCompatTextView { public MauiTextView(Context context) base(context) { this.ViewAttachedToWindow += MauiTextView_ViewAttachedToWindow; } private void MauiTextView_ViewAttachedToWindow(object? sender, ViewAttachedToWindowEventArgs e) { } //... This was leftover from refactoring, but appeared in dotnet-trace output as 278.55ms (2.4%) mono.android!Android.Views.View.add_ViewAttachedToWindow(System.EventHandler`1<Android.Views.View/ViewAttachedToWindowEv 30.55ms (0.26%) mono.android!Android.Views.View.IOnAttachStateChangeListenerInvoker.n_OnViewAttachedToWindow_Landroid_view_View__mm_wra Where the first is the subscription, and the second is the event firing from Java to C# — only to run an empty managed method. Simply removing this event subscription and empty method, resulted in only a few controls to subscribe to this event as needed 2.76ms (0.02%) mono.android!Android.Views.View.add_ViewAttachedToWindow(System.EventHandler`1<Android.Views.View/ViewAttachedToWindowEv See maui#14833 for details about this improvement. 
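The string guidance from the CA1307, CA1309, and CA1311 sections above reduces to a handful of substitutions; a brief, hedged illustration with made-up method names:

using System;
using System.Collections.Generic;

static class StringPerfExamples
{
    static bool IsColorResource(string resourceType)
    {
        // Culture-aware: resourceType.ToLower() == "color" loads the current culture
        // and can behave differently per locale (the Turkish-I problem described above).

        // Ordinal alternative: locale-independent and cheaper.
        return string.Equals(resourceType, "color", StringComparison.OrdinalIgnoreCase);
    }

    static int CountDistinctKeys(IEnumerable<string> keys)
    {
        // For string-keyed collections on hot paths, request ordinal comparisons explicitly.
        var seen = new HashSet<string>(StringComparer.Ordinal);
        foreach (string key in keys)
            seen.Add(key);
        return seen.Count;
    }
}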
Remove unneeded System.Reflection for {Binding} All bindings in .NET MAUI commonly hit the code path if (property.CanWrite && property.SetMethod.IsPublic && !property.SetMethod.IsStatic) { part.LastSetter = property.SetMethod; var lastSetterParameters = part.LastSetter.GetParameters(); part.SetterType = lastSetterParameters[lastSetterParameters.Length - 1].ParameterType; //... Where ~53% of the time spent applying a binding appeared in dotnet-trace in the MethodInfo.GetParameters() method core.benchmarks!Microsoft.Maui.Benchmarks.BindingBenchmarker.BindName() ... microsoft.maui.controls!Microsoft.Maui.Controls.BindingExpression.SetupPart() System.Private.CoreLib.il!System.Reflection.RuntimeMethodInfo.GetParameters() The above C# is simply finding the property type. It is using a roundabout way of using the property setter’s first parameter, which can be simplified to part.SetterType = property.PropertyType; We could see the results of this change in a BenchmarkDotNet benchmark Method Mean Error StdDev Gen0 Gen1 Allocated –BindName 18.82 us 0.336 us 0.471 us 1.2817 1.2512 10.55 KB ++BindName 18.80 us 0.371 us 0.555 us 1.2512 1.2207 10.23 KB –BindChild 27.47 us 0.542 us 0.827 us 2.0142 1.9836 16.56 KB ++BindChild 26.71 us 0.516 us 0.652 us 1.9226 1.8921 15.94 KB –BindChildIndexer 58.39 us 1.113 us 1.143 us 3.1738 3.1128 26.17 KB ++BindChildIndexer 58.00 us 1.055 us 1.295 us 3.1128 3.0518 25.47 KB Where ++ denotes the new changes. See maui#14830 for further details about this improvement. Use StringComparer.Ordinal for Dictionary and HashSet Profiling a .NET MAUI sample application from a customer, we noticed 4% of the time while scrolling was spent doing dictionary lookups (4.0%) System.Private.CoreLib!System.Collections.Generic.Dictionary<TKey_REF,TValue_REF>.FindValue(TKey_REF) Observing the call stack, some of these were coming from culture-aware string lookups in .NET MAUI microsoft.maui!Microsoft.Maui.PropertyMapper.GetProperty(string) microsoft.maui!Microsoft.Maui.WeakEventManager.AddEventHandler(System.EventHandler<TEventArgs_REF>,string) microsoft.maui!Microsoft.Maui.CommandMapper.GetCommand(string) Which show up in dotnet-trace as a mixture of string comparers (0.98%) System.Private.CoreLib!System.Collections.Generic.NonRandomizedStringEqualityComparer.OrdinalComparer.GetHashCode(string) (0.71%) System.Private.CoreLib!System.String.GetNonRandomizedHashCode() (0.31%) System.Private.CoreLib!System.Collections.Generic.NonRandomizedStringEqualityComparer.OrdinalComparer.Equals(string,stri (0.01%) System.Private.CoreLib!System.Collections.Generic.NonRandomizedStringEqualityComparer.GetStringComparer(object) In cases of Dictionary<string, TValue> or HashSet<string>, we can use StringComparer.Ordinal in many cases to get faster dictionary lookups. This should slightly improve the performance of handlers & all .NET MAUI controls on all platforms. See maui#14900 for details about this improvement. Reduce Java interop in MauiDrawable on Android Profiling a .NET MAUI customer sample while scrolling on a Pixel 5, we saw some interesting time being spent in (0.76%) microsoft.maui!Microsoft.Maui.Graphics.MauiDrawable.OnDraw(Android.Graphics.Drawables.Shapes.Shape,Android.Graphics.Canv (0.54%) microsoft.maui!Microsoft.Maui.Graphics.MauiDrawable.SetDefaultBackgroundColor() This sample has a <Border/> inside a <CollectionView/> and so you can see this work happening while scrolling. 
Specifically, we reviewed code in .NET MAUI, such as _borderPaint.StrokeWidth = _strokeThickness; _borderPaint.StrokeJoin = _strokeLineJoin; _borderPaint.StrokeCap = _strokeLineCap; _borderPaint.StrokeMiter = _strokeMiterLimit * 2; if (_borderPathEffect != null) _borderPaint.SetPathEffect(_borderPathEffect); This calls from C# to Java five times. Creating a new method in PlatformInterop.java allowed us to reduce it to a single time. We also improved the following method, which would perform many calls from C# to Java // C# void SetDefaultBackgroundColor() { using (var background = new TypedValue()) { if (_context == null || _context.Theme == null || _context.Resources == null) return; if (_context.Theme.ResolveAttribute(globalAndroid.Resource.Attribute.WindowBackground, background, true)) { var resource = _context.Resources.GetResourceTypeName(background.ResourceId); var type = resource?.ToLowerInvariant(); if (type == "color") { var color = new Android.Graphics.Color(ContextCompat.GetColor(_context, background.ResourceId)); _backgroundColor = color; } } } } To be more succinctly implemented in Java as // Java /** * Gets the value of android.R.attr.windowBackground from the given Context * @param context * @return the color or -1 if not found */ public static int getWindowBackgroundColor(Context context) { TypedValue value = new TypedValue(); if (!context.getTheme().resolveAttribute(android.R.attr.windowBackground, value, true) && isColorType(value)) { return value.data; } else { return -1; } } /** * Needed because TypedValue.isColorType() is only API Q+ * https//github.com/aosp-mirror/platform_frameworks_base/blob/1d896eeeb8744a1498128d62c09a3aa0a2a29a16/core/java/android/util/TypedValue.java#L266-L268 * @param value * @return true if the TypedValue is a Color */ private static boolean isColorType(TypedValue value) { if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.Q) { return value.isColorType(); } else { // Implementation from AOSP return (value.type >= TypedValue.TYPE_FIRST_COLOR_INT && value.type <= TypedValue.TYPE_LAST_COLOR_INT); } } Which reduces our new implementation on the C# side to be a single Java call and creation of an Android.Graphics.Color struct void SetDefaultBackgroundColor() { var color = PlatformInterop.GetWindowBackgroundColor(_context); if (color != -1) { _backgroundColor = new Android.Graphics.Color(color); } } After these changes, we instead saw dotnet-trace output, such as (0.28%) microsoft.maui!Microsoft.Maui.Graphics.MauiDrawable.OnDraw(Android.Graphics.Drawables.Shapes.Shape,Android.Graphics.Canv (0.04%) microsoft.maui!Microsoft.Maui.Graphics.MauiDrawable.SetDefaultBackgroundColor() This improves the performance of any <Border/> (and other shapes) on Android, and drops about ~1% of the CPU usage while scrolling in this example. See maui#14933 for further details about this improvement. 
Improve layout performance of Label on Android Testing various .NET MAUI sample applications on Android, we noticed around 5.1% of time spent in PrepareForTextViewArrange() 1.01s (5.1%) microsoft.maui!Microsoft.Maui.ViewHandlerExtensions.PrepareForTextViewArrange(Microsoft.Maui.IViewHandler,Microsoft.Maui 635.99ms (3.2%) mono.android!Android.Views.View.get_Context() Most of the time is spent just calling Android.Views.View.Context to be able to then call into the extension method internal static int MakeMeasureSpecExact(this Context context, double size) { // Convert to a native size to create the spec for measuring var deviceSize = (int)context!.ToPixels(size); return MeasureSpecMode.Exactly.MakeMeasureSpec(deviceSize); } Calling the Context property can be expensive due the interop from C# to Java. Java returns a handle to the instance, then we have to look up any existing, managed C# objects for the Context. If all this work can simply be avoided, it can improve performance dramatically. In .NET 7, we made overloads to ToPixels() that allows you to get the same value with an Android.Views.View So we can instead do internal static int MakeMeasureSpecExact(this PlatformView view, double size) { // Convert to a native size to create the spec for measuring var deviceSize = (int)view.ToPixels(size); return MeasureSpecMode.Exactly.MakeMeasureSpec(deviceSize); } Not only did this change show improvements in dotnet-trace output, but we saw a noticeable difference in our LOLs per second test application from last year See maui#14980 for details about this improvement. Reduce Java interop calls for controls in .NET MAUI Reviewing the beautiful .NET MAUI “Surfing App” sample by @jsuarezruiz We noticed that a lot of time is spent doing Java interop while scrolling 1.76s (35%) Microsoft.Maui!Microsoft.Maui.Platform.WrapperView.DispatchDraw(Android.Graphics.Canvas) 1.76s (35%) Microsoft.Maui!Microsoft.Maui.Platform.ContentViewGroup.DispatchDraw(Android.Graphics.Canvas) These methods were deeply nested doing interop from Java -> C# -> Java many levels deep. In this case, moving some code from C# to Java could make it where less interop would occur; and in some cases no interop at all! So for example, previously DispatchDraw() was overridden in C# to implement clipping behavior // C# // ContentViewGroup is used internally by many .NET MAUI Controls class ContentViewGroup Android.Views.ViewGroup { protected override void DispatchDraw(Canvas? canvas) { if (Clip != null) ClipChild(canvas); base.DispatchDraw(canvas); } } By creating a PlatformContentViewGroup.java, we can do something like // Java /** * Set by C#, determining if we need to call getClipPath() * @param hasClip */ protected final void setHasClip(boolean hasClip) { this.hasClip = hasClip; postInvalidate(); } @Override protected void dispatchDraw(Canvas canvas) { // Only call into C# if there is a Clip if (hasClip) { Path path = getClipPath(canvas.getWidth(), canvas.getHeight()); if (path != null) { canvas.clipPath(path); } } super.dispatchDraw(canvas); } setHasClip() is called when clipping is enabled/disabled on any .NET MAUI control. This allowed the common path to not interop into C# at all, and only views that have opted into clipping would need to. This is very good because dispatchDraw() is called quite often during Android layout, scrolling, etc. 
This same treatment was also done to a few other internal .NET MAUI types like WrapperView improving the common case, making interop only occur when views have opted into clipping or drop shadows. For testing the impact of these changes, we used Google’s FrameMetricsAggregator that can be setup in any .NET MAUI application’s Platforms/Android/MainActivity.cs // How often in ms you'd like to print the statistics to the console const int Duration = 1000; FrameMetricsAggregator aggregator; Handler handler; protected override void OnCreate(Bundle savedInstanceState) { base.OnCreate(savedInstanceState); handler = new Handler(Looper.MainLooper); // We were interested in the "Total" time, other metrics also available aggregator = new FrameMetricsAggregator(FrameMetricsAggregator.TotalDuration); aggregator.Add(this); handler.PostDelayed(OnFrame, Duration); } void OnFrame() { // We were interested in the "Total" time, other metrics also available var metrics = aggregator.GetMetrics()[FrameMetricsAggregator.TotalIndex]; int size = metrics.Size(); double sum = 0, count = 0, slow = 0; for (int i = 0; i < size; i++) { int value = metrics.Get(i); if (value != 0) { count += value; sum += i * value; if (i > 16) slow += value; Console.WriteLine($"Frame(s) that took ~{i}ms, count {value}"); } } if (sum > 0) { Console.WriteLine($"Average frame time {sum / count0.00}ms"); Console.WriteLine($"No. of slow frames {slow}"); Console.WriteLine("-----"); } handler.PostDelayed(OnFrame, Duration); } FrameMetricsAggregator‘s API is admittedly a bit odd, but the data we get out is quite useful. The result is basically a lookup table where the key is a duration in milliseconds, and the value is the number of “frames” that took that duration. The idea is any frame that takes longer than 16ms is considered “slow” or “janky” as the Android docs sometimes refer. An example of the .NET MAUI “Surfing App” running on a Pixel 5 Before Frame(s) that took ~4ms, count 1 Frame(s) that took ~5ms, count 6 Frame(s) that took ~6ms, count 10 Frame(s) that took ~7ms, count 12 Frame(s) that took ~8ms, count 10 Frame(s) that took ~9ms, count 6 Frame(s) that took ~10ms, count 1 Frame(s) that took ~11ms, count 2 Frame(s) that took ~12ms, count 4 Frame(s) that took ~13ms, count 2 Frame(s) that took ~15ms, count 1 Frame(s) that took ~16ms, count 1 Frame(s) that took ~18ms, count 2 Frame(s) that took ~19ms, count 1 Frame(s) that took ~20ms, count 5 Frame(s) that took ~21ms, count 2 Frame(s) that took ~22ms, count 1 Frame(s) that took ~25ms, count 1 Frame(s) that took ~32ms, count 1 Frame(s) that took ~34ms, count 1 Frame(s) that took ~60ms, count 1 Frame(s) that took ~62ms, count 1 Frame(s) that took ~63ms, count 1 Frame(s) that took ~64ms, count 2 Frame(s) that took ~66ms, count 1 Frame(s) that took ~67ms, count 1 Frame(s) that took ~68ms, count 1 Frame(s) that took ~69ms, count 2 Frame(s) that took ~70ms, count 2 Frame(s) that took ~71ms, count 2 Frame(s) that took ~72ms, count 1 Frame(s) that took ~73ms, count 2 Frame(s) that took ~74ms, count 2 Frame(s) that took ~75ms, count 1 Frame(s) that took ~76ms, count 1 Frame(s) that took ~77ms, count 2 Frame(s) that took ~78ms, count 3 Frame(s) that took ~79ms, count 1 Frame(s) that took ~80ms, count 1 Frame(s) that took ~81ms, count 1 Average frame time 28.67ms No. of slow frames 43 After the changes to ContentViewGroup and WrapperView were in place, we got a very nice improvement! 
Even in an app making heavy usage of clipping and shadows After Frame(s) that took ~5ms, count 3 Frame(s) that took ~6ms, count 5 Frame(s) that took ~7ms, count 7 Frame(s) that took ~8ms, count 7 Frame(s) that took ~9ms, count 4 Frame(s) that took ~10ms, count 2 Frame(s) that took ~11ms, count 6 Frame(s) that took ~12ms, count 2 Frame(s) that took ~13ms, count 3 Frame(s) that took ~14ms, count 4 Frame(s) that took ~15ms, count 1 Frame(s) that took ~16ms, count 1 Frame(s) that took ~17ms, count 1 Frame(s) that took ~18ms, count 2 Frame(s) that took ~19ms, count 1 Frame(s) that took ~20ms, count 3 Frame(s) that took ~21ms, count 2 Frame(s) that took ~22ms, count 2 Frame(s) that took ~27ms, count 2 Frame(s) that took ~29ms, count 2 Frame(s) that took ~32ms, count 1 Frame(s) that took ~34ms, count 1 Frame(s) that took ~35ms, count 1 Frame(s) that took ~64ms, count 1 Frame(s) that took ~67ms, count 1 Frame(s) that took ~68ms, count 2 Frame(s) that took ~69ms, count 1 Frame(s) that took ~72ms, count 3 Frame(s) that took ~74ms, count 3 Average frame time 21.99ms No. of slow frames 29 See maui#14275 for further detail about these changes. Improve performance of Entry.MaxLength on Android Investigating a .NET MAUI customer sample Navigating from a Shell flyout. To a new page with several Entry controls. There was a noticeable performance delay. When profiling on a Pixel 5, one “hot path” was Entry.MaxLength 18.52ms (0.22%) microsoft.maui!Microsoft.Maui.Platform.EditTextExtensions.UpdateMaxLength(Android.Widget.EditText,Microsoft.Maui.IEntry) 16.03ms (0.19%) microsoft.maui!Microsoft.Maui.Platform.EditTextExtensions.UpdateMaxLength(Android.Widget.EditText,int) 12.16ms (0.14%) microsoft.maui!Microsoft.Maui.Platform.EditTextExtensions.SetLengthFilter(Android.Widget.EditText,int) EditTextExtensions.UpdateMaxLength() calls EditText.Text getter and setter EditTextExtensions.SetLengthFilter() calls EditText.Get/SetFilters() What happens is we end up marshaling strings and IInputFilter[] back and forth between C# and Java for every Entry control. All Entry controls go through this code path (even ones with a default value for MaxLength), so it made sense to move some of this code from C# to Java instead. Our C# code before // C# public static void UpdateMaxLength(this EditText editText, int maxLength) { editText.SetLengthFilter(maxLength); var newText = editText.Text.TrimToMaxLength(maxLength); if (editText.Text != newText) editText.Text = newText; } public static void SetLengthFilter(this EditText editText, int maxLength) { if (maxLength == -1) maxLength = int.MaxValue; var currentFilters = new List<IInputFilter>(editText.GetFilters() ?? new IInputFilter[0]); var changed = false; for (var i = 0; i < currentFilters.Count; i++) { if (currentFilters[i] is InputFilterLengthFilter) { currentFilters.RemoveAt(i); changed = true; break; } } if (maxLength >= 0) { currentFilters.Add(new InputFilterLengthFilter(maxLength)); changed = true; } if (changed) editText.SetFilters(currentFilters.ToArray()); } Moved to Java (with identical behavior) instead // Java /** * Sets the maxLength of an EditText * @param editText * @param maxLength */ public static void updateMaxLength(@NonNull EditText editText, int maxLength) { setLengthFilter(editText, maxLength); if (maxLength < 0) return; Editable currentText = editText.getText(); if (currentText.length() > maxLength) { editText.setText(currentText.subSequence(0, maxLength)); } } /** * Updates the InputFilter[] of an EditText. Used for Entry and SearchBar. 
* @param editText * @param maxLength */ public static void setLengthFilter(@NonNull EditText editText, int maxLength) { if (maxLength == -1) maxLength = Integer.MAX_VALUE; List<InputFilter> currentFilters = new ArrayList<>(Arrays.asList(editText.getFilters())); boolean changed = false; for (int i = 0; i < currentFilters.size(); i++) { InputFilter filter = currentFilters.get(i); if (filter instanceof InputFilter.LengthFilter) { currentFilters.remove(i); changed = true; break; } } if (maxLength >= 0) { currentFilters.add(new InputFilter.LengthFilter(maxLength)); changed = true; } if (changed) { InputFilter[] newFilter = new InputFilter[currentFilters.size()]; editText.setFilters(currentFilters.toArray(newFilter)); } } This avoids marshaling (copying!) string and array values back and forth from C# to Java. With these changes in place, the calls to EditTextExtensions.UpdateMaxLength() are now so fast they are missing completely from dotnet-trace output, saving ~19ms when navigating to the page in the customer sample. See maui#15614 for details about this improvement. Improve memory usage of CollectionView on Windows We reviewed a .NET MAUI customer sample with a CollectionView of 150,000 data-bound rows. Debugging what happens at runtime, .NET MAUI was effectively doing _itemTemplateContexts = new List<ItemTemplateContext>(capacity 150_000); for (int n = 0; n < 150_000; n++) { _itemTemplateContexts.Add(null); } And then each item is created as it is scrolled into view if (_itemTemplateContexts[index] == null) { _itemTemplateContexts[index] = context = new ItemTemplateContext(...); } return _itemTemplateContexts[index]; This wasn’t the best approach, but to improve things use a Dictionary<int, T> instead, just let it size dynamically. use TryGetValue(..., out var context), so each call accesses the indexer one less time than before. use either the bound collection’s size or 64 (whichever is smaller) as a rough estimate of how many might fit on screen at a time Our code changes to if (!_itemTemplateContexts.TryGetValue(index, out var context)) { _itemTemplateContexts[index] = context = new ItemTemplateContext(...); } return context; With these changes in place, a memory snapshot of the app after startup Before Heap Size 82,899.54 KB After Heap Size 81,768.76 KB Which is saving about 1MB of memory on launch. In this case, it feels better to just let the Dictionary size itself with an estimate of what capacity will be. See maui#16838 for details about this improvement. Use UnmanagedCallersOnlyAttribute on Apple platforms When unmanaged code calls into managed code, such as invoking a callback from Objective-C, the [MonoPInvokeCallbackAttribute] was previously used in Xamarin.iOS, Xamarin.Mac, and .NET 6+ for this purpose. The [UnmanagedCallersOnlyAttribute] attribute came along as a modern replacement for this Mono feature, which is implemented in a way with performance in mind. Unfortunately, there are a few restrictions when using this new attribute Method must be marked static. Must not be called from managed code. Must only have blittable arguments. Must not have generic type parameters or be contained within a generic class. Not only did we have to refactor the “code generator” that produces many of the bindings for Apple APIs for AppKit, UIKit, etc., but we also had many manual bindings that would need the same treatment. The end result is that most callbacks from Objective-C to C# should be faster in .NET 8 than before. 
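The restrictions listed above shape what an [UnmanagedCallersOnly] callback can look like. A minimal, hedged sketch of the general pattern (not code from the iOS bindings or the code generator):

using System;
using System.Runtime.InteropServices;

static unsafe class NativeCallbacks
{
    // Must be static, use only blittable parameter types, avoid generics,
    // and never be invoked directly from managed code.
    [UnmanagedCallersOnly]
    static int OnNativeEvent(int eventCode)
    {
        // Handle the callback; keep exceptions from crossing the native boundary.
        return eventCode == 0 ? 1 : 0;
    }

    // Native code receives a raw function pointer, with no delegate or Reflection.Emit wrapper involved.
    public static IntPtr EventCallbackPointer
        => (IntPtr)(delegate* unmanaged<int, int>)&OnNativeEvent;
}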
See xamarin-macios#10470 and xamarin-macios#15783 for details about these improvements.

Faster Java interop for strings on Android

When binding members which have parameter types or return types of java.lang.CharSequence, the member is "overloaded" to replace CharSequence with System.String, and the "original" member has a Formatted suffix. For example, consider android.widget.TextView, which has getText() and setText() methods whose parameter and return types are java.lang.CharSequence:

// Java
class TextView extends View {
    public CharSequence getText();
    public final void setText(CharSequence text);
}

When bound, this results in two properties:

// C#
class TextView : View {
    public Java.Lang.ICharSequence? TextFormatted { get; set; }
    public string? Text { get; set; }
}

The "non-Formatted overload" works by creating a temporary String object to invoke the Formatted overload, so the actual implementation looks like:

partial class TextView {
    public string? Text {
        get => TextFormatted?.ToString ();
        set {
            var jls = value == null ? null : new Java.Lang.String (value);
            TextFormatted = jls;
            jls?.Dispose ();
        }
    }
}

TextView.Text is much easier to understand and simpler to consume for .NET developers than TextView.TextFormatted. A problem with this approach is performance: creating a new Java.Lang.String instance requires

1. Creating the managed peer (the Java.Lang.String instance),
2. Creating the native peer (the java.lang.String instance),
3. And registering the mapping between (1) and (2),

only to immediately use and dispose the value. This is particularly noticeable with .NET MAUI apps. Consider a customer sample which uses XAML to set data-bound Text values in a CollectionView, which eventually hits TextView.Text. Profiling shows:

653.69ms (6.3%) mono.android!Android.Widget.TextView.set_Text(string)
198.05ms (1.9%) mono.android!Java.Lang.String..ctor(string)
121.57ms (1.2%) mono.android!Java.Lang.Object.Dispose()

6.3% of scrolling time is spent in the TextView.Text property setter!

We can partially optimize this case: if the *Formatted member is (1) a property, and (2) not virtual, then we can directly call the Java setter method. This avoids the need to create a managed peer and to register a mapping between the peers:

partial class TextView {
    public string? Text {
        get => TextFormatted?.ToString (); // unchanged
        set {
            const string __id = "setText.(Ljava/lang/CharSequence;)V";
            JniObjectReference native_value = JniEnvironment.Strings.NewString (value);
            try {
                JniArgumentValue* __args = stackalloc JniArgumentValue [1];
                __args [0] = new JniArgumentValue (native_value);
                _members.InstanceMethods.InvokeNonvirtualVoidMethod (__id, this, __args);
            } finally {
                JniObjectReference.Dispose (ref native_value);
            }
        }
    }
}

With the result being:

Method               Mean      Error      StdDev     Allocated
Before SetFinalText  6.632 us  0.0101 us  0.0079 us  112 B
After SetFinalText   1.361 us  0.0022 us  0.0019 us  –

The TextView.Text property setter invocation time is reduced to 20% of the previous average invocation time. Note that the virtual case is problematic for other reasons, but luckily TextView.setText() is non-virtual and likely one of the more commonly used Android APIs.

See java-interop#1101 for details about this improvement.
Faster Java interop for C# events on Android Profiling a .NET MAUI customer sample while scrolling on a Pixel 5, We saw ~2.2% of the time spent in the IOnFocusChangeListenerImplementor constructor, due to a subscription to the View.FocusChange event (2.2%) mono.android!Android.Views.View.IOnFocusChangeListenerImplementor..ctor() MAUI subscribes to Android.Views.View.FocusChange for every view placed on the screen, which happens while scrolling in this sample. Reviewing the generated code for the IOnFocusChangeListenerImplementor constructor, we see it still uses outdated JNIEnv APIs public IOnFocusChangeListenerImplementor () base ( Android.Runtime.JNIEnv.StartCreateInstance ("mono/android/view/View_OnFocusChangeListenerImplementor", "()V"), JniHandleOwnership.TransferLocalRef ) { Android.Runtime.JNIEnv.FinishCreateInstance (((Java.Lang.Object) this).Handle, "()V"); } Which we can change to use the newer/faster Java.Interop APIs public unsafe IOnFocusChangeListenerImplementor () base (IntPtr.Zero, JniHandleOwnership.DoNotTransfer) { const string __id = "()V"; if (((Java.Lang.Object) this).Handle != IntPtr.Zero) return; var h = JniPeerMembers.InstanceMethods.StartCreateInstance (__id, ((object) this).GetType (), null); SetHandle (h.Handle, JniHandleOwnership.TransferLocalRef); JniPeerMembers.InstanceMethods.FinishCreateInstance (__id, this, null); } These are better because the equivalent call to JNIEnv.FindClass() is cached, among other things. This was just one of the cases that was accidentally missed when we implemented the new Java.Interop APIs in the Xamarin timeframe. We simply needed to update our code generator to emit a better C# binding for this case. After these changes, we saw instead results in dotnet-trace (0.81%) mono.android!Android.Views.View.IOnFocusChangeListenerImplementor..ctor() This should improve the performance of all C# events that wrap Java listeners, a design-pattern commonly used in Java and Android applications. This includes the FocusedChanged event used by all .NET MAUI views on Android. See java-interop#1105 for details about this improvement. Use Function Pointers for JNI There is various machinery and generated code that makes Java interop possible from C#. Take, for example, the following instance method foo() in Java // Java object foo(object bar) { // returns some value } A C# method named CallObjectMethod is responsible for calling Java’s Native Interface (JNI) that calls into the JVM to actually invoke the Java method public static unsafe JniObjectReference CallObjectMethod (JniObjectReference instance, JniMethodInfo method, JniArgumentValue* args) { //... 
IntPtr thrown; var tmp = NativeMethods.java_interop_jnienv_call_object_method_a (JniEnvironment.EnvironmentPointer, out thrown, instance.Handle, method.ID, (IntPtr) args); Exception __e = JniEnvironment.GetExceptionForLastThrowable (thrown); if (__e != null) ExceptionDispatchInfo.Capture (__e).Throw (); JniEnvironment.LogCreateLocalRef (tmp); return new JniObjectReference (tmp, JniObjectReferenceType.Local); } In Xamarin.Android, .NET 6, and .NET 7 all calls into Java went through a java_interop_jnienv_call_object_method_a p/invoke, which signature looks like [DllImport (JavaInteropLib, CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)] internal static extern unsafe jobject java_interop_jnienv_call_object_method_a (IntPtr jnienv, out IntPtr thrown, jobject instance, IntPtr method, IntPtr args); Which is implemented in C as JI_API jobject java_interop_jnienv_call_object_method_a (JNIEnv *env, jthrowable *_thrown, jobject instance, jmethodID method, jvalue* args) { *_thrown = 0; jobject _r_ = (*env)->CallObjectMethodA (env, instance, method, args); *_thrown = (*env)->ExceptionOccurred (env); return _r_; } C# 9 introduced function pointers that allowed us a way to simplify things slightly — and make them faster as a result. So instead of using p/invoke in .NET 8, we could instead call a new unsafe method named CallObjectMethodA // Before var tmp = NativeMethods.java_interop_jnienv_call_object_method_a (JniEnvironment.EnvironmentPointer, out thrown, instance.Handle, method.ID, (IntPtr) args); // After var tmp = JniNativeMethods.CallObjectMethodA (JniEnvironment.EnvironmentPointer, instance.Handle, method.ID, (IntPtr) args); Which calls a C# function pointer directly [System.Runtime.CompilerServices.MethodImpl (System.Runtime.CompilerServices.MethodImplOptions.AggressiveInlining)] internal static unsafe jobject CallObjectMethodA (IntPtr env, jobject instance, IntPtr method, IntPtr args) { return (*((JNIEnv**)env))->CallObjectMethodA (env, instance, method, args); } This function pointer declared using the new syntax introduced in C# 9 public delegate* unmanaged <IntPtr, jobject, IntPtr, IntPtr, jobject> CallObjectMethodA; Comparing the two implementations with a manual benchmark # JIPinvokeTiming timing 000001.6993644 # Average Invocation 0.00016993643999999998ms # JIFunctionPointersTiming timing 000001.6561349 # Average Invocation 0.00016561349ms With a Release build, the average invocation time for JIFunctionPointersTiming takes 97% of the time as JIPinvokeTiming, i.e. is 3% faster. Additionally, using C# 9 function pointers means we can get rid of all of the java_interop_jnienv_*() C functions, which shrinks libmonodroid.so by ~55KB for each architecture. See xamarin-android#8234 and java-interop#938 for details about this improvement. Removed Xamarin.AndroidX.Legacy.Support.V4 Reviewing .NET MAUI’s Android dependencies, we noticed a suspicious package Xamarin.AndroidX.Legacy.Support.V4 If you are familiar with the Android Support Libraries, these are a set of packages Google provides to “polyfill” APIs to past versions of Android. This gives them a way to bring new APIs to old OS versions, since the Android ecosystem (OEMs, etc.) are much slower to upgrade as compared to iOS, for example. This particular package, Legacy.Support.V4, is actually support for Android as far back as Android API 4! The minimum supported Android version in .NET is Android API 21, which was released in 2017. 
It turns out this dependency was brought over from Xamarin.Forms and was not actually needed. As expected from this change, lots of Java code was removed from .NET MAUI apps. So much, in fact, that .NET 8 MAUI applications are now under the multi-dex limit — all Dalvik bytecode can fix into a single classes.dex file. A detailed breakdown of the size changes using apkdiff > apkdiff -f com.companyname.maui_before-Signed.apk com.companyname.maui_after-Signed.apk Size difference in bytes ([*1] apk1 only, [*2] apk2 only) + 1,598,040 classes.dex - 6 META-INF/androidx.asynclayoutinflater_asynclayoutinflater.version *1 - 6 META-INF/androidx.legacy_legacy-support-core-ui.version *1 - 6 META-INF/androidx.legacy_legacy-support-v4.version *1 - 6 META-INF/androidx.media_media.version *1 - 455 assemblies/assemblies.blob - 564 res/layout/notification_media_action.xml *1 - 744 res/layout/notification_media_cancel_action.xml *1 - 1,292 res/layout/notification_template_media.xml *1 - 1,584 META-INF/BNDLTOOL.SF - 1,584 META-INF/MANIFEST.MF - 1,696 res/layout/notification_template_big_media.xml *1 - 1,824 res/layout/notification_template_big_media_narrow.xml *1 - 2,456 resources.arsc - 2,756 res/layout/notification_template_media_custom.xml *1 - 2,872 res/layout/notification_template_lines_media.xml *1 - 3,044 res/layout/notification_template_big_media_custom.xml *1 - 3,216 res/layout/notification_template_big_media_narrow_custom.xml *1 - 2,030,636 classes2.dex Summary - 24,111 Other entries -0.35% (of 6,880,759) - 432,596 Dalvik executables -3.46% (of 12,515,440) + 0 Shared libraries 0.00% (of 12,235,904) - 169,179 Package size difference -1.12% (of 15,123,185) See dotnet/maui#12232 for details about this improvement. Deduplication of generics on iOS and macOS In .NET 7, iOS applications experienced app size increases due to C# generics usage across multiple .NET assemblies. When the .NET 7 Mono AOT compiler encounters a generic instance that is not handled by generic sharing, it will emit code for the instance. If the same instance is encountered during AOT compilation in multiple assemblies, the code will be emitted multiple times, increasing code size. In .NET 8, new dedup-skip and dedup-include command-line options are passed to the Mono AOT compiler. A new aot-instances.dll assembly is created for sharing this information in one place throughout the application. The change was tested on MySingleView app and Monotouch tests in the xamarin/xamarin-macios codebase App Baseline size on disk .ipa (MB) Target size on disk .ipa (MB) Baseline size on disk .app (MB) Target size on disk .app (MB) Baseline build time (s) Target build time (s) .app diff (%) MySingleView Release iOS 5.4 5.4 29.2 15.2 29.2 16.8 47.9 MySingleView Release iOSSimulator-arm64 N/A N/A 469.5 341.8 468.0 330.0 27.2 Monotouch Release llvm iOS 49.0 38.8 209.6 157.4 115.0 130.0 24.9 See xamarin-macios#17766 for details about this improvement. Fix System.Linq.Expressions implementation on iOS-like platforms In .NET 7, codepaths in System.Linq.Expressions were controlled by various flags such as CanCompileToIL CanEmitObjectArrayDelegate CanCreateArbitraryDelegates These flags were controlling codepaths which are “AOT friendly” and those that are not. 
For desktop platforms, NativeAOT specifies the following configuration for AOT-compatible code <IlcArg Include="--featureSystem.Linq.Expressions.CanCompileToIL=false" /> <IlcArg Include="--featureSystem.Linq.Expressions.CanEmitObjectArrayDelegate=false" /> <IlcArg Include="--featureSystem.Linq.Expressions.CanCreateArbitraryDelegates=false" /> When it comes to iOS-like platforms, System.Linq.Expressions library was built with constant propagation enabled and control variables were removed. This further caused above-listed NativeAOT feature switches not to have any effect (fail to trim during app build), potentially causing the AOT compilation to follow unsupported code paths on these platforms. In .NET8, we have unified the build of System.Linq.Expressions.dll shipping the same assembly for all supported platforms and runtimes, and simplified these switches to respect IsDynamicCodeSupported so that the .NET trimmer can remove the appropriate IL in System.Linq.Expressions.dll at application build time. See dotnet/runtime#87924 and dotnet/runtime#89308 for details about this improvement. Set DynamicCodeSupport=false for iOS and Catalyst In .NET 8, the feature switch $(DynamicCodeSupport) is set to false for platforms Where it is not possible to publish without the AOT compiler. When interpreter is not enabled. Which boils down to applications running on iOS, tvOS, MacCatalyst, etc. DynamicCodeSupport=false enables the .NET trimmer to remove code paths depending on RuntimeFeature.IsDynamicCodeSupported such as this example in System.Linq.Expressions. Estimated size savings are dotnet new maui (ios) old SLE.dll new SLE.dll + DynamicCodeSupported=false diff (%) Size on disk (Mb) 40,53 38,78 -4,31% .pkg (Mb) 14,83 14,20 -4,21% When combined with the System.Linq.Expressions improvements on iOS-like platforms, this showed a nice overall improvement to application size See xamarin-macios#18555 for details about this improvement. Memory Leaks Memory Leaks and Quality Given that the major theme for .NET MAUI in .NET 8 is quality, memory-related issues became a focal point for this release. Some of the problems found existed even in the Xamarin.Forms codebase, so we are happy to work towards a framework that developers can rely on for their cross-platform .NET applications. For full details on the work completed in .NET 8, we’ve various PRs and Issues related to memory issues at Pull Requests Issues You can see that considerable progress was made in .NET 8 in this area. If we compare .NET 7 MAUI versus .NET 8 MAUI in a sample application running on Windows, displaying the results of GC.GetTotalMemory() on screen Then compare the sample application running on macOS, but with many more pages pushed onto the navigation stack See the sample code for this project on GitHub for further details. Diagnosing leaks in .NET MAUI The symptom of a memory leak in a .NET MAUI application, could be something like Navigate from the landing page to a sub page. Go back. Navigate to the sub page again. Repeat. Memory grows consistently until the OS closes the application due to lack of memory. In the case of Android, you may see log messages such as 07-07 185139.090 17079 17079 D Mono GC_MAJOR (user request) time 137.21ms, stw 140.60ms los size 10984K in use 3434K 07-07 185139.090 17079 17079 D Mono GC_MAJOR_SWEEP major size 116192K in use 108493K 07-07 185139.092 17079 17079 I monodroid-gc 46204 outstanding GREFs. Performing a full GC! 
In this example, a 116MB heap is quite large for a mobile application, as is having over 46,000 C# <-> Java wrapper objects!

To truly determine if the sub page is leaking, we can make a couple of modifications to a .NET MAUI application:

1. Add logging in a finalizer. For example:

~MyPage() => Console.WriteLine("Finalizer for ~MyPage()");

While navigating through your app, you can find out if entire pages are living forever if the log message is never displayed. This is a common symptom of a leak, because any View holds .Parent.Parent.Parent, etc. all the way up to the Page object.

2. Call GC.Collect() somewhere in the app, such as the sub page's constructor:

public MyPage()
{
    GC.Collect(); // For debugging purposes only, remove later
    InitializeComponent();
}

This makes the GC more deterministic, in that we are forcing it to run more frequently. Each time we navigate to the sub page, we are more likely to cause the old sub page to go away. If things are working properly, we should see the log message from the finalizer.

Note: GC.Collect() is for debugging purposes only. You should not need this in your app after the investigation is complete, so be sure to remove it afterward.

With these changes in place, test a Release build of your app. On iOS, Android, macOS, etc., you can watch the console output of your app to determine what is actually happening at runtime. adb logcat, for example, is a way to view these logs on Android. If running on Windows, you can also use Debug > Windows > Diagnostic Tools inside Visual Studio to take memory snapshots. In the future, we would like Visual Studio's diagnostic tooling to support .NET MAUI applications running on other platforms.

See our memory leaks wiki page for more information related to memory leaks in .NET MAUI applications.

Patterns that cause leaks

C# events

C# events, just like a field, property, etc., can create strong references between objects. Let's look at a situation where things can go wrong.

Take, for example, the cross-platform Grid.ColumnDefinitions property:

public class Grid : Layout, IGridLayout
{
    public static readonly BindableProperty ColumnDefinitionsProperty = BindableProperty.Create(
        "ColumnDefinitions", typeof(ColumnDefinitionCollection), typeof(Grid), null,
        validateValue: (bindable, value) => value != null,
        propertyChanged: UpdateSizeChangedHandlers,
        defaultValueCreator: bindable =>
        {
            var colDef = new ColumnDefinitionCollection();
            colDef.ItemSizeChanged += ((Grid)bindable).DefinitionsChanged;
            return colDef;
        });

    public ColumnDefinitionCollection ColumnDefinitions
    {
        get { return (ColumnDefinitionCollection)GetValue(ColumnDefinitionsProperty); }
        set { SetValue(ColumnDefinitionsProperty, value); }
    }
}

Grid has a strong reference to its ColumnDefinitionCollection via the BindableProperty. ColumnDefinitionCollection has a strong reference to Grid via the ItemSizeChanged event. If you put a breakpoint on the line with ItemSizeChanged +=, you can see the event has an EventHandler object where the Target is a strong reference back to the Grid.

In some cases, circular references like this are completely OK. The .NET runtime(s)' garbage collectors know how to collect cycles of objects that point at each other. When there is no "root" object holding them both, they can both go away. The problem comes in with object lifetimes: what happens if the ColumnDefinitionCollection lives for the life of the entire application?
Consider the following Style in Application.Resources or Resources/Styles/Styles.xaml:

<Style TargetType="Grid" x:Key="GridStyleWithColumnDefinitions">
    <Setter Property="ColumnDefinitions" Value="18,*"/>
</Style>

If you applied this Style to a Grid on a random Page:

1. The Application's main ResourceDictionary holds the Style.
2. The Style holds a ColumnDefinitionCollection.
3. The ColumnDefinitionCollection holds the Grid.
4. The Grid unfortunately holds the Page via .Parent.Parent.Parent, etc.

This situation could cause entire Pages to live forever!

Note: the issue with Grid is fixed in maui#16145, but it is an excellent example of how C# events can go wrong.

Circular references on Apple platforms

Ever since the early days of Xamarin.iOS, there has existed an issue with "circular references", even in a garbage-collected runtime like .NET. C# objects co-exist with a reference-counted world on Apple platforms, and so a C# object that subclasses NSObject can run into situations where it accidentally lives forever: a memory leak. This is not a .NET-specific problem, as you can just as easily create the same situation in Objective-C or Swift. Note that this does not occur on Android or Windows platforms.

Take, for example, the following circular reference:

class MyViewSubclass : UIView
{
    public UIView? Parent { get; set; }

    public void Add(MyViewSubclass subview)
    {
        subview.Parent = this;
        AddSubview(subview);
    }
}

//...
var parent = new MyViewSubclass();
var view = new MyViewSubclass();
parent.Add(view);

In this case:

parent -> view via Subviews
view -> parent via the Parent property

The reference count of both objects is non-zero. Both objects live forever.

This problem isn't limited to a field or property; you can create similar situations with C# events:

class MyView : UIView
{
    public MyView()
    {
        var picker = new UIDatePicker();
        AddSubview(picker);
        picker.ValueChanged += OnValueChanged;
    }

    void OnValueChanged(object? sender, EventArgs e) { }

    // Use this instead and it doesn't leak!
    //static void OnValueChanged(object? sender, EventArgs e) { }
}

In this case:

MyView -> UIDatePicker via Subviews
UIDatePicker -> MyView via ValueChanged and EventHandler.Target

Both objects live forever.

One solution for this example is to make the OnValueChanged method static, which would result in a null Target on the EventHandler instance. Another solution would be to put OnValueChanged in a non-NSObject subclass:

class MyView : UIView
{
    readonly Proxy _proxy = new();

    public MyView()
    {
        var picker = new UIDatePicker();
        AddSubview(picker);
        picker.ValueChanged += _proxy.OnValueChanged;
    }

    class Proxy
    {
        public void OnValueChanged(object? sender, EventArgs e) { }
    }
}

This is the pattern we've used in most .NET MAUI handlers and other UIView subclasses. See the MemoryLeaksOniOS sample repo if you would like to play with some of these scenarios in isolation in an iOS application without .NET MAUI.

Roslyn analyzer for Apple platforms

We also have an experimental Roslyn analyzer that can detect these situations at build time. To add it to net7.0-ios, net8.0-ios, etc. projects, you can simply install a NuGet package:

<PackageReference Include="MemoryAnalyzers" Version="0.1.0-beta.3" PrivateAssets="all" />

An example of a warning would be:

public class MyView : UIView
{
    public event EventHandler MyEvent;
}

Event 'MyEvent' could cause memory leaks in an NSObject subclass. Remove the event or add the [UnconditionalSuppressMessage("Memory", "MA0001")] attribute with a justification as to why the event will not leak.
Note that the analyzer warns if there might be an issue, so it can be quite noisy to enable in a large, existing codebase. Inspecting memory at runtime is the best way to determine if there is truly a memory leak.

Tooling and Documentation

Simplified dotnet-trace and dotnet-dsrouter

In .NET 7, profiling a mobile application was a bit of a challenge. You had to run dotnet-dsrouter and dotnet-trace together and get all the settings right to be able to retrieve a .nettrace or speedscope file for performance investigations. There was also no built-in support for dotnet-gcdump to connect to dotnet-dsrouter to get memory snapshots of a running .NET MAUI application.

In .NET 8, we've streamlined this scenario with new commands for dotnet-dsrouter that simplify the workflow.

To verify you have the latest diagnostic tooling, you can install the tools via:

$ dotnet tool install -g dotnet-dsrouter
You can invoke the tool using the following command: dotnet-dsrouter
Tool 'dotnet-dsrouter' was successfully installed.
$ dotnet tool install -g dotnet-gcdump
You can invoke the tool using the following command: dotnet-gcdump
Tool 'dotnet-gcdump' was successfully installed.
$ dotnet tool install -g dotnet-trace
You can invoke the tool using the following command: dotnet-trace
Tool 'dotnet-trace' was successfully installed.

Verify you have at least 8.x versions of these tools:

$ dotnet tool list -g
Package Id            Version          Commands
--------------------------------------------------------------------------------------
dotnet-dsrouter       8.0.452401       dotnet-dsrouter
dotnet-gcdump         8.0.452401       dotnet-gcdump
dotnet-trace          8.0.452401       dotnet-trace

To profile an Android application on an Android emulator, first build and install your application in Release mode, such as:

$ dotnet build -f net8.0-android -t:Install -c Release -p:AndroidEnableProfiler=true
Build SUCCEEDED.
0 Warning(s)
0 Error(s)

Next, open a terminal to run dotnet-dsrouter:

$ dotnet-dsrouter android-emu
Start an application on android emulator with one of the following environment variables set:
DOTNET_DiagnosticPorts=10.0.2.2:9000,nosuspend,connect
DOTNET_DiagnosticPorts=10.0.2.2:9000,suspend,connect

Then in a second terminal window, we can set the debug.mono.profile Android system property, as the stand-in for $DOTNET_DiagnosticPorts:

$ adb shell setprop debug.mono.profile '10.0.2.2:9000,suspend,connect'
$ dotnet-trace ps
3248  dotnet-dsrouter
$ dotnet-trace collect -p 3248 --format speedscope
...
[00000009] Recording trace 3.2522   (MB)
Press <Enter> or <Ctrl+C> to exit...

Note: Android doesn't have good support for environment variables like $DOTNET_DiagnosticPorts. You can create an AndroidEnvironment text file for setting environment variables, but Android system properties can be simpler, as they do not require rebuilding the application to set them.

Upon launching the Android application, it should be able to connect to dotnet-dsrouter -> dotnet-trace and record performance profiling information for investigation. The --format argument is optional and defaults to .nettrace. However, .nettrace files can be viewed only with PerfView on Windows, while the speedscope JSON files can be viewed "on" macOS or Linux by uploading them to https://speedscope.app.

Note: When providing a process ID to dotnet-trace, it knows how to tell if a process ID is dotnet-dsrouter and connect through it appropriately.
dotnet-dsrouter has the following new commands to simplify the workflow dotnet-dsrouter android Android devices dotnet-dsrouter android-emu Android emulators dotnet-dsrouter ios iOS devices dotnet-dsrouter ios-sim iOS simulators See the .NET MAUI wiki for more information about profiling .NET MAUI applications on each platform. dotnet-gcdump Support for Mobile In .NET 7, we had a somewhat complex method (see wiki) for getting a memory snapshot of an application on the Mono runtime (such as iOS or Android). You had to use a Mono-specific event provider such as dotnet-trace collect --diagnostic-port /tmp/maui-app --providers Microsoft-DotNETRuntimeMonoProfiler0xC9000014 And then we relied on Filip Navara’s mono-gcdump tool (thanks Filip!) to convert the .nettrace file to .gcdump to be opened in Visual Studio or PerfView. In .NET 8, we now have dotnet-gcdump support for mobile scenarios. If you want to get a memory snapshot of a running application, you can use dotnet-gcdump in a similar fashion as dotnet-trace $ dotnet-gcdump ps 3248 dotnet-dsrouter $ dotnet-gcdump collect -p 3248 Writing gcdump to '20231018_115631_29880.gcdump'... Note This requires the exact same setup as dotnet-trace, such as -pAndroidEnableProfiler=true, dotnet-dsrouter, adb commands, etc. This greatly streamlines our workflow for investigating memory leaks in .NET MAUI applications. See our memory leaks wiki page for more information. The post .NET 8 Performance Improvements in .NET MAUI appeared first on .NET Blog.


Asp.Net Core 3.1 Error  The 'inject' directive exp ...
Category: .Net 7

Problem Severity Code Description Project File Line Suppression StateErro ...


Views: 1828 Likes: 94
Computer Vision in PyTorch (Part 1): Building Your First CNN for Pneumonia Detection
Computer Vision in PyTorch (Part 1) Building Your ...

Have you ever wondered how computers can recognize faces in photos or detect obstacles for self-driving cars? This capability stems from computer vision, the field of deep learning focused on enabling machines to interpret and understand visual information from the world around them. But how can this technology tackle more complex challenges, like analyzing medical images to aid diagnoses? In this two-part tutorial, you'll explore exactly that by learning how to use Convolutional Neural Networks (CNNs), a powerful type of neural network designed specifically for image analysis. You'll build your first CNN in PyTorch to analyze real chest X-ray images and identify signs of pneumonia. Whether you're new to computer vision or looking to apply your deep learning skills to a real-world problem, this tutorial series will guide you step-by-step through building, training, and evaluating your own image classification model. By the time you complete this tutorial, you will not only build your initial model but also be able to Explain how CNNs automatically extract important features from images. Understand the purpose of core CNN components like convolutional and pooling layers. Recognize why object-oriented programming is frequently used by professional deep learning practitioners. Define and build your own custom CNN architecture in PyTorch. Understanding the Pneumonia Detection Dataset Before we start designing our CNN architecture, let's first understand the dataset we'll be working with. This understanding will inform our design choices as we build out our model. We'll be working with a dataset of chest X-ray images labeled as either "NORMAL" or "PNEUMONIA." These medical images have specific characteristics we should keep in mind They're grayscale images (single-channel) rather than color (three-channel RGB) They contain subtle patterns that distinguish healthy lungs from those with pneumonia They show similar anatomical structures (lungs, heart, ribs) across patients, but with individual variations They have high resolution to capture fine details necessary for accurate diagnosis Here's what a NORMAL X-ray looks like (left) compared to a typical PNEUMONIA one (right) Notice how pneumonia appears as cloudy white areas in the lungs (which normally should be dark). These patterns are precisely what our CNN will learn to identify. Why CNNs Excel at Image Tasks If you've worked with traditional neural networks before, you might wonder why we need a specialized architecture for images. Why not just use a standard fully-connected network? If you were to try to train a traditional neural network on these X-ray images, you'd immediately face two major challenges Overwhelming parameter count A modest 256×256 grayscale X-ray contains 65,536 pixels. If we connected each pixel to just 1,000 neurons in the first hidden layer, we'd need over 65 million parameters for that layer alone! This would make the model Extremely slow to train Prone to severe overfitting Impractical for deployment in medical settings For perspective, the first convolutional layer in the CNN we will build in this tutorial achieves its initial feature extraction using only 320 parameters. Loss of critical spatial relationships When diagnosing pneumonia, the pattern and location of opacities in the lung matter tremendously. Traditional networks would immediately flatten images into 1D arrays, destroying the spatial information that doctors rely on. 
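To make that first challenge concrete, here is a quick back-of-the-envelope check in Python. This is an illustrative sketch of my own, not code from the tutorial or the dataset; it only reuses the numbers quoted above (a 256×256 grayscale image, a 1,000-unit dense first layer, and the 32-filter, 3×3 first convolutional layer we build later).

# Rough parameter-count comparison (illustrative only)
dense_weights = 256 * 256 * 1_000        # one weight per pixel-to-neuron connection: 65,536,000
conv_params = 32 * (3 * 3 * 1) + 32      # 32 filters x (3x3 kernel x 1 input channel) + 32 biases: 320

print(f"Fully connected first layer: {dense_weights:,} weights")
print(f"Convolutional first layer:   {conv_params:,} parameters")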
CNNs elegantly solve these problems through two ingenious design principles Local connectivity Rather than connecting to every pixel, each CNN neuron connects only to a small patch of the previous layer, much like how different parts of the visual cortex in our brains respond to specific regions of our visual field. This dramatically reduces parameters while preserving the ability to detect local patterns like the edges of lung structures. Parameter sharing The same set of filters (weights) is applied across the entire image. This makes intuitive sense since the feature that identifies pneumonia-related opacity should work regardless of whether it appears in the upper or lower lung. These design choices make CNNs particularly effective for analyzing medical images where accurately identifying spatial patterns can literally be a matter of life and death. Understanding CNN Components Now that we understand why CNNs are well-suited for image analysis, let's learn about the building blocks that make them work. These components will form the foundation of our pneumonia detection model. Convolutional Layers Are The Feature Extractors The heart of any CNN is the convolutional layer. Unlike standard fully-connected layers that look at all input values globally, convolutional layers work more like a magnifying glass scanning across the image. They use a small sliding window to examine sections of the input image one patch at a time. This approach allows them to effectively detect specific local patterns, like edges, corners, or simple textures, regardless of where those patterns appear in the overall image. This ability to recognize patterns independent of their location is fundamental to how CNNs process visual information. Now, let's look at how this sliding window operates. In the animation above, you can see the core process the small sliding window, technically called a kernel (the grid of weights, shown in white), moves (or convolves) across the input (green grid). At each position, it performs an element-wise multiplication between the kernel's weights and the underlying input values, and then sums the results to produce a single output value. This value becomes part of the output feature map (blue grid), which highlights where the pattern detected by the kernel was found. Interestingly, the kernel's weights aren't fixed; they are learnable parameters, automatically adjusted during training via backpropagation to become effective at detecting relevant patterns. For our pneumonia detection task, the filters in early convolutional layers might learn to detect simple features like edges (e.g., rib and organ boundaries) or basic textures. Filters in deeper layers can then combine these simpler features to recognize more complex patterns relevant to pneumonia, such as the characteristic cloudy opacities within the lungs. When defining a convolutional layer, you'll typically configure these key hyperparameters Kernel Size This defines the dimensions (height and width) of the kernel?the sliding window of weights. Common sizes are 3×3 or 5×5. Smaller kernels generally capture more localized, finer details, while larger kernels can identify broader, more spread-out patterns. Number of Filters This specifies how many different pattern detectors the layer will have. Each filter acts as a unique feature detector and consists of its own learnable kernel (weights) plus a single learnable bias term. So, conceptually filter = kernel + bias. 
The bias is a value added to the result of the convolution calculation (the sum of element-wise products) at each position. This learnable parameter allows the filter to adjust its output threshold independently of the weighted sum of inputs, increasing the model's flexibility to learn patterns. Applying one filter across the input produces one 2D feature map in the output. Therefore, the number of filters you specify directly determines the number of output channels (the depth) of the layer's output volume. More filters allow the network to learn a richer set of features simultaneously, but also increase the number of parameters and computational load. Stride This controls how many pixels the kernel slides across the input at each step. A stride of 1 (as in the animation above) means it moves one pixel at a time. A larger stride (like 2, as shown in the animation below) causes the kernel to skip pixels, resulting in a smaller output feature map (dimensionally) and potentially faster computation, but with less spatial detail captured. Padding This parameter controls whether pixels are added around the border of the input before the convolution operation. The two main strategies are No Padding (sometimes called 'valid' padding) In this mode, the kernel only slides over positions where it fully overlaps the input data. This causes the output feature map's height and width to shrink relative to the input dimensions (unless the kernel size is 1×1). The convolution is only computed for 'valid' positions where the kernel fits entirely. Zero Padding Pixels with a value of zero are added symmetrically around the input's border. This technique gives you control over the output dimensions. A common goal is to calculate the right amount of padding (based on kernel size) to achieve 'same' padding, where the output feature map has the same height and width as the input map (this is typically used when the stride is 1). Using 'same' padding helps preserve information throughout the network, especially features located near the edges of the input, which can be valuable when analyzing medical images where abnormalities might appear anywhere. Input and Output Shapes (Channels) Convolutional layers operate on input data arranged as 3D volumes with dimensions (height × width × input channels). They also produce output feature maps arranged as a 3D volume (output height × output width × output channels). The number of output channels is set by the Number of Filters hyperparameter you choose for the layer, as we discussed; each filter produces one channel (feature map) in the output. The number of input channels for a layer isn't typically a hyperparameter you tune; instead, it must match the number of channels in the data coming into that layer. For the very first convolutional layer that processes the raw image, this depends on the image type Grayscale images (like our X-rays) These have only one channel (input_channels=1). Why? Because each pixel's value represents only a single piece of information its intensity or brightness (from black to white). Color images These typically have three channels (input_channels=3). Why? Because they represent the intensity of three primary colors Red, Green, and Blue (RGB), which are needed to create the full color spectrum at each pixel position. For any subsequent convolutional layer deeper in the network, its input_channels must be equal to the output_channels (the number of filters) of the layer immediately preceding it, ensuring the dimensions match up correctly. 
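As a small illustration of how these settings fit together (this sketch is mine and is not part of the tutorial's model), the snippet below chains two convolutional layers. The second layer's in_channels matches the first layer's out_channels, and with kernel_size=3, stride=1, and padding=1 the spatial size of a 256×256 input is preserved.

import torch
import torch.nn as nn

# Two stacked conv layers: the channel counts must line up between them.
conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)

x = torch.randn(1, 1, 256, 256)     # (batch, channels, height, width) dummy grayscale image
out1 = conv1(x)
out2 = conv2(out1)
print(out1.shape)                   # torch.Size([1, 32, 256, 256]) -- 'same'-style padding keeps 256x256
print(out2.shape)                   # torch.Size([1, 64, 256, 256])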
The output feature map's height and width will depend on the input dimensions combined with the layer's kernel size, stride, and padding settings. Pooling Layers Focusing on What Matters After applying convolutions and detecting features, pooling layers help the network Reduce the spatial dimensions of feature maps Focus on the most important information Gain some resistance to small translations or shifts in the image The animation demonstrates max pooling, which divides the input into regions and takes only the maximum value from each. For pneumonia detection, this helps the network focus on the strongest indicators of disease while ignoring less relevant details. Max pooling creates a form of translation invariance because the network cares more about whether a feature is present than its exact location. This is useful for our task since pneumonia patterns can appear in slightly different locations across patients. Batch Normalization Stabilizing Training Medical image datasets like our pneumonia X-rays can have high variability in pixel intensity and contrast. Batch normalization helps stabilize the learning process by standardizing the inputs to each layer. By normalizing each batch of data during training, batch normalization Enables faster and more stable training Makes the model less sensitive to poor weight initialization Adds a mild regularization effect Allows for higher learning rates without divergence When building deep CNNs for medical imaging, batch normalization can be particularly valuable for handling the variability across different X-ray machines and imaging protocols. These components are often grouped together in repeating blocks within modern CNNs. A frequently used and effective structure for such a block is Convolutional Layer Batch Normalization Layer Activation Function (e.g., ReLU) Pooling Layer (optional, depending on the specific architecture) Dropout Layers Preventing Overfitting Medical imaging datasets like chest X-rays often contain far fewer examples than large-scale datasets like ImageNet. That makes it easier for a model to memorize the training data instead of learning patterns that generalize to new patients. To combat this, we’ll use dropout—a regularization technique that reduces overfitting by randomly disabling neurons during training. In the animated example below, you can see how a dropout layer with a 0.5 probability temporarily disables two out of four nodes on each forward pass. Notice how it’s not always the same two—it changes every time, forcing the network to build redundant pathways. In our pneumonia classifier, we’ll apply dropout usually within the fully connected layers near the end of the network. This helps ensure that the final classification doesn’t rely too heavily on any single feature learned earlier, helping the model generalize better to new chest X-rays. From Components to Architecture Now that we understand the individual CNN components, let's consider how to assemble them into a complete model architecture for our pneumonia detection task. Before designing the specific architecture (what we'll build), it's helpful to discuss the standard programming approach used to define such models in PyTorch (how we'll build it). Why Object-Oriented Models Are the Standard PyTorch offers multiple ways to define neural networks, but the object-oriented programming (OOP) approach using the nn.Module class is widely recognized as the standard for professional development. 
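To make the comparison tangible before we discuss the benefits, here is a deliberately tiny classifier written both ways. This sketch is mine rather than the tutorial's, and the real model we define later is far more capable; it only shows the difference in style between nn.Sequential and an nn.Module subclass.

import torch
import torch.nn as nn

# Style 1: nn.Sequential -- quick for straight-line stacks of layers.
sequential_net = nn.Sequential(nn.Flatten(), nn.Linear(256 * 256, 2))

# Style 2: subclassing nn.Module -- a named, reusable class whose forward() we fully control.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(256 * 256, 2)

    def forward(self, x):
        return self.fc(self.flatten(x))

x = torch.randn(1, 1, 256, 256)         # dummy grayscale image
print(sequential_net(x).shape)          # torch.Size([1, 2])
print(TinyNet()(x).shape)               # torch.Size([1, 2])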
Let's explore why this approach is so beneficial, both for our current project and for your future computer vision work. When you look at how complex deep learning models are built in practice, whether for image recognition, autonomous navigation, natural language processing, or scientific discovery, you'll find they’re typically defined using object-oriented principles. This approach offers several key advantages Modularity OOP allows us to define reusable building blocks (like custom convolutional blocks or specific layer sequences) that can be easily stacked, swapped, and reconfigured. This is valuable when experimenting with different architectural ideas for any computer vision task, including optimizing models for medical image analysis. Maintainability Real-world models often need to evolve as new research emerges or project requirements change. The clear structure provided by OOP makes models easier to understand, debug, update, and collaborate on, whether you're incorporating a new state-of-the-art technique or adapting your model for a different dataset. Flexibility Many computer vision tasks benefit from custom operations or network structures that go beyond simple sequential layer stacking. OOP readily supports building complex, non-sequential architectures or integrating custom components, which can be cumbersome with simpler definition methods. Scalability As projects grow in complexity (e.g., tackling more intricate tasks, using larger datasets, or integrating different types of data), the organized nature of OOP makes managing this increased scale much more feasible than flatter script-based approaches. Industry alignment Across diverse fields applying deep learning, from tech companies and research institutions to finance and healthcare, this object-oriented approach using classes like nn.Module is the common standard for professional development. Simply put, learning to define your models using an object-oriented approach (by subclassing nn.Module) is ideal for building powerful, adaptable, and reusable computer vision systems. Of course, for very simple sequential models or quick proof-of-concept tests, more direct methods like using nn.Sequential can be perfectly effective and faster to write. However, the OOP structure truly shines when it comes to managing complexity, promoting code maintainability, and enabling the flexibility needed for larger or evolving real-world applications, making it the standard professional approach. Understanding this method prepares you to take on challenging and worthwhile projects, from analyzing medical images like we are here, to developing advanced systems in countless other fields. Defining Your CNN in PyTorch Now let's implement our pneumonia detection CNN using PyTorch's object-oriented style. We'll build a model that can effectively analyze chest X-rays and distinguish between normal and pneumonia cases. 
First, let's make sure we have all the required dependencies to build the model:

import torch
import torch.nn as nn
import torch.nn.functional as F

Next, we'll define our CNN by subclassing nn.Module, PyTorch's base class for all neural networks:

class PneumoniaCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # First convolutional block
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),
            nn.BatchNorm2d(num_features=32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # Reduce spatial dimensions by half; see explanation below
        )
        # Second convolutional block
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
            nn.BatchNorm2d(num_features=64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # Further reduce spatial dimensions; see explanation below
        )
        # Third convolutional block
        self.conv_block3 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
            nn.BatchNorm2d(num_features=128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # Further reduce spatial dimensions; see explanation below
        )
        # Flatten layer to convert 3D feature maps to 1D vector
        self.flatten = nn.Flatten()
        # Fully connected layers for classification
        self.fc1 = nn.Linear(in_features=128 * 32 * 32, out_features=512)  # Adjust size based on input dimensions
        self.dropout1 = nn.Dropout(0.5)  # Add 50% dropout for regularization
        self.fc2 = nn.Linear(in_features=512, out_features=128)
        self.dropout2 = nn.Dropout(0.5)
        self.fc3 = nn.Linear(in_features=128, out_features=2)  # 2 output classes: Normal and Pneumonia

    def forward(self, x):
        # Pass input through convolutional blocks
        x = self.conv_block1(x)
        x = self.conv_block2(x)
        x = self.conv_block3(x)
        # Flatten the features
        x = self.flatten(x)
        # Pass through fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout1(x)
        x = F.relu(self.fc2(x))
        x = self.dropout2(x)
        logits = self.fc3(x)  # Raw, unnormalized predictions
        return logits

Let's break down what's happening in this model:

We start by creating three convolutional blocks using the nn.Sequential class. Each block contains:
- nn.Conv2d(): a convolutional layer that extracts features from images
- nn.BatchNorm2d(): batch normalization to stabilize training
- nn.ReLU(): ReLU activation to introduce non-linearity
- nn.MaxPool2d(): max pooling to reduce spatial dimensions and focus on the most important features within local regions

Notice we pass in_channels=1 to the first convolutional layer (conv_block1). This explicitly tells the layer to expect input data with a single channel, which is correct for our grayscale X-ray images where each pixel has only one intensity value. (Color images would typically use in_channels=3 for RGB.)

We gradually increase the number of filters (output channels) in subsequent blocks (32 → 64 → 128). This is a common CNN design pattern. Early layers with fewer filters tend to capture simpler, more general features (like edges or basic textures), while deeper layers with more filters can combine these simple features to learn more complex and abstract patterns specific to the task (like the visual characteristics of pneumonia).

After the convolutional blocks, we flatten the final 3D feature map (height × width × channels) into a 1D vector. This vector becomes the input to the first fully connected layer (self.fc1). To determine the required in_features for self.fc1, we need to know the shape of the feature map after the last pooling layer.
We'll be resizing our input images to 256×256 pixels during data preparation (covered in the next tutorial). Given this 256×256 input size, let's trace how the dimensions change through the three max pooling layers, as each one halves the height and width:

Start: 256×256
After 1st pool layer: 128×128
After 2nd pool layer: 64×64
After 3rd pool layer: 32×32

So the feature map entering the flatten layer has spatial dimensions 32×32. Since the last convolutional block (conv_block3) outputs 128 channels (or feature maps), the total number of features in the flattened vector is 128 × 32 × 32 = 131,072. This is the value we need for in_features in self.fc1.

The fully connected layers (nn.Linear), sometimes called dense layers, perform the final classification based on the extracted features. We intersperse nn.Dropout(0.5) layers between the fully connected layers. Dropout is a regularization technique that helps prevent overfitting, which is especially important when working with limited datasets. It randomly sets a fraction (here, 50%) of neuron outputs to zero during training, forcing the network to learn more robust representations.

The final layer (self.fc3) outputs two values, corresponding to the scores for our two classes: Normal and Pneumonia. Note that these outputs are raw scores, often called logits. We don't apply a final activation function like Softmax here because the standard PyTorch loss function for multi-class classification, nn.CrossEntropyLoss, conveniently expects raw logits as input (it applies the necessary transformations internally during training).

The __init__ method defines all the network's layers and assigns them to instance attributes (like self.conv_block1, self.fc1, etc.). The forward method then defines the order in which input data x flows through these predefined layers to produce the final output.

You might also notice we used the module nn.ReLU() inside the nn.Sequential blocks defined in __init__, but called the functional version F.relu() directly in the forward method after the first two fully connected layers. Both apply the exact same ReLU activation. nn.ReLU() is required within nn.Sequential because nn.Sequential expects nn.Module instances. Using F.relu() directly in forward is common and often slightly more concise for stateless operations like activation functions, as you don't need to define it in __init__ first. Both approaches are valid within the forward method itself.

The .forward() method in our model defines how data flows through our network: it's the execution path that input takes as it's transformed into output predictions. When we later use our model with syntax like outputs = model(images), PyTorch automatically calls this .forward() method behind the scenes. This clean separation between model structure (defined in __init__()) and computation flow (defined in forward()) is one of the key benefits of PyTorch's object-oriented approach.

Verifying Tensor Shapes

When building CNNs, one of the most common sources of errors is mismatched tensor shapes between layers. For example, if the flattened output of your convolutional blocks doesn't produce the exact number of features expected by your first fully connected layer, PyTorch will raise a RuntimeError when you try to pass data through. Carefully tracking shapes is vital.

A simple yet effective debugging technique is to perform a "dry run": passing a correctly shaped dummy input through the model and printing the tensor shape after each major step. This can help you catch dimension mismatches early and save hours of troubleshooting.
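If you would rather not do the in_features arithmetic by hand, you can also let PyTorch compute the flattened size for you. The following standalone sketch (my own, and separate from the fuller shape-verification pass shown next) runs a dummy tensor through the same conv/pool pattern; batch norm, ReLU, and dropout are omitted here because they do not change tensor shapes.

import torch
import torch.nn as nn

# Shape-only stand-in for the three conv blocks (conv + 2x2 max pool each)
blocks = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.MaxPool2d(2),
)

out = blocks(torch.randn(1, 1, 256, 256))
print(out.shape)                    # torch.Size([1, 128, 32, 32])
print(out.flatten(1).shape[1])      # 131072 -> the in_features value for self.fc1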
First, let's create an instance of our model and a dummy input tensor representing one grayscale image of the expected size (256×256 pixels):

# Create model instance
model = PneumoniaCNN()

# Create a random dummy grayscale image (batch_size, channels, height, width)
dummy_input = torch.randn(1, 1, 256, 256)

Now, we can define a helper function that mimics the model's forward pass but includes print statements to show the shape transformations:

# Forward pass function with shape printing
def forward_with_shape_printing(model, x):
    print(f"Input shape \t\t{x.shape}")  # Using tabs for alignment

    # Pass through convolutional blocks
    x = model.conv_block1(x)
    print(f"After conv_block1 \t{x.shape}")
    x = model.conv_block2(x)
    print(f"After conv_block2 \t{x.shape}")
    x = model.conv_block3(x)
    print(f"After conv_block3 \t{x.shape}")

    # Flatten the features
    x = model.flatten(x)
    print(f"After flatten \t\t{x.shape}")

    # Pass through fully connected layers (only showing final output shape)
    x = F.relu(model.fc1(x))
    x = model.dropout1(x)
    x = F.relu(model.fc2(x))
    x = model.dropout2(x)
    logits = model.fc3(x)
    print(f"Output shape (logits) \t{logits.shape}")
    return logits

# Run the forward pass (output is ignored with _)
print("Running shape verification pass")
_ = forward_with_shape_printing(model, dummy_input)

Running this code should produce output similar to this:

Running shape verification pass
Input shape             torch.Size([1, 1, 256, 256])
After conv_block1       torch.Size([1, 32, 128, 128])
After conv_block2       torch.Size([1, 64, 64, 64])
After conv_block3       torch.Size([1, 128, 32, 32])
After flatten           torch.Size([1, 131072])
Output shape (logits)   torch.Size([1, 2])

Interpreting the Shape Transformations

These printouts confirm several key aspects of our architecture:

Spatial Dimensions Decrease, Channel Depth Increases
Notice how the height and width are halved after each convolutional block (due to the MaxPool2d layer): 256 → 128 → 64 → 32. Simultaneously, the number of channels (features) increases: 1 → 32 → 64 → 128. This is the common CNN pattern we discussed earlier, visualized here: the network trades spatial resolution for richer feature representation depth, allowing it to capture increasingly complex patterns as data flows deeper.

Flattening Connects Blocks
The output from the last convolutional block (1×128×32×32) is correctly flattened into a 1D vector of size 1×131072, matching the in_features expected by self.fc1. This confirms our calculation from the previous section and shows the bridge between the convolutional feature extractor and the fully connected classifier head.

Interpreting the Final Output Shape ([1, 2])
Finally, let's take a closer look at the output shape: torch.Size([1, 2]).

The first dimension (1) corresponds to the batch size. We passed in a single dummy image, so the batch size is 1. The second dimension (2) corresponds to the number of classes our model predicts. As established, these are the raw, unnormalized scores (logits) for 'Normal' (index 0) and 'Pneumonia' (index 1).

These logits are the direct output suitable for the nn.CrossEntropyLoss function during training. However, to turn them into human-interpretable predictions, two more steps are typically needed (which we'll implement fully in the next tutorial):

Convert to Probabilities: Apply the softmax function along the class dimension (dim=1) to convert the raw logits into probabilities that sum to 1.0 for each image in the batch.
Python # Example Convert logits to probabilities probabilities = F.softmax(logits, dim=1) # probabilities might look like tensor([[0.312, 0.688]]) Get Predicted Class Find the index (0 or 1) corresponding to the highest probability. This index represents the model's final prediction. Python # Example Get the predicted class index _, predicted_class = torch.max(probabilities, dim=1) # predicted_class might look like tensor([1]) (meaning Pneumonia) This shape verification process confirms our model's internal dimensions align correctly and helps clarify how the final output relates to the classification task. Practical Tips for CNN Development Let's explore some important practices to keep in mind when developing CNNs in PyTorch. GPU Usage and Device Management Training CNNs involves a huge number of calculations. While CPUs can perform these operations, Graphics Processing Units (GPUs) are specialized for massive parallel computation, which can make training deep learning models drastically faster, often by an order of magnitude or more! This speed-up is especially noticeable with complex models or large datasets found in many computer vision applications, from analyzing high-resolution photographs to processing video streams or large medical scans. If you have access to a GPU (like NVIDIA GPUs compatible with CUDA), you'll want to leverage its processing power. The key steps are to determine the appropriate device (cuda for NVIDIA GPU or cpu) and then explicitly move both your model and your data tensors to that device before performing operations # 1. Determine the target device (usually done early in your script) device = torch.device("cuda" if torch.cuda.is_available() else "cpu") print(f"Using device {device}") # 2. Move your model to the device (usually done once after creating the model) model = model.to(device) # 3. Move your data tensors to the device (done for EACH batch) images = images.to(device) labels = labels.to(device) In a typical workflow, you'd set the device variable early on. You'd move the model to the device right after creating it (step 2). Importantly, inside your training or evaluation loops, you must also move each batch of images and labels to the same device (step 3) before feeding them into the model. Consistently placing both your model and your input data on the same device is required. Performing operations between tensors residing on different devices (e.g., CPU tensor vs. GPU tensor) is a common source of RuntimeError messages in PyTorch, so diligent device management can save you many headaches. Switching Between Training and Evaluation Modes While we'll cover training our model in the next tutorial, it's good to be reminded that PyTorch models have two operational modes model.train() # Set the model to training mode model.eval() # Set the model to evaluation mode The difference is significant because In training mode, dropout layers randomly disable neurons In evaluation mode, dropout is disabled so all neurons are active Batch normalization behaves differently in each mode This will be especially important when we implement the training loop in the next tutorial, but it's good to be aware of these modes now. Review and Next Steps You've now completed the first step toward building a pneumonia detection system designing an effective CNN architecture in PyTorch. 
Let's recap what you've learned in this tutorial:

You understand why CNNs are well-suited for image analysis tasks, such as detecting patterns in X-rays, leveraging their ability to learn spatial hierarchies.
You've learned about key CNN components like convolutional layers, pooling layers, batch normalization, and dropout layers.
You've implemented a complete CNN model using PyTorch's object-oriented approach.
You've explored techniques for debugging potential shape issues in your model.

This is an important foundation, but a model architecture alone can't detect pneumonia. Next, we'll build on this foundation to create a complete working system by:

Loading and preprocessing real chest X-ray images
Implementing training and validation loops
Evaluating the model's diagnostic performance
Interpreting results and improving the model

In the next tutorial, we'll transform this architectural framework into a working pneumonia detection system by adding data processing, training, and evaluation. See you there!

Key Takeaways

CNNs reduce parameters through local connectivity and weight sharing, making them ideal for image analysis.
Core CNN components work together to extract increasingly complex features from images.
PyTorch's object-oriented approach provides a flexible, maintainable framework for implementing CNNs.
Debugging techniques like shape verification are essential for successful model development.
Medical applications like pneumonia detection showcase the real-world impact of computer vision.
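
Optional sketch: shape checking with forward hooks. This is a minimal alternative to the forward_with_shape_printing function above, assuming the PneumoniaCNN class defined earlier in this tutorial is in scope; it uses PyTorch's register_forward_hook API to print each top-level block's output shape automatically, so you don't have to keep a debugging copy of the forward pass in sync with the model.

import torch

def print_shape_hook(name):
    # Returns a hook that prints the output shape of the module it is attached to
    def hook(module, inputs, output):
        print(f"{name}: output shape {tuple(output.shape)}")
    return hook

model = PneumoniaCNN()

# Attach a hook to every immediate child module (conv blocks, flatten, fc layers, ...)
handles = [module.register_forward_hook(print_shape_hook(name))
           for name, module in model.named_children()]

# Run a dummy grayscale image through the model; the hooks do the printing
with torch.no_grad():
    _ = model(torch.randn(1, 1, 256, 256))

# Remove the hooks so they don't fire during real training
for handle in handles:
    handle.remove()

Because the hooks are attached to whatever children the model actually has, this check keeps working even if you later rename or reorder layers, as long as they remain attributes of the model.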
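Optional sketch: putting the practical tips together. The following is a hedged example, not part of the tutorial's required code: it assumes the PneumoniaCNN class from this tutorial and uses an illustrative dummy batch in place of real X-ray data. It shows device placement, evaluation mode, and the logits-to-prediction steps in one place.

import torch
import torch.nn.functional as F

# Pick the device once, early in the script
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = PneumoniaCNN().to(device)  # move the model to the selected device
model.eval()                       # disable dropout; use BatchNorm running stats

class_names = ["Normal", "Pneumonia"]  # index 0 = Normal, index 1 = Pneumonia

# Illustrative stand-in for a preprocessed batch of four 256x256 grayscale X-rays
dummy_batch = torch.randn(4, 1, 256, 256).to(device)

with torch.no_grad():                       # no gradients needed for inference
    logits = model(dummy_batch)             # shape: (4, 2)
    probs = F.softmax(logits, dim=1)        # per-image class probabilities
    confidences, predicted = torch.max(probs, dim=1)

for i in range(dummy_batch.size(0)):
    print(f"Image {i}: {class_names[predicted[i].item()]} "
          f"(confidence {confidences[i].item():.2f})")

In the next tutorial, the dummy batch would be replaced by real batches from a DataLoader, and the same device/eval-mode pattern carries over unchanged.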


Linking Multiple Asp.net Projects Together
Category: .Net 7

For large applications, let's say Asp.net application that is really large and supports multiple bus ...


Views: 353 Likes: 102
Connect to Another Sql Data Engine Through Interne ...
Category: SQL

I had installed an SQL Data Engine and a Microsoft SQL Server Management Studio on my other computer ...


Views: 297 Likes: 79
Introducing Bash for Beginners

A new Microsoft video series for developers learning how to script.

According to the Stack Overflow 2022 Developer Survey, Bash is one of the top 10 most popular technologies. This shouldn't come as a surprise, given the popularity of Linux systems, with the Bash shell readily installed, across many tech stacks and the cloud. On Azure, more than 50 percent of virtual machine (VM) cores run on Linux. There is no better time to learn Bash!

Long gone are the days of feeling intimidated by a black screen with text known as a terminal. Say goodbye to blindly typing in “chmod 777” while following a tutorial. Say hello to task automation, scripting fundamentals, programming basics, and your first steps to working with a cloud environment via the Bash command line.

What we’ll be covering

My cohost, Josh, and I will walk you through everything you need to get started with Bash in this 20-part series. We will provide an overview of the basics of Bash scripting, starting with how to get help from within the terminal. The terminal is a window that lets you interact with your computer’s operating system, and in this case, the Bash shell. To get help with a specific command, you can use the man command followed by the name of the command you need help with. For example, man ls will provide information on the ls command, which is used for listing directories and finding files.

Once you’ve gotten help from within the terminal, you can start navigating the file system. You’ll learn how to list directories and find files, as well as how to work with directories and files themselves. This includes creating, copying, moving, and deleting directories and files. You’ll also learn how to view the contents of a file using the cat command.

Another important aspect of Bash is environment variables. These are values that are set by the operating system and are used by different programs and scripts. In Bash, you can access these variables using the “$” symbol followed by the name of the variable. For example, $PATH will give you the value of the PATH environment variable, which specifies the directories where the shell should search for commands.

Redirection and pipelines are two other important concepts in Bash. Redirection allows you to control the input and output of a command, while pipelines allow you to chain multiple commands together. For example, you can use the “>” symbol to redirect the output of a command to a file, and the “|” symbol to pipe the output of one command to the input of another.

When working with files in Linux, you’ll also need to understand file permissions. In Linux, files have permissions that determine who can access them and what they can do with them. You’ll learn about the different types of permissions, such as read, write, and execute, and how to change them using the chmod command.

Next, we’ll cover some of the basics of Bash scripting. You’ll learn how to create a script, use variables, and work with conditional statements, such as "if" and "if else". You’ll also learn how to use a case statement, which is a way to control the flow of execution based on the value of a variable. Functions are another important aspect of Bash scripting, and you’ll learn how to create and use them to simplify your scripts. Finally, you’ll learn about loops, which allow you to repeat a set of commands multiple times.

Why Bash matters

Bash is a versatile and powerful language that is widely used.
Whether you’re looking to automate tasks, manage files, or work with cloud environments, Bash is a great place to start. With the knowledge you’ll gain from this series, you’ll be well on your way to becoming a proficient Bash scripter.

Many other tools, such as programming languages and command-line interfaces (CLIs), integrate with Bash, so this is not only the beginning of a new skill set but also a good primer for many others. Want to move on and learn how to become efficient with the Azure CLI? Bash integrates with the Azure CLI seamlessly. Want to learn a language like Python? Learning Bash teaches you the basic programming concepts you need to know, such as flow control, conditional logic, and loops, which makes it easier to pick up Python. Want to have a Linux development environment on your Windows device? Windows Subsystem for Linux (WSL) has you covered, and Bash works there, too!

While we won't cover absolutely everything there is to Bash, we do make sure to leave you with a solid foundation. At the end of this course, you'll be able to continue on your own by following tutorials, docs, books, and other resources. If live sessions are more your style, catch one of our How Linux Works and How to Leverage It in the Cloud series webinars. We'll cover a primer on how Linux works, discuss how and when to use Linux on Azure, and get your developer environment set up with WSL.

This Bash for Beginners series is part of a growing library of video series on the Microsoft Developer channel for quickly learning new skills, including Python, Java, C#, Rust, JavaScript, and more.

Learn more about Bash in our Open Source community. Need help with your learning journey? Watch Bash for Beginners, and find Josh and myself on Twitter. Share your questions and progress on our Tech Community; we'll make sure to answer and cheer you on.

The post Introducing Bash for Beginners appeared first on Microsoft Open Source Blog.


Can not connect to SQL server in docker container ...
Category: Docker

Problem: The challenge was to connect to an SQL Server Instan ...


Views: 2004 Likes: 93
ASP.Net Core 2.2 Not Required Field in Data Model. ...
Category: .Net 7

Question How can I make a Field or Class property in a POCO Data Model not required or Optional ...


Views: 812 Likes: 59



