NT-Xent (Normalized Temperature-Scaled Cross-Entropy) Loss Explained and Implemented in PyTorch

An intuitive explanation of the NT-Xent loss, with a step-by-step explanation of the operation and our implementation in PyTorch. Co-authored with Naresh Singh.

Formula for NT-Xent loss. Source: Papers with Code (CC-BY-SA)

Introduction

Recent advances in self-supervised learning and contrastive learning have excited researchers and practitioners in Machine Learning (ML) to explore this space with renewed interest. In particular, the SimCLR paper, which presents a simple framework for contrastive learning of visual representations, has gained a lot of attention in the self-supervised and contrastive learning space.

The central idea behind the paper is very simple: allow the model to learn whether a pair of images was derived from the same or from different initial images.

Figure 1: The high-level idea behind SimCLR. Source: SimCLR paper

The SimCLR approach encodes each input image i as a feature vector zi. There are two cases to consider:

Positive pairs: The same image is augmented using two different sets of augmentations, and the resulting feature vectors zi and zj are compared. These feature vectors are forced to be similar by the loss function.

Negative pairs: Different images are augmented using different sets of augmentations, and the resulting feature vectors zi and zk are compared. These feature vectors are forced to be dissimilar by the loss function.

The rest of this article will focus on explaining and understanding this loss function and its efficient implementation in PyTorch.

The NT-Xent Loss

At a high level, the contrastive learning model is fed 2N images, originating from N underlying images. Each of the N underlying images is augmented using a random set of image augmentations to produce 2 augmented images. This is how we end up with 2N images in a single training batch fed to the model.

Figure 2: A batch of 6 images in a single training batch for contrastive learning. The number below each image is the index of that image in the input batch when fed into a contrastive learning model. Image source: Oxford Visual Geometry Group (CC-SA).

In the following sections, we will dive deep into the following aspects of the NT-Xent loss:

1. The effect of temperature on SoftMax and Sigmoid
2. A simple and intuitive interpretation of the NT-Xent loss
3. A step-by-step implementation of NT-Xent in PyTorch
4. Motivating the need for a multi-label loss function (NT-BXent)
5. A step-by-step implementation of NT-BXent in PyTorch

All the code for steps 2-5 can be found in this notebook. The code for step 1 can be found in this notebook.

The effect of temperature on SoftMax and Sigmoid

To understand all the moving parts of the contrastive loss function we'll be studying in this article, we first need to understand the effect of temperature on the SoftMax and Sigmoid activation functions.

Typically, temperature scaling is applied to the input of SoftMax or Sigmoid to either smooth out or accentuate the output of those activation functions: the input logits are divided by the temperature before being passed into the activation functions. You can find all the code for this section in this notebook.

SoftMax: For SoftMax, a high temperature reduces the variance in the output distribution, which results in softening of the labels. A low temperature increases the variance in the output distribution and makes the maximum value stand out over the other values. See the charts below for the effect of temperature on SoftMax when fed the input tensor [0.1081, 0.4376, 0.7697, 0.1929, 0.3626, 2.8451].

Figure 3: Effect of temperature on SoftMax. Source: Author(s)
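To make the effect concrete, here is a minimal sketch of ours (not from the article's notebook) that prints the SoftMax output for the tensor above at a few temperatures; the temperature values chosen here are purely illustrative.

import torch
import torch.nn.functional as F

# The example logits from the charts above.
logits = torch.tensor([0.1081, 0.4376, 0.7697, 0.1929, 0.3626, 2.8451])

# Divide the logits by the temperature before applying SoftMax.
for t in (0.1, 1.0, 10.0):
    print(f"t={t:5.1f}: {F.softmax(logits / t, dim=0)}")

# A low temperature (0.1) makes the largest logit dominate the distribution,
# while a high temperature (10.0) pushes the outputs towards uniform.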
Sigmoid: For Sigmoid, a high temperature results in an output distribution that is pulled towards 0.5, whereas a low temperature stretches the inputs to larger magnitudes, pushing the outputs closer to either 0.0 or 1.0 depending on the sign of the input.

Figure 4: Effect of temperature on Sigmoid. Source: Author(s)

Now that we understand the effect of various temperature values on the SoftMax and Sigmoid functions, let's see how this applies to our understanding of the NT-Xent loss.

Interpreting the NT-Xent loss

The NT-Xent loss is understood by understanding the individual terms in the name of this loss:

Normalized: Cosine similarity produces a normalized score in the range [-1.0, 1.0].
Temperature-scaled: The all-pairs cosine similarity is scaled by a temperature before computing the cross-entropy loss.
Cross-entropy loss: The underlying loss is a multi-class (single-label) cross-entropy loss.

As mentioned above, we assume that for a batch of size 2N, the feature vectors at the following indices represent positive pairs: (0, 1), (2, 3), (4, 5), (6, 7), and so on, and that the rest of the combinations represent negative pairs. This is an important factor to keep in mind throughout the interpretation of the NT-Xent loss as it relates to SimCLR.

Now that we understand what the terms mean in the context of the NT-Xent loss, let's take a look at the mechanical steps needed to compute the NT-Xent loss on a batch of feature vectors:

1. The all-pairs cosine similarity score is computed for each of the 2N vectors produced by the SimCLR model. This results in (2N)² similarity scores represented as a 2N x 2N matrix.
2. Comparison results between the same value (i, i) are discarded (since a distribution is perfectly similar to itself and can't possibly allow the model to learn anything useful).
3. Each value (cosine similarity) is scaled by a temperature parameter τ (which is a hyper-parameter).
4. Cross-entropy loss is applied to each row of the resulting matrix. The following paragraph explains this in more detail.
5. Typically, the mean of these losses (one loss per element in the batch) is used for backpropagation.

The way that the cross-entropy loss is used here is semantically slightly different from how it's used in standard classification tasks. In classification tasks, a final "classification head" is trained to produce a one-hot probability vector for each input, and we compute the cross-entropy loss on that one-hot probability vector, since we're effectively computing the difference between two distributions. This video explains the concept of cross-entropy loss beautifully. In the NT-Xent loss, there isn't a 1:1 correspondence between a trainable layer and the output distribution. Instead, a feature vector is computed for each input, and we then compute the cosine similarity between every pair of feature vectors. The trick here is that, since each image is similar to exactly one other image in the input batch (its positive pair), if we ignore the similarity of a feature vector with itself, we can consider this to be a classification-like setting: the distribution of similarity probabilities between images represents a classification task where one of them will be close to 1.0 and the rest will be close to 0.0.

Now that we have a solid overall understanding of the NT-Xent loss, we should be in great shape to implement these ideas in PyTorch.
Let's get going!

Implementation of the NT-Xent loss in PyTorch

All the code in this section can be found in this notebook.

Code reuse: Many implementations of the NT-Xent loss seen online implement all the operations from scratch. Furthermore, some of them implement the loss function inefficiently, preferring for loops over GPU parallelism. Instead, we will use a different approach: we'll implement this loss in terms of the standard cross-entropy loss that PyTorch already provides. To do this, we need to massage the predictions and ground-truth labels into a format that cross_entropy can accept. Let's see how to do this below.

Predictions tensor: First, we need to create a PyTorch tensor that will represent the output from our contrastive learning model. Let's assume that our batch size is 8 (2N=8) and that our feature vectors have 2 dimensions (2 values). We'll call our input variable "x".

x = torch.randn(8, 2)

Cosine similarity: Next, we'll compute the all-pairs cosine similarity between every feature vector in this batch and store the result in the variable named "xcs". If the line below seems confusing, please read the details on this page. This is the "normalize" step.

xcs = F.cosine_similarity(x[None, :, :], x[:, None, :], dim=-1)

As mentioned above, we need to ignore the self-similarity score of every feature vector, since it doesn't contribute to the model's learning and will be an unnecessary nuisance later on when we want to compute the cross-entropy loss. For this purpose, we'll define a variable "eye", a matrix in which the elements on the principal diagonal have the value 1.0 and the rest are 0.0. We can create such a matrix using the following command.

eye = torch.eye(8)

Now let's convert this into a boolean matrix so that we can index into the "xcs" variable using this mask matrix.

eye = eye.bool()

Let's clone the tensor "xcs" into a tensor named "y" so that we can reference the "xcs" tensor later.

y = xcs.clone()

Now, we will set the values along the principal diagonal of the all-pairs cosine similarity matrix to -inf, so that when we compute the softmax on each row, this value will contribute nothing.

y[eye] = float("-inf")

The tensor "y", scaled by a temperature parameter, will be one of the inputs (the predictions) to the cross-entropy loss API in PyTorch. Next, we need to compute the ground-truth labels (the target) that we feed to the cross-entropy loss API.

Ground-truth labels (target tensor): For the example we are using (2N=8), this is what the ground-truth tensor should look like.

tensor([1, 0, 3, 2, 5, 4, 7, 6])

That's because the following index pairs in the tensor "y" contain positive pairs:

(0, 1), (1, 0)
(2, 3), (3, 2)
(4, 5), (5, 4)
(6, 7), (7, 6)

To interpret the index pairs above, we look at a single example. The pair (4, 5) means that column 5 at row 4 is supposed to be set to 1.0 (a positive pair), which is what the tensor above is also saying. Great!

To create the tensor above, we can use the following PyTorch code, which stores the ground-truth labels in the variable "target".

target = torch.arange(8)
target[0::2] += 1
target[1::2] -= 1

Cross-entropy loss: We have all the ingredients we need to compute our loss! The only thing that remains to be done is to call the cross_entropy API in PyTorch.

loss = F.cross_entropy(y / temperature, target, reduction="mean")

The variable "loss" now contains the computed NT-Xent loss.
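Before wrapping this up into a function, a quick way to convince ourselves the recipe behaves sensibly is to feed it a batch in which each positive pair is literally the same vector; the loss should then be small. This is a minimal sketch of ours, not from the original notebook; the seed, the 128-dimensional features and the temperature of 0.1 are arbitrary choices for illustration.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
temperature = 0.1

# Build a batch of 8 vectors where indices (0,1), (2,3), ... are identical,
# i.e. every positive pair has a cosine similarity of exactly 1.0. Random
# 128-dimensional negatives are close to orthogonal to each other.
x = torch.randn(4, 128).repeat_interleave(2, dim=0)

xcs = F.cosine_similarity(x[None, :, :], x[:, None, :], dim=-1)
y = xcs.clone()
y[torch.eye(8).bool()] = float("-inf")

target = torch.arange(8)
target[0::2] += 1
target[1::2] -= 1

# With perfect positives the positive logit dominates each row, so the
# cross-entropy (and hence the NT-Xent loss) should come out close to 0.
print(F.cross_entropy(y / temperature, target, reduction="mean"))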
Let's wrap all the code into a single Python function below.

def nt_xent_loss(x, temperature):
    assert len(x.size()) == 2

    # Cosine similarity
    xcs = F.cosine_similarity(x[None, :, :], x[:, None, :], dim=-1)
    xcs[torch.eye(x.size(0)).bool()] = float("-inf")

    # Ground truth labels
    target = torch.arange(x.size(0))
    target[0::2] += 1
    target[1::2] -= 1

    # Standard cross-entropy loss
    return F.cross_entropy(xcs / temperature, target, reduction="mean")

The code above works as long as each feature vector has exactly one positive pair in the batch when training our contrastive learning model. Let's take a look at how to handle multiple positive pairs in a contrastive learning task.

A multi-label loss for contrastive learning: NT-BXent

In the SimCLR paper, every image i has exactly one similar pair at index j. This makes cross-entropy loss a perfect choice for the task, since it resembles a multi-class problem. Instead, if we feed M > 2 augmentations of the same image into the contrastive learning model's single training batch, then each batch would have M-1 similar pairs for image i. This task would resemble a multi-label problem.

The obvious choice is to replace cross-entropy loss with binary cross-entropy loss. Hence the name NT-BXent loss, which stands for Normalized Temperature-scaled Binary cross-entropy loss.

The formulation below shows the loss Li for the element i. The σ in the formula stands for the Sigmoid function.

Figure 5: Formulation for the NT-BXent loss. Image source: Author(s) of this article

To avoid the class-imbalance problem, we weigh the positive and negative pairs by the inverse of the number of positive and negative pairs in our mini-batch. The final loss in the mini-batch used for backpropagation will be the mean of the losses of each sample in our mini-batch.
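Since Figure 5 is an image that is not reproduced here, the following is our reconstruction of the formulation in LaTeX, written to match the weighting description above and the implementation below; treat it as our reading of the figure rather than a verbatim copy. Here P(i) and N(i) denote the sets of positive and negative pair indices for element i, s_ij the cosine similarity between elements i and j, σ the Sigmoid function, and τ the temperature.

L_i = -\frac{1}{|P(i)|} \sum_{j \in P(i)} \log \sigma\left(\frac{s_{ij}}{\tau}\right)
      - \frac{1}{|N(i)|} \sum_{k \in N(i)} \log\left(1 - \sigma\left(\frac{s_{ik}}{\tau}\right)\right)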
Next, let's focus our attention on our implementation of the NT-BXent loss in PyTorch.

Implementation of the NT-BXent loss in PyTorch

All the code in this section can be found in this notebook.

Code reuse: Similar to our implementation of the NT-Xent loss, we shall reuse the binary cross-entropy (BCE) loss method provided by PyTorch. The setup of our ground-truth labels will be similar to that of a multi-label classification problem where BCE loss is used.

Predictions tensor: We'll use the same (8, 2) predictions tensor as we used for the implementation of the NT-Xent loss.

x = torch.randn(8, 2)

Cosine similarity: Since the input tensor x is the same, the all-pairs cosine similarity tensor xcs will also be the same. Please see this page for a detailed explanation of what the line below does.

xcs = F.cosine_similarity(x[None, :, :], x[:, None, :], dim=-1)

To ensure that the loss from the element at position (i, i) is 0, we'll need to perform some gymnastics so that the xcs tensor contains a value of 1 at every index (i, i) after Sigmoid is applied to it. Since we'll be using BCE loss, we will mark the self-similarity score of every feature vector with the value infinity in the tensor xcs. That's because applying the sigmoid function to the xcs tensor will convert infinity to the value 1, and we will set up our ground-truth labels so that every position (i, i) in the ground-truth labels has the value 1.

Let's create a masking tensor that has the value True along the principal diagonal (xcs has self-similarity scores along the principal diagonal) and False everywhere else.

eye = torch.eye(8).bool()

Let's clone the tensor "xcs" into a tensor named "y" so that we can reference the "xcs" tensor later.

y = xcs.clone()

Now, we will set the values along the principal diagonal of the all-pairs cosine similarity matrix to infinity, so that when we compute the sigmoid on each row, we get 1 in these positions.

y[eye] = float("inf")

The tensor "y", scaled by a temperature parameter, will be one of the inputs (the predictions) to the BCE loss API in PyTorch. Next, we need to compute the ground-truth labels (the target) that we feed to the BCE loss API.

Ground-truth labels (target tensor): We will expect the user to pass us all the (x, y) index pairs that contain positive examples. This is a departure from what we did for the NT-Xent loss: there the positive pairs were implicit, whereas here the positive pairs are explicit.

In addition to the locations provided by the user, we will set all the diagonal elements as positive pairs, as explained above. We will use the PyTorch tensor indexing API to pluck out all the elements at those locations and set them to 1, whereas the rest are initialized to 0.

target = torch.zeros(8, 8)
pos_indices = torch.tensor([
    (0, 0), (0, 2), (0, 4),
    (1, 4), (1, 6), (1, 1),
    (2, 3),
    (3, 7),
    (4, 3),
    (7, 6),
])

# Add indexes of the principal diagonal as positive indexes.
# This will be useful since we will use the BCE loss in PyTorch,
# which will expect a value for the elements on the principal
# diagonal as well.
pos_indices = torch.cat([
    pos_indices,
    torch.arange(8).reshape(8, 1).expand(-1, 2),
], dim=0)

# Set the values in the target vector to 1.
target[pos_indices[:, 0], pos_indices[:, 1]] = 1

Binary cross-entropy (BCE) loss: Unlike the NT-Xent loss, we can't simply call the torch.nn.functional.binary_cross_entropy function, since we want to weigh the positive and negative losses based on how many positive and negative pairs the element at index i has in the current mini-batch.

The first step, though, is to compute the element-wise BCE loss.

temperature = 0.1
loss = F.binary_cross_entropy((y / temperature).sigmoid(), target, reduction="none")

We'll create a binary mask of positive and negative pairs and then create two tensors, loss_pos and loss_neg, that contain only those elements of the computed loss that correspond to the positive and negative pairs respectively.

target_pos = target.bool()
target_neg = ~target_pos

# loss_pos and loss_neg below contain non-zero values only for those elements
# that are positive pairs and negative pairs respectively.
loss_pos = torch.zeros(x.size(0), x.size(0)).masked_scatter(target_pos, loss[target_pos])
loss_neg = torch.zeros(x.size(0), x.size(0)).masked_scatter(target_neg, loss[target_neg])

Next, we'll sum up the positive and negative pair losses (separately) corresponding to each element i in our mini-batch.

# loss_pos and loss_neg now contain the sum of positive and negative pair losses
# as computed relative to the i'th input.
loss_pos = loss_pos.sum(dim=1)
loss_neg = loss_neg.sum(dim=1)

To perform weighting, we need to track the number of positive and negative pairs corresponding to each element i in our mini-batch.
The tensors "num_pos" and "num_neg" will store these values.

# num_pos and num_neg below contain the number of positive and negative pairs
# computed relative to the i'th input. In an actual setting, this number should
# be the same for every input element, but we let it vary here for maximum
# flexibility.
num_pos = target.sum(dim=1)
num_neg = target.size(0) - num_pos

We have all the ingredients we need to compute our loss! The only thing left to do is weigh the positive and negative losses by the number of positive and negative pairs, and then average the loss across the mini-batch.

def nt_bxent_loss(x, pos_indices, temperature):
    assert len(x.size()) == 2

    # Add indexes of the principal diagonal elements to pos_indices
    pos_indices = torch.cat([
        pos_indices,
        torch.arange(x.size(0)).reshape(x.size(0), 1).expand(-1, 2),
    ], dim=0)

    # Ground truth labels
    target = torch.zeros(x.size(0), x.size(0))
    target[pos_indices[:, 0], pos_indices[:, 1]] = 1.0

    # Cosine similarity
    xcs = F.cosine_similarity(x[None, :, :], x[:, None, :], dim=-1)
    # Set logit of diagonal element to "inf" signifying complete
    # correlation. sigmoid(inf) = 1.0 so this will work out nicely
    # when computing the Binary cross-entropy Loss.
    xcs[torch.eye(x.size(0)).bool()] = float("inf")

    # Standard binary cross-entropy loss. We use binary_cross_entropy() here
    # and not binary_cross_entropy_with_logits() because of
    # https://github.com/pytorch/pytorch/issues/102894
    # The method *_with_logits() uses the log-sum-exp trick, which causes
    # inf and -inf values to result in a NaN result.
    loss = F.binary_cross_entropy((xcs / temperature).sigmoid(), target, reduction="none")

    target_pos = target.bool()
    target_neg = ~target_pos

    loss_pos = torch.zeros(x.size(0), x.size(0)).masked_scatter(target_pos, loss[target_pos])
    loss_neg = torch.zeros(x.size(0), x.size(0)).masked_scatter(target_neg, loss[target_neg])
    loss_pos = loss_pos.sum(dim=1)
    loss_neg = loss_neg.sum(dim=1)
    num_pos = target.sum(dim=1)
    num_neg = x.size(0) - num_pos

    return ((loss_pos / num_pos) + (loss_neg / num_neg)).mean()

pos_indices = torch.tensor([
    (0, 0), (0, 2), (0, 4),
    (1, 4), (1, 6), (1, 1),
    (2, 3),
    (3, 7),
    (4, 3),
    (7, 6),
])
for t in (0.01, 0.1, 1.0, 10.0, 20.0):
    print(f"Temperature: {t:5.2f}, Loss: {nt_bxent_loss(x, pos_indices, temperature=t)}")

This prints:

Temperature: 0.01, Loss: 62.898780822753906
Temperature: 0.10, Loss: 4.851151943206787
Temperature: 1.00, Loss: 1.0727109909057617
Temperature: 10.00, Loss: 0.9827173948287964
Temperature: 20.00, Loss: 0.982099175453186

Conclusion

Self-supervised learning is an up-and-coming field in deep learning that allows one to train models on unlabeled data. This technique lets us work around the requirement for labeled data at scale.

In this article, we learned about loss functions for contrastive learning. The first one, the NT-Xent loss, is used for learning on a single positive pair per input in a mini-batch. We then introduced the NT-BXent loss, which is used for learning on multiple (> 1) positive pairs per input in a mini-batch. We learned to interpret both intuitively, building on our knowledge of cross-entropy loss and binary cross-entropy loss. Finally, we implemented them both efficiently in PyTorch.

NT-Xent (Normalized Temperature-Scaled Cross-Entropy) Loss Explained and Implemented in PyTorch was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.


Anomaly Detection Using Sigma Rules: Build Your Own Spark Streaming Detections

Easily deploy Sigma rules in Spark streaming pipelines: a future-proof solution supporting the upcoming Sigma 2 specification

Photo by Dana Walker on Unsplash

In our previous articles we elaborated on and designed a stateful function named flux-capacitor.

The flux-capacitor stateful function can remember parent-child (and ancestor) relationships between log events. It can also remember events occurring on the same host in a certain window of time; the Sigma specification refers to this as temporal proximity correlation.

For a deep dive into the design of the flux-capacitor, refer to part 1, part 2, part 3, part 4, and part 5. However, you don't need to understand the implementation of the function to use it.

In this article we first show a Spark streaming job that performs discrete detections. A discrete detection is a Sigma rule that uses the features and values of a single log line (a single event).

Then we leverage the flux-capacitor function to handle stateful parent-child relationships between log events. The flux-capacitor is also able to detect a number of events occurring on the same host in a certain window of time; these are called temporal proximity correlations in the upcoming Sigma specification. A complete demo of these Spark streaming jobs is available in our git repo.

Discrete Detections

Performing discrete tests is fairly straightforward, thanks to all the built-in functions that come out of the box in Spark. Spark has support for reading streaming sources, writing to sinks, checkpointing, stream-stream joins, windowed aggregations and many more. For a complete list of the possible functionalities, see the comprehensive Spark Structured Streaming Programming Guide.

Here's a high-level diagram showing a Spark streaming job that consumes events from an Iceberg table of "start-process" Windows events (1). A classic example of this is found in Windows Security Logs (Event ID 4688).

Topology for discrete detections

The source table (1) is named process_telemetry_table. The Spark job reads all events, detects anomalous events, tags these events and writes them to table (3), named tagged_telemetry_table. Events deemed anomalous are also written to a table (4) containing alerts.

Periodically, we poll a git repository (5) containing the SQL auto-generated from the Sigma rules we want to apply. If the SQL statements change, we restart the streaming job to add these new detections to the pipeline.

Let's take this Sigma rule as an example:

screenshot from proc_creation_win_rundll32_sys.yml at Sigma HQ

The detection section is the heart of the Sigma rule and consists of a condition and one or more named tests. Here, selection1 and selection2 are named boolean tests. The author of the Sigma rule can give meaningful names to these tests. The condition is where the user can combine the tests in a final evaluation. See the Sigma specification for more details on writing a Sigma rule. From now on we will refer to these named boolean tests as tags.
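Since the screenshot of the rule is not reproduced here, the following is an abbreviated sketch of what the detection section of proc_creation_win_rundll32_sys.yml looks like, reconstructed from the auto-generated SQL shown further below; it is not a verbatim copy of the Sigma HQ rule.

detection:
    selection1:
        CommandLine|contains: 'rundll32.exe'
    selection2:
        CommandLine|contains:
            - '.sys,'
            - '.sys '
    condition: selection1 and selection2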
The inner workings of the Spark streaming job are broken down into 4 logical steps:

1. read the source table process_telemetry_table
2. perform pattern matching
3. evaluate the final condition
4. write the results

The Pattern Match step consists of evaluating the tags found in the Sigma rule, and the Eval final condition step evaluates the condition.

On the right of this diagram we show what a row looks like at each stage of processing. The columns in blue represent values read from the source table. The Pattern Match step adds a column named Sigma tags, a map of all the tests performed and whether each test passed or failed. The gray column contains the final Sigma rule evaluations. Finally, the brown columns are added in the foreachBatch function: a GUID is generated, the rule names that are true are extracted from the Sigma tags map, and the detection action is retrieved from a lookup map of rule name to rule type. This gives context to the alerts produced.

This diagram depicts how attributes of the event are combined into tags, the final evaluation and, finally, contextual information.

Let's now look at the actual PySpark code. First, we connect Spark to the source table using the readStream function, specifying the name of the Iceberg table to read from. The load function returns a dataframe, which we use to create a view named process_telemetry_view.

spark
    .readStream
    .format("iceberg")
    .option("stream-from-timestamp", ts)
    .option("streaming-skip-delete-snapshots", True)
    .option("streaming-skip-overwrite-snapshots", True)
    .load(constants.process_telemetry_table)
    .createOrReplaceTempView("process_telemetry_view")

The data in process_telemetry_view looks like this:

+-------------------+---+---------+---------------------+
|timestamp          |id |parent_id|Commandline          |
+-------------------+---+---------+---------------------+
|2022-12-25 00:00:01|11 |0        |                     |
|2022-12-25 00:00:02|2  |0        |c:\win\notepad.exe   |
|2022-12-25 00:00:03|12 |11       |                     |
|2022-12-25 00:00:08|201|200      |cmdline and args     |
|2022-12-25 00:00:09|202|201      |                     |
|2022-12-25 00:00:10|203|202      |c:\test.exe          |
+-------------------+---+---------+---------------------+

On this view we apply a Pattern Match step, which consists of an auto-generated SQL statement produced by the Sigma compiler. The pattern_match.sql file looks like this:

select
    *,
    -- regroup each rule's tags in a map (ruleName -> tags)
    map(
        'rule0',
        map(
            'selection1', (CommandLine LIKE '%rundll32.exe%'),
            'selection2', (CommandLine LIKE '%.sys,%' OR CommandLine LIKE '%.sys %')
        )
    ) as sigma
from process_telemetry_view

We use spark.sql() to apply this statement to the process_telemetry_view view.

df = spark.sql(render_file("pattern_match.sql"))
df.createOrReplaceTempView("pattern_match_view")

Notice that the results of each tag found in the Sigma rule are stored in a map of boolean values. The sigma column holds the results of each tag found in each Sigma rule. By using a MapType we can easily introduce new Sigma rules without affecting the schema of the table: adding a new rule simply adds a new entry in the sigma column (a MapType).

+---+---------+---------------------+-----------------------------------------------------------+
|id |parent_id|Commandline          |sigma                                                      |
+---+---------+---------------------+-----------------------------------------------------------+
|11 |0        |                     |{rule0 -> {selection1 -> false, selection2 -> false}, ...}|

Similarly, the Eval final condition step applies the conditions from the Sigma rules. The conditions are compiled into an SQL statement which uses map, map_filter and map_keys to build a column named sigma_final.
This column holds the names of all the rules whose condition evaluates to true.

select
    *,
    map_keys( -- only keep the rule names of rules that evaluated to true
        map_filter( -- filter map entries, keeping only rules that evaluated to true
            map( -- store the result of the condition of each rule in a map
                'rule0', -- rule 0 -> condition: all of selection*
                sigma.rule0.selection1 AND sigma.rule0.selection2
            ),
            (k, v) -> v = TRUE
        )
    ) as sigma_final
from pattern_match_view

The auto-generated statement is applied using spark.sql().

df = spark.sql(render_file("eval_final_condition.sql"))

Here are the results with the newly added sigma_final column, an array of the rules that fire.

+---+---------+-------------------------------------+-------------+
|id |parent_id|sigma                                | sigma_final |
+---+---------+-------------------------------------+-------------+
|11 |0        |{rule0 -> {selection1 -> false,      | []          |
|   |         |           selection2 -> false}}     |             |

We are now ready to start the streaming job for our dataframe. Notice that we pass a callback function, foreach_batch_function, to foreachBatch.

streaming_query = (
    df
    .writeStream
    .queryName("detections")
    .trigger(processingTime=f"{trigger} seconds")
    .option("checkpointLocation", get_checkpoint_location(constants.tagged_telemetry_table))
    .foreachBatch(foreach_batch_function)
    .start()
)
streaming_query.awaitTermination()

The foreach_batch_function is called at every micro-batch and is given the evaluated batchdf dataframe. It writes the entirety of batchdf to the tagged_telemetry_table and also writes alerts for any of the Sigma rules that evaluated to true.

def foreach_batch_function(batchdf, epoch_id):
    # Transform and write batchdf
    batchdf.persist()
    batchdf.createOrReplaceGlobalTempView("eval_condition_view")
    run("insert_into_tagged_telemetry")
    run("publish_suspected_anomalies")
    spark.catalog.clearCache()

The details of insert_into_tagged_telemetry.sql and publish_suspected_anomalies.sql can be found in our git repo.

As mentioned above, writing a streaming anomaly detection handling discrete tests is relatively straightforward using the built-in functionality found in Spark.

Detections Based on Past Events

Thus far we have shown how to detect events with discrete Sigma rules. In this section we leverage the flux-capacitor function to enable caching tags and testing tags of past events. As discussed in our previous articles, the flux-capacitor lets us detect parent-child relationships and also sequences of arbitrary features of past events.

These types of Sigma rules need to simultaneously consider the tags of the current event and of past events. In order to perform the final rule evaluation, we introduce a Time travel tags step to retrieve all of the past tags for an event and merge them with the current event. This is what the flux-capacitor function is designed to do: it caches and retrieves past tags. Once past tags and current tags are on the same row, the Eval final condition step can be evaluated just like in our discrete example above.

The detection now looks like this:

The flux-capacitor is given the Sigma tags produced by the Pattern Match step and stores these tags for later retrieval. The column in red has the same schema as the Sigma tags column we used before; however, it combines current and past tags, which the flux-capacitor retrieved from its internal state.

Adding caching and retrieval of past tags is easy thanks to the flux-capacitor function. Here's how we applied the flux-capacitor function in our Spark anomaly detection.
First, pass the dataframe produced by the Pattern Match step to the flux_stateful_function; the function returns another dataframe, which contains the past tags.

flux_update_spec = read_flux_update_spec()
bloom_capacity = 200000
# reference the scala code
flux_stateful_function = spark._sc._jvm.cccs.fluxcapacitor.FluxCapacitor.invoke
# group logs by host_id
jdf = flux_stateful_function(
    pattern_match_df._jdf,
    "host_id",
    bloom_capacity,
    flux_update_spec)
output_df = DataFrame(jdf, spark)

To control the behavior of the flux_stateful_function, we pass in a flux_update_spec. The flux-capacitor specification is a YAML file produced by the Sigma compiler. The specification details which tags should be cached and retrieved and how they should be handled. The action attribute can be set to parent, ancestor or temporal.

Let's use a concrete example from Sigma HQ: proc_creation_win_rundll32_executable_invalid_extension.yml

screenshot from Sigma HQ github

Again, the heart of the detection consists of tags and of a final condition that puts all these tags together. Note, however, that this rule (which we will refer to as Rule 1) involves tests against CommandLine and also tests on the parent process's ParentImage. ParentImage is not a field found in the start-process logs; rather, it refers to the Image field of the parent process.

As seen before, this Sigma rule will be compiled into SQL to evaluate the tags and to combine them into a final condition.

In order to propagate the parent tags, the Sigma compiler also produces a flux-capacitor specification. Rule 1 is a parent rule, and thus the specification must state what the parent and child fields are; in our logs these correspond to id and parent_id. The specification also states which tags should be cached and retrieved by the flux-capacitor function. Here is the auto-generated specification:

rules:
  - rulename: rule1
    description: proc_creation_win_run_executable_invalid_extension
    action: parent
    tags:
      - name: filter_iexplorer
      - name: filter_edge_update
      - name: filter_msiexec_system32
    parent: parent_id
    child: id

Note: Rule 0 is not included in the flux-capacitor specification since it has no temporal tags.

Illustrating Tag Propagation

In order to better understand what the flux-capacitor does, you can use the function outside a streaming analytic. Here we show a simple ancestor example. We want to propagate the tag pf. For example, pf might represent a CommandLine containing rundll32.exe.

spec = """
rules:
  - rulename: rule2
    action: ancestor
    child: pid
    parent: parent_pid
    tags:
      - name: pf
"""

df_input = spark.sql("""
    select *
    from values
        (TIMESTAMP '2022-12-30 00:00:05', 'host1', 'pid500', '',       map('rule2', map('pf', true,  'cf', false))),
        (TIMESTAMP '2022-12-30 00:00:06', 'host1', 'pid600', 'pid500', map('rule2', map('pf', false, 'cf', false))),
        (TIMESTAMP '2022-12-30 00:00:07', 'host1', 'pid700', 'pid600', map('rule2', map('pf', false, 'cf', true)))
        t(timestamp, host_id, pid, parent_pid, sigma)
""")

Printing the dataframe df_input, we see that pid500 started with a CommandLine that has the pf feature. Then pid500 started pid600. Later, pid600 started pid700.
Pid700 had a child feature cf.

+-------------------+------+----------+--------------+-------------------------------------+
|timestamp          |pid   |parent_pid|human_readable|sigma                                |
+-------------------+------+----------+--------------+-------------------------------------+
|2022-12-30 00:00:05|pid500|          |[pf]          |{rule2 -> {pf -> true, cf -> false}} |
|2022-12-30 00:00:06|pid600|pid500    |[]            |{rule2 -> {pf -> false, cf -> false}}|
|2022-12-30 00:00:07|pid700|pid600    |[cf]          |{rule2 -> {pf -> false, cf -> true}} |
+-------------------+------+----------+--------------+-------------------------------------+

The Sigma rule is a combination of both pf and cf. In order to bring the pf tag back onto the current row, we need to apply time travel to the pf tag. Applying the flux-capacitor function to the df_input dataframe:

jdf = flux_stateful_function(df_input._jdf, "host_id", bloom_capacity, spec, True)
df_output = DataFrame(jdf, spark)

We obtain the df_output dataframe. Notice how the pf tag is propagated through time.

+-------------------+------+----------+--------------+------------------------------------+
|timestamp          |pid   |parent_pid|human_readable|sigma                               |
+-------------------+------+----------+--------------+------------------------------------+
|2022-12-30 00:00:05|pid500|          |[pf]          |{rule2 -> {pf -> true, cf -> false}}|
|2022-12-30 00:00:06|pid600|pid500    |[pf]          |{rule2 -> {pf -> true, cf -> false}}|
|2022-12-30 00:00:07|pid700|pid600    |[pf, cf]      |{rule2 -> {pf -> true, cf -> true}} |
+-------------------+------+----------+--------------+------------------------------------+

The notebook TagPropagationIllustration.ipynb contains more examples like this for parent-child and temporal proximity.

Building Alerts with Context

The flux-capacitor function caches all the past tags. In order to conserve memory, it caches these tags using bloom filter segments. Bloom filters have an extremely small memory footprint and are quick to query and update. However, they do introduce possible false positives. It is thus possible that one of our detections is in fact a false positive. In order to remedy this, we put the suspected anomalies in a queue (4) for re-evaluation.
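To see why a Bloom filter can claim membership for a tag that was never stored, here is a small self-contained sketch of ours (not from the flux-capacitor code); the TinyBloom class, its deliberately tiny bit array and the tag strings are all hypothetical, chosen to make false positives easy to observe.

import hashlib

class TinyBloom:
    """A deliberately small Bloom filter so that false positives are likely."""
    def __init__(self, num_bits=32, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive num_hashes bit positions from a single digest.
        digest = hashlib.sha256(item.encode()).digest()
        return [digest[i] % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        # True can be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(item))

bloom = TinyBloom()
for tag in (f"host1/pf/{i}" for i in range(20)):
    bloom.add(tag)

# Tags that were never added can still report True once enough bits are set,
# which is why suspected anomalies are re-evaluated against exact state.
fp = sum(bloom.might_contain(f"host9/pf/{i}") for i in range(1000))
print(f"false positives: {fp}/1000")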
To eliminate false positives, a second Spark streaming job, named the Alert Builder, reads the suspected anomalies (5) and retrieves the events (6) that are required to re-evaluate the rule.

For example, in the case of a parent-child Sigma rule, the Alert Builder reads the suspected anomaly (5), retrieving a child process event. Next, in (6), it retrieves the parent process of this child event. Then, using these two events, it re-evaluates the Sigma rule. This time, however, the flux-capacitor is configured to store tags in a hash map rather than in bloom filters. This eliminates false positives, and as a bonus we have all the events involved in this detection. We store this alert, along with the rows of evidence (the parent and child events), in an alert table (7).

Topology with stateful detections (temporal)

The Alert Builder handles a fraction of the volume processed by (2), the Streaming Detections job. Thanks to the low volume read in (5), historical searches into the tagged telemetry (6) are possible.

For a more in-depth look, take a look at the Spark jobs for the Streaming Detections (streaming_detections.py) and the Alert Builder (streaming_alert_builder.py).

Performance

To evaluate the performance of this proof of concept, we ran tests on machines with 16 CPUs and 64G of RAM. We wrote a simple data producer that creates 5,000 synthetic events per second and ran the experiment for 30 days.

The Spark Streaming Detections job runs on one machine. The job is configured to trigger every minute. Each micro-batch (trigger) reads 300,000 events and takes on average 20 seconds to complete. The job can easily keep up with the incoming event rate.

Spark Streaming Detections

The Spark Alert Builder also runs on a single machine and is configured to trigger every minute. This job takes between 30 and 50 seconds to complete and is very sensitive to the organization of the tagged_telemetry_table. Here we see the effect of the maintenance job, which organizes and sorts the latest data every hour: at every hour, the Spark Alert Builder's micro-batch execution time drops back to 30 seconds.

Spark Streaming Alert Builder

Table Maintenance

Our Spark streaming jobs trigger every minute and thus produce small data files every minute. In order to allow for fast searches and retrieval in this table, it's important to compact and sort the data periodically. Fortunately, Iceberg comes with built-in procedures to organize and maintain your tables.

For example, this script, maintenance.py, runs every hour to sort and compact the newly added files of the Iceberg tagged_telemetry_table.

CALL catalog.system.rewrite_data_files(
    table => 'catalog.jc_sched.tagged_telemetry_table',
    strategy => 'sort',
    sort_order => 'host_id, has_temporal_proximity_tags',
    options => map('min-input-files', '100',
                   'max-concurrent-file-group-rewrites', '30',
                   'partial-progress.enabled', 'true'),
    where => 'timestamp >= TIMESTAMP \'2023-05-06 00:00:00\' '
)

At the end of the day we also re-sort this table, yielding maximum search performance over long search periods (months of data).

CALL catalog.system.rewrite_data_files(
    table => 'catalog.jc_sched.tagged_telemetry_table',
    strategy => 'sort',
    sort_order => 'host_id, has_temporal_proximity_tags',
    options => map('min-input-files', '100',
                   'max-concurrent-file-group-rewrites', '30',
                   'partial-progress.enabled', 'true',
                   'rewrite-all', 'true'),
    where => 'timestamp >= TIMESTAMP \'2023-05-05 00:00:00\' AND timestamp < TIMESTAMP \'2023-05-06 00:00:00\' '
)

Another maintenance task we perform is deleting old data from the streaming tables. These tables are only used as buffers between producers and consumers, so every day we age off the streaming tables, keeping 7 days of data.

delete from catalog.jc_sched.process_telemetry_table
where timestamp < current_timestamp() - interval 7 days

Finally, every day we perform standard Iceberg table maintenance tasks, like expiring snapshots and removing orphan files. We run these maintenance jobs on all of our tables and schedule them on Airflow.

Conclusion

In this article we showed how to build a Spark streaming anomaly detection framework that generically applies Sigma rules. New Sigma rules can easily be added to the system.

This proof of concept was extensively tested on synthetic data to evaluate its stability and scalability. It shows great promise, and further evaluation will be performed on a production system.

All images unless otherwise noted are by the author.

Anomaly Detection Using Sigma Rules: Build Your Own Spark Streaming Detections was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.


Decoding the US Senate Hearing on Oversight of AI: NLP Analysis in Python

Photo by Harold Mendoza on Unsplash

Word frequency analysis, visualization and sentiment scores using the NLTK toolkit

Last Sunday morning, as I was switching TV channels trying to find something to watch while having breakfast, I stumbled upon a replay of the Senate Hearing on Oversight of AI. It had only been 40 minutes since it started, so I decided to watch the rest of it (talk about an interesting way to spend a Sunday morning!).

When events like the Senate Judiciary Subcommittee Hearing on Oversight of AI take place and you want to catch up on the key takeaways, you have four options: witness it live; look for future recordings (both options would require three hours of your life); read the written version (the transcript), which is about 79 pages long and over 29,000 words; or read reviews on websites or social media to get different opinions and form your own (if it's not just from others).

Nowadays, with everything moving so quickly and our days feeling too short, it's tempting to go for the shortcut and rely on reviews instead of going to the original source (I've been there too). If you choose the shortcut for this hearing, it's highly probable that most reviews you'll find on the web or social media focus on OpenAI CEO Sam Altman's call for regulating AI. However, after watching the hearing, I felt there was more to explore beyond the headlines.

So, after my Sunday funday morning activity, I decided to download the Senate Hearing transcript and use the NLTK package (a Python package for natural language processing, NLP) to analyze it, compare the most used words, apply some sentiment scores across different groups of interest (OpenAI, IBM, Academia, Congress) and see what could be between the lines. Spoiler alert! Out of the 29,000 words analyzed, only 70 (0.24%) were related to words like regulation, regulate, regulatory or legislation.

It's important to note that this article is not about my takeaways from the AI hearing or Mr. ChatGPT Sam Altman. Instead, it focuses on what lies beneath the words of each part of society (private sector, academia, government) represented in this session under the roof of Capitol Hill, and what we can learn from how those words mix with each other.

Considering that the next few months are interesting times for the future of regulation of Artificial Intelligence, as the final draft of the EU AI Act awaits debate in the European Parliament (expected to take place in June), it's worth exploring what's behind the discussions surrounding AI on this side of the Atlantic.

STEP-01: GET THE DATA

I used the transcript published by Justin Hendrix in Tech Policy Press (accessible here).

While Hendrix mentions it's a quick transcript and suggests confirming quotes by watching the Senate Hearing video, I still found it to be quite accurate and interesting for this analysis. If you want to watch the Senate Hearing or read the testimonies of Sam Altman (OpenAI), Christina Montgomery (IBM) and Gary Marcus (Professor at New York University), you can find them here.

Initially, I planned to copy the transcript into a Word document and manually create a table in Excel with the participants' names, their organizations and their comments. However, this approach was time-consuming and inefficient. So, I turned to Python and uploaded the full transcript from a Microsoft Word file into a data frame.
Here is the code I used:

# STEP 01: Read the Word document
# remember to install: pip install python-docx
import docx
import pandas as pd

doc = docx.Document('D:\....your word file on microsoft word')

items = []
names = []
comments = []

# Iterate over paragraphs
for paragraph in doc.paragraphs:
    text = paragraph.text.strip()
    if text.endswith(':'):
        name = text[:-1]
    else:
        items.append(len(items))
        names.append(name)
        comments.append(text)

dfsenate = pd.DataFrame({'item': items, 'name': names, 'comment': comments})

# Remove rows with empty comments
dfsenate = dfsenate[dfsenate['comment'].str.strip().astype(bool)]

# Reset the index
dfsenate.reset_index(drop=True, inplace=True)
dfsenate['item'] = dfsenate.index + 1
print(dfsenate)

The output should look like this:

0  1  Sen. Richard Blumenthal (D-CT)  Now for some introductory remarks.

1  2  Sen. Richard Blumenthal (D-CT)  "Too often we have seen what happens when technology outpaces regulation: the unbridled exploitation of personal data, the proliferation of disinformation, and the deepening of societal inequalities. We have seen how algorithmic biases can perpetuate discrimination and prejudice, and how the lack of transparency can undermine public trust. This is not the future we want."

2  3  Sen. Richard Blumenthal (D-CT)  If you were listening from home, you might have thought that voice was mine and the words from me, but in fact, that voice was not mine. The words were not mine. And the audio was an AI voice cloning software trained on my floor speeches. The remarks were written by ChatGPT when it was asked how I would open this hearing. And you heard just now the result: I asked ChatGPT, why did you pick those themes and that content? And it answered. And I'm quoting: Blumenthal has a strong record in advocating for consumer protection and civil rights. He has been vocal about issues such as data privacy and the potential for discrimination in algorithmic decision making. Therefore, the statement emphasizes these aspects.

3  4  Sen. Richard Blumenthal (D-CT)  Mr. Altman, I appreciate ChatGPT's endorsement. In all seriousness, this apparent reasoning is pretty impressive. I am sure that we'll look back in a decade and view ChatGPT and GPT-4 like we do the first cell phone, those big clunky things that we used to carry around. But we recognize that we are on the verge, really, of a new era. The audio and my playing it may strike you as curious or humorous, but what reverberated in my mind was: what if I had asked it, and what if it had provided, an endorsement of Ukraine surrendering or Vladimir Putin's leadership? That would've been really frightening. And the prospect is more than a little scary, to use the word, Mr. Altman, you have used yourself, and I think you have been very constructive in calling attention to the pitfalls as well as the promise.

4  5  Sen. Richard Blumenthal (D-CT)  And that's the reason why we wanted you to be here today. And we thank you and our other witnesses for joining us. For several months now, the public has been fascinated with GPT, DALL-E and other AI tools. These examples, like the homework done by ChatGPT or the articles and op-eds that it can write, feel like novelties. But the underlying advancements of this era are more than just research experiments. They are no longer fantasies of science fiction. They are real and present: the promises of curing cancer, or developing new understandings of physics and biology, or modeling climate and weather. All very encouraging and hopeful.
But we also know the potential harms, and we've seen them already: weaponized disinformation, housing discrimination, harassment of women and impersonation, fraud, voice cloning, deep fakes. These are the potential risks despite the other rewards. And for me, perhaps the biggest nightmare is the looming new industrial revolution: the displacement of millions of workers, the loss of huge numbers of jobs, the need to prepare for this new industrial revolution in skill training and relocation that may be required. And already industry leaders are calling attention to those challenges.

5  6  Sen. Richard Blumenthal (D-CT)  To quote ChatGPT, this is not necessarily the future that we want. We need to maximize the good over the bad. Congress has a choice now. We had the same choice when we faced social media. We failed to seize that moment. The result is predators on the internet, toxic content exploiting children, creating dangers for them. And Senator Blackburn and I and others like Senator Durbin on the Judiciary Committee are trying to deal with it in the Kids Online Safety Act. But Congress failed to meet the moment on social media. Now we have the obligation to do it on AI before the threats and the risks become real. Sensible safeguards are not in opposition to innovation. Accountability is not a burden, far from it. They are the foundation of how we can move ahead while protecting public trust. They are how we can lead the world in technology and science, but also in promoting our democratic values.

6  7  Sen. Richard Blumenthal (D-CT)  Otherwise, in the absence of that trust, I think we may well lose both. These are sophisticated technologies, but there are basic expectations common in our law. We can start with transparency. AI companies ought to be required to test their systems, disclose known risks, and allow independent researcher access. We can establish scorecards and nutrition labels to encourage competition based on safety and trustworthiness. Limitations on use: there are places where the risk of AI is so extreme that we ought to restrict or even ban their use, especially when it comes to commercial invasions of privacy for profit and decisions that affect people's livelihoods. And of course, accountability, reliability. When AI companies and their clients cause harm, they should be held liable. We should not repeat our past mistakes, for example, Section 230. Forcing companies to think ahead and be responsible for the ramifications of their business decisions can be the most powerful tool of all. Garbage in, garbage out. The principle still applies.
We ought to beware of the garbage, whether it's going into these platforms or coming out of them.

Next, I added some labels for future analysis, identifying each individual by the segment of society they represented:

def assign_sector(name):
    if name in ['Sam Altman', 'Christina Montgomery']:
        return 'Private'
    elif name == 'Gary Marcus':
        return 'Academia'
    else:
        return 'Congress'

# Apply function
dfsenate['sector'] = dfsenate['name'].apply(assign_sector)

# Assign organizations based on names
def assign_organization(name):
    if name == 'Sam Altman':
        return 'OpenAI'
    elif name == 'Christina Montgomery':
        return 'IBM'
    elif name == 'Gary Marcus':
        return 'Academia'
    else:
        return 'Congress'

# Apply function
dfsenate['Organization'] = dfsenate['name'].apply(assign_organization)
print(dfsenate)

Finally, I added a column that counts the words in each statement, which will also help us in further analysis.

dfsenate['WordCount'] = dfsenate['comment'].apply(lambda x: len(x.split()))

At this point, your dataframe should look like this:

     item  name                            ...  Organization  WordCount
0    1     Sen. Richard Blumenthal (D-CT)  ...  Congress      5
1    2     Sen. Richard Blumenthal (D-CT)  ...  Congress      55
2    3     Sen. Richard Blumenthal (D-CT)  ...  Congress      125
3    4     Sen. Richard Blumenthal (D-CT)  ...  Congress      145
4    5     Sen. Richard Blumenthal (D-CT)  ...  Congress      197
..   ...   ...                             ...  ...           ...
399  400   Sen. Cory Booker (D-NJ)         ...  Congress      156
400  401   Sam Altman                      ...  OpenAI        180
401  402   Sen. Cory Booker (D-NJ)         ...  Congress      72
402  403   Sen. Richard Blumenthal (D-CT)  ...  Congress      154
403  404   Sen. Richard Blumenthal (D-CT)  ...  Congress      98

STEP-02: VISUALIZE THE DATA

Let's take a look at the numbers we have so far: 404 questions or testimonies and almost 29,000 words. These numbers give us the material we need to get started. It's important to know that some statements were split into smaller parts: when there were long statements with different paragraphs, the code divided them into separate statements, even though they were actually part of one contribution. To get a better understanding of each participant's involvement, I therefore also considered the number of words they used. This gives another perspective on their engagement.

Hearing on Oversight of AI: Figure 01

As you can see in Figure 01, interventions by members of Congress represented more than half of the hearing, followed by Sam Altman's testimony. However, an alternate view obtained by counting the words from each side shows a more balanced representation between Congress (11 members) and the panel composed of Altman (OpenAI), Montgomery (IBM) and Marcus (Academia).

It's interesting to note the different levels of engagement among the members of Congress who participated in the Senate hearing (view the table below). As expected, Sen. Blumenthal, as the Subcommittee Chair, was highly engaged. But what about the other members? The table shows significant variations in engagement among all eleven participants. Remember, the quantity of contributions doesn't necessarily indicate their quality. I'll let you make your own judgement while you review the numbers.

Lastly, even though Sam Altman received a lot of attention, it's worth noting that Gary Marcus, despite appearing to have few participation opportunities, had a lot to say, as indicated by his word count, which is similar to Altman's. Or is it maybe because academia often provides detailed explanations, while the business world prefers practicality and straightforwardness?

"Alright, professor Marcus, if you could be specific. This is your shot, man.
Talk in plain English and tell me what, if any, rules we ought to implement. And please don't just use concepts. I'm looking for specificity."

Sen. John Kennedy (R-LA). US Senate Hearing on Oversight of AI (2023)

#*****************************PIE CHARTS************************************
import pandas as pd
import matplotlib.pyplot as plt

# Pie chart - Grouping by 'Organization': Questions & Testimonies
org_colors = {'Congress': '#6BB6FF', 'OpenAI': 'green', 'IBM': 'lightblue', 'Academia': 'lightyellow'}
org_counts = dfsenate['Organization'].value_counts()

plt.figure(figsize=(8, 6))
patches, text, autotext = plt.pie(org_counts.values, labels=org_counts.index,
                                  autopct=lambda p: f'{p:.1f}%\n({int(p * sum(org_counts.values) / 100)})',
                                  startangle=90,
                                  colors=[org_colors.get(org, 'gray') for org in org_counts.index])
plt.title('Hearing on Oversight of AI: Questions or Testimonies')
plt.axis('equal')
plt.setp(text, fontsize=12)
plt.setp(autotext, fontsize=12)
plt.show()

# Pie chart - Grouping by 'Organization' (WordCount)
org_wordcount = dfsenate.groupby('Organization')['WordCount'].sum()

plt.figure(figsize=(8, 6))
patches, text, autotext = plt.pie(org_wordcount.values, labels=org_wordcount.index,
                                  autopct=lambda p: f'{p:.1f}%\n({int(p * sum(org_wordcount.values) / 100)})',
                                  startangle=90,
                                  colors=[org_colors.get(org, 'gray') for org in org_wordcount.index])
plt.title('Hearing on Oversight of AI: WordCount')
plt.axis('equal')
plt.setp(text, fontsize=12)
plt.setp(autotext, fontsize=12)
plt.show()

#************Engagement among the members of Congress**********************
# Group by name and count the rows
Summary_Name = dfsenate.groupby('name').agg(comment_count=('comment', 'size')).reset_index()

# WordCount column for each name
Summary_Name['Total_Words'] = dfsenate.groupby('name')['WordCount'].sum().values

# Percentage distribution for comment_count
Summary_Name['comment_count_%'] = Summary_Name['comment_count'] / Summary_Name['comment_count'].sum() * 100

# Percentage distribution for total word count
Summary_Name['Word_count_%'] = Summary_Name['Total_Words'] / Summary_Name['Total_Words'].sum() * 100

Summary_Name = Summary_Name.sort_values('Total_Words', ascending=False)
print(Summary_Name)

+-------+--------------------------------+---------------+-------------+-------------+--------------+
| index | name                           | Interventions | Total_Words | Interv_%    | Word_count_% |
+-------+--------------------------------+---------------+-------------+-------------+--------------+
| 2     | Sam Altman                     | 92            | 6355        | 22.77227723 | 22.32252626  |
| 1     | Gary Marcus                    | 47            | 5105        | 11.63366337 | 17.93178545  |
| 15    | Sen. Richard Blumenthal (D-CT) | 58            | 3283        | 14.35643564 | 11.53184165  |
| 10    | Sen. Josh Hawley (R-MO)        | 25            | 2283        | 6.188118812 | 8.019249008  |
| 0     | Christina Montgomery           | 36            | 2162        | 8.910891089 | 7.594225298  |
| 6     | Sen. Cory Booker (D-NJ)        | 20            | 1688        | 4.95049505  | 5.929256384  |
| 7     | Sen. Dick Durbin (D-IL)        | 8             | 1143        | 1.98019802  | 4.014893393  |
| 11    | Sen. Lindsey Graham (R-SC)     | 32            | 880         | 7.920792079 | 3.091081527  |
| 5     | Sen. Christopher Coons (D-CT)  | 6             | 869         | 1.485148515 | 3.052443008  |
| 12    | Sen. Marsha Blackburn (R-TN)   | 14            | 869         | 3.465346535 | 3.052443008  |
| 4     | Sen. Amy Klobuchar (D-MN)      | 11            | 769         | 2.722772277 | 2.701183744  |
| 13    | Sen. Mazie Hirono (D-HI)       | 7             | 755         | 1.732673267 | 2.652007447  |
| 14    | Sen. Peter Welch (D-VT)        | 11            | 704         | 2.722772277 | 2.472865222  |
| 3     | Sen. Alex Padilla (D-CA)       | 7             | 656         | 1.732673267 | 2.304260775  |
+-------+--------------------------------+---------------+-------------+-------------+--------------+
STEP-03 TOKENIZATION

Here is where the natural language processing (NLP) fun begins. To analyze the text, we'll use the NLTK package in Python, together with a few companion libraries that provide the tools we need for word frequency analysis and visualization.

#pip install nltk
#pip install spacy
#pip install wordcloud
#python -m spacy download en_core_web_sm

First, we'll start with tokenization: breaking the text into individual words, also known as "tokens." For this, we'll use spaCy, an open-source NLP library that can handle contractions, punctuation, and special characters. Next, we'll remove common words that don't add much meaning, like "a," "an," "the," "is," and "and," using the stop word resource from the NLTK library. Finally, we'll apply lemmatization, which reduces words to their base form, known as the lemma. For example, "running" becomes "run" and "happier" becomes "happy." This technique helps us work with the text more effectively and understand its meaning.

To summarize:
o Tokenize the text.
o Remove common words.
o Apply lemmatization.

#***************************WORD FREQUENCY*******************************
import subprocess
import nltk
import spacy
from nltk.probability import FreqDist
from nltk.corpus import stopwords

# Download resources
subprocess.run('python -m spacy download en_core_web_sm', shell=True)
nltk.download('punkt')
nltk.download('stopwords')  # needed for stopwords.words('english') below

# Load spaCy model and set stopwords
nlp = spacy.load('en_core_web_sm')
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    words = nltk.word_tokenize(text)
    words = [word.lower() for word in words if word.isalpha()]
    words = [word for word in words if word not in stop_words]
    lemmas = [token.lemma_ for token in nlp(" ".join(words))]
    return lemmas

# Aggregate words and create a frequency distribution
all_comments = ' '.join(dfsenate['comment'])
processed_comments = preprocess_text(all_comments)
fdist = FreqDist(processed_comments)

#**********************HEARING TOP 30 COMMON WORDS*********************
import matplotlib.pyplot as plt
import numpy as np

# Most common words and their frequencies
top_words = fdist.most_common(30)
words = [word for word, freq in top_words]
frequencies = [freq for word, freq in top_words]

# Bar plot - Hearing on Oversight of AI: Top 30 Most Common Words
fig, ax = plt.subplots(figsize=(8, 10))
ax.barh(range(len(words)), frequencies, align='center', color='skyblue')
ax.invert_yaxis()
ax.set_xlabel('Frequency', fontsize=12)
ax.set_ylabel('Words', fontsize=12)
ax.set_title('Hearing on Oversight of AI: Top 30 Most Common Words', fontsize=14)
ax.set_yticks(range(len(words)))
ax.set_yticklabels(words, fontsize=10)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_linewidth(0.5)
ax.spines['bottom'].set_linewidth(0.5)
ax.tick_params(axis='x', labelsize=10)
plt.subplots_adjust(left=0.3)
for i, freq in enumerate(frequencies):
    ax.text(freq + 5, i, str(freq), va='center', fontsize=8)
plt.show()

Hearing on Oversight of AI: Figure 02

As you can see in the bar plot (Figure 02), there was a lot of "thinking."
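Before interpreting the chart further, it's worth sanity-checking the preprocess_text pipeline defined above. A minimal sketch with a made-up sentence (the expected output is approximate, since spaCy's lemmatizer can vary slightly between model versions):

# Sketch: sanity-check tokenization, stop-word removal, and lemmatization
sample = "The senators were asking harder questions and running longer hearings."
print(preprocess_text(sample))
# Roughly: ['senator', 'ask', 'hard', 'question', 'run', 'long', 'hearing']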
Maybe the first five words give us an interesting hint of what we should do today and for our future in terms of AI: "We need to think and know where AI should go."

As I mentioned at the beginning of this article, at first sight "regulation" doesn't stand out as a frequently used word in the Senate AI Hearing. However, concluding that it wasn't a main concern would be inaccurate. The interest in whether AI should or should not be regulated was expressed through different words, such as "regulation," "regulate," "agency," and "regulatory." Therefore, let's make some adjustments to the code, aggregate these words, and re-run the analysis to see how this changes the picture.

nlp = spacy.load('en_core_web_sm')
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    words = nltk.word_tokenize(text)
    words = [word.lower() for word in words if word.isalpha()]
    words = [word for word in words if word not in stop_words]
    lemmas = [token.lemma_ for token in nlp(" ".join(words))]
    return lemmas

# Aggregate words and create a frequency distribution
all_comments = ' '.join(dfsenate['comment'])
processed_comments = preprocess_text(all_comments)
fdist = FreqDist(processed_comments)
original_fdist = fdist.copy()  # Save the original object

aggregate_words = ['regulation', 'regulate', 'agency', 'regulatory', 'legislation']
aggregate_freq = sum(fdist[word] for word in aggregate_words)
df_aggregatereg = pd.DataFrame({'Word': aggregate_words,
                                'Frequency': [fdist[word] for word in aggregate_words]})

# Remove the individual words and add the aggregation
for word in aggregate_words:
    del fdist[word]
fdist['regulation+agency'] = aggregate_freq

# Pie chart for the regulation+agency distribution
import matplotlib.pyplot as plt

labels = df_aggregatereg['Word']
values = df_aggregatereg['Frequency']

plt.figure(figsize=(8, 6))
plt.subplots_adjust(top=0.8, bottom=0.25)
patches, text, autotext = plt.pie(values, labels=labels,
                                  autopct=lambda p: f'{p:.1f}% ({int(p * sum(values) / 100)})',
                                  startangle=90,
                                  colors=['#6BB6FF', 'green', 'lightblue', 'lightyellow', 'gray'])
plt.title('Regulation+agency Distribution', fontsize=14)
plt.axis('equal')
plt.setp(text, fontsize=8)
plt.setp(autotext, fontsize=8)
plt.show()

Hearing on Oversight of AI: Figure 03

As you can see in Figure 03, the topic of regulation did, after all, come up many times during the Senate AI Hearing.
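The same aggregation trick generalizes to any theme you care about. Here is a sketch; the concept_groups dictionary and its word lists are my own illustration, not from the original analysis, and it reads from the untouched original_fdist saved above.

# Sketch: aggregate frequencies for several hypothetical concept groups at once
concept_groups = {
    'regulation+agency': ['regulation', 'regulate', 'agency', 'regulatory', 'legislation'],
    'risk+safety': ['risk', 'harm', 'safety'],
}
for group, members in concept_groups.items():
    total = sum(original_fdist[word] for word in members)  # FreqDist returns 0 for missing words
    print(f'{group}: {total}')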
STEP-04 WHAT HIDES BEHIND THE WORDS

Words alone may provide us with some clues, but it is the interconnection of words that truly offers us perspective. So, let's take an approach using word clouds to explore whether we can discover insights that simple bar and pie charts cannot show.

# Word cloud - Senate Hearing on Oversight of AI
from wordcloud import WordCloud

wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(fdist)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud - Senate Hearing on Oversight of AI')
plt.show()

Hearing on Oversight of AI: Figure 04

Let's explore further and compare the word clouds for the different groups of interest represented in the AI Hearing (Private, Congress, Academia) and see whether their words reveal different perspectives on the future of AI.

# Word clouds for each group of interest
organizations = dfsenate['Organization'].unique()

for organization in organizations:
    comments = dfsenate[dfsenate['Organization'] == organization]['comment']
    all_comments = ' '.join(comments)
    processed_comments = preprocess_text(all_comments)
    fdist_organization = FreqDist(processed_comments)

    # Word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(fdist_organization)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    if organization == 'IBM':
        plt.title(f'Word Cloud: {organization} - Christina Montgomery')
    elif organization == 'OpenAI':
        plt.title(f'Word Cloud: {organization} - Sam Altman')
    elif organization == 'Academia':
        plt.title(f'Word Cloud: {organization} - Gary Marcus')
    else:
        plt.title(f'Word Cloud: {organization}')
    plt.show()

Hearing on Oversight of AI: Figure 05

It's interesting how some words appear (or disappear) for each group of interest represented in the Senate AI Hearing when they talk about artificial intelligence.

As for the big headline, "Sam Altman's call for regulating AI": whether he is in favor of regulation or not, I really can't tell, but his own words don't seem to contain much "regulation" to me. Instead, Sam Altman seems to take a people-centric approach when he talks about AI, repeating words like "think," "people," "know," "important," and "use," and relying on words like "technology," "system," or "model" rather than the word "AI."

Someone who did have something to say about "risk" and "issues" was Christina Montgomery (IBM), who repeated these words constantly when talking about "technology," "companies," and "AI." An interesting fact in her testimony is finding the words most of us expect to hear from companies involved in developing technology: "trust," "governance," and "think"-ing about what is "right" in terms of AI.

"We need to hold companies responsible today and accountable for AI that they're deploying....."

Christina Montgomery. US Senate Hearing on Oversight of AI (2023)

Gary Marcus said in his initial statement, "I come as a scientist, someone who's founded AI companies, and is someone who genuinely loves AI..." So, for the sake of this NLP analysis, we are considering him the voice of Academia. Words like "need," "think," "know," "go," and "people" stand out among others. An interesting fact is that the word "system" seems to be repeated more than "AI" in his testimony. Maybe AI is not a single, lone technology that will change the future; maybe the impact will come from multiple technologies or systems interacting with each other (IoT, robotics, BioTech, etc.) rather than from any single one of them.
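Word clouds are qualitative by nature. If you want to put numbers behind the comparison above, one option is to tabulate a few probe words per group; the probe list here is my own choice, purely illustrative:

# Sketch: relative frequency (occurrences per 1,000 tokens) of a few probe words per group
probe_words = ['regulation', 'people', 'system', 'risk', 'trust']
for organization in dfsenate['Organization'].unique():
    comments = ' '.join(dfsenate[dfsenate['Organization'] == organization]['comment'])
    tokens = preprocess_text(comments)
    fdist_org = FreqDist(tokens)
    rates = {w: round(1000 * fdist_org[w] / len(tokens), 2) for w in probe_words}
    print(organization, rates)

Normalizing per 1,000 tokens matters here because Congress spoke far more words than any single panelist.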
In the end, the first hypothesis mentioned by Senator John Kennedy seems not entirely false after all (not just for Congress but for society as a whole): we are still at the stage where we are trying to understand the direction AI is heading.

"Permit me to share with you three hypotheses that I would like you to assume for the moment to be true. Hypothesis number one, many members of Congress do not understand artificial intelligence. Hypothesis number two, that absence of understanding may not prevent Congress from plunging in with enthusiasm and trying to regulate this technology in a way that could hurt this technology. Hypothesis number three, that I would like you to assume there is likely a berserk wing of the artificial intelligence community that intentionally or unintentionally could use artificial intelligence to kill all of us and hurt us the entire time that we are dying....."

Sen. John Kennedy (R-LA). US Senate Hearing on Oversight of AI (2023)

STEP-05 THE EMOTION BEHIND YOUR WORDS

We'll use the SentimentIntensityAnalyzer class from the NLTK library for sentiment analysis. This pre-trained model uses a lexicon-based approach, where each word in the lexicon (VADER) has a predefined sentiment polarity value. The sentiment scores of the words in a piece of text are aggregated to calculate an overall sentiment score. The numerical value ranges from -1 (negative sentiment) to +1 (positive sentiment), with 0 indicating neutral sentiment. Positive sentiment reflects a favorable emotion, attitude, or enthusiasm, while negative sentiment conveys an unfavorable emotion or attitude.

#************SENTIMENT ANALYSIS************
from nltk.sentiment import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

sid = SentimentIntensityAnalyzer()
dfsenate['Sentiment'] = dfsenate['comment'].apply(lambda x: sid.polarity_scores(x)['compound'])

#************BOXPLOT - GROUP OF INTEREST************
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style('white')
plt.figure(figsize=(12, 7))
sns.boxplot(x='Sentiment', y='Organization', data=dfsenate, color='yellow',
            width=0.6, showmeans=True, showfliers=True)

# Customize the axes
def add_cosmetics(title='Sentiment Analysis Distribution by Group of Interest', xlabel='Sentiment'):
    plt.title(title, fontsize=28)
    plt.xlabel(xlabel, fontsize=20)
    plt.xticks(fontsize=15)
    plt.yticks(fontsize=15)
    sns.despine()

def customize_labels(label):
    if "OpenAI" in label:
        return label + " - Sam Altman"
    elif "IBM" in label:
        return label + " - Christina Montgomery"
    elif "Academia" in label:
        return label + " - Gary Marcus"
    else:
        return label

# Apply customized labels to the y-axis
yticks = plt.yticks()[1]
plt.yticks(ticks=plt.yticks()[0],
           labels=[customize_labels(label.get_text()) for label in yticks])
add_cosmetics()
plt.show()

Hearing on Oversight of AI: Figure 06

A boxplot is always interesting, as it shows the minimum and maximum values, the median, and the first (Q1) and third (Q3) quartiles. In addition, a line of code was added to display the mean value. (Acknowledgment to Elena Kosourova for designing the boxplot code template; I only made adjustments for my dataset.)

Overall, everyone seemed to be in a good mood during the Senate Hearing, especially Sam Altman, who stood out with the highest sentiment score, followed by Christina Montgomery. On the other hand, Gary Marcus seemed to have a more neutral experience (median around 0.25), and he may have felt somewhat uncomfortable at times, with values close to 0 or even negative.
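As a calibration aside before reading more into the distributions: here is what VADER's compound score looks like on a few made-up sentences (a sketch of mine; the exact values depend on the lexicon version):

# Sketch: VADER compound scores on made-up sentences, illustrating the -1..+1 range
print(sid.polarity_scores("I genuinely love this technology.")['compound'])  # positive (> 0)
print(sid.polarity_scores("This could hurt and kill people.")['compound'])   # negative (< 0)
print(sid.polarity_scores("The hearing is on Tuesday.")['compound'])         # neutral (close to 0)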
Congress as a whole displayed a left-skewed distribution in its sentiment scores, indicating a tendency towards neutrality or positivity. Interestingly, if we take a closer look, certain interventions stand out with extremely high or low sentiment scores.

Hearing on Oversight of AI: Figure 07

Maybe we shouldn't interpret the results as if people in the Senate AI Hearing were simply happy or uncomfortable. Perhaps they suggest that those who participated in the Hearing don't hold an overly optimistic view of where AI is headed, but they are not pessimistic either. The scores may indicate that there are concerns, and that participants are being cautious about the direction AI should take.

And what about a timeline? Did the mood during the hearing stay the same throughout? How did the mood of each group of interest evolve? To analyze the timeline, I organized the statements in the order they were captured and conducted a sentiment analysis. Since there are over 400 questions or testimonies, I defined a moving average of the sentiment scores for each group of interest (Congress, Academia, Private), using a window size of 10. This means that the moving average is calculated by averaging the sentiment scores over every 10 consecutive statements.

#**************************TIMELINE: US SENATE AI HEARING**************************************
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import make_interp_spline

# Moving average for each organization
window_size = 10
organizations = dfsenate['Organization'].unique()

# Create the line plot
color_palette = sns.color_palette('Set2', len(organizations))
plt.figure(figsize=(12, 6))
org_frames = {}  # keep each group's frame so the label loop below can reuse it
for i, org in enumerate(organizations):
    df_org = dfsenate[dfsenate['Organization'] == org].copy()  # copy avoids SettingWithCopyWarning

    # Moving average; missing values filled with 0
    df_org['Sentiment'] = df_org['Sentiment'].fillna(0)
    df_org['Moving_Average'] = df_org['Sentiment'].rolling(window=window_size, min_periods=1).mean()
    org_frames[org] = df_org

    # Smooth the moving average with a cubic spline for plotting
    x = np.linspace(df_org.index.min(), df_org.index.max(), 500)
    spl = make_interp_spline(df_org.index, df_org['Moving_Average'], k=3)
    y = spl(x)
    plt.plot(x, y, linewidth=2, label=f'{org} {window_size}-Point Moving Average', color=color_palette[i])

plt.xlabel('Statement Number', fontsize=12)
plt.ylabel('Sentiment Score', fontsize=12)
plt.title('Sentiment Score Evolution during the Hearing on Oversight of AI', fontsize=16)
plt.legend(fontsize=12)
plt.grid(color='lightgray', linestyle='--', linewidth=0.5)
plt.axhline(0, color='black', linewidth=0.5, alpha=0.5)

# Label the final moving-average value for each group
for org in organizations:
    df_org = org_frames[org]
    plt.text(df_org.index[-1], df_org['Moving_Average'].iloc[-1],
             f'{df_org["Moving_Average"].iloc[-1]:.2f}',
             ha='right', va='top', fontsize=12, color='black')

plt.tight_layout()
plt.show()

Hearing on Oversight of AI: Figure 08

At the beginning, the session seemed friendly and optimistic, with everyone discussing the future of AI. But as the session went on, the mood started to change: the members of Congress became less optimistic, and their questions became more challenging. This affected the panelists' scores, with some even getting low scores (you can see this towards the end of the session). Interestingly, Altman was scored by the model as neutral or slightly positive, even during the tense moments with the members of Congress.
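If you are curious which interventions drove those extremes, a quick sketch (mine, not in the original code) surfaces the highest- and lowest-scoring statements:

# Sketch: surface the statements with the most extreme sentiment scores
print(dfsenate.nlargest(3, 'Sentiment')[['item', 'name', 'Sentiment']])
print(dfsenate.nsmallest(3, 'Sentiment')[['item', 'name', 'Sentiment']])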
It's important to remember that the model has its limitations and could border on subjectivity. While sentiment analysis isn't flawless, it offers us an interesting glimpse into the intensity of the emotions that prevailed that day on Capitol Hill.

Final thought

In my opinion, the lesson behind this US Senate AI Hearing lies in the five most repeated words: "We need to think and know where AI should go." It is noteworthy that words like "people" and "importance" were unexpectedly present in Sam Altman's word cloud, going beyond the headline of a "call for regulation." While I hoped to find more words like "transparency," "accountability," "trust," "governance," and "fairness" in Altman's NLP analysis, it was a relief to find some of them frequently repeated in Christina Montgomery's testimony. This is what we all expect to hear more often when AI is on the table.

Gary Marcus emphasized "system" as much as "AI," perhaps inviting us to see artificial intelligence in a broader context. Multiple technologies are emerging right now, and their combined impact on society, work, and employment will come from the clash of these multiple technologies, not just from one of them. Academia plays a vital role in guiding this path and in determining whether some kind of regulation is needed. I say this "literally," not "spiritually" (inside joke from the six-month moratorium letter).

Finally, the word "agency" was repeated as much as "regulation" in its different forms. This suggests that the concept of an "Agency for AI" and its role will likely be a topic of debate in the near future. An interesting reflection on this challenge was offered in the Senate AI Hearing by Sen. Richard Blumenthal:

"...Most of my career has been in enforcement. And I will tell you something, you can create 10 new agencies, but if you don't give them the resources, and I'm talking not just about dollars, I'm talking about scientific expertise, you guys will run circles around 'em. And it isn't just the models or the generative AI that will run circles around them, but it is the scientists in your companies. For every success story in government regulation, you can think of five failures.... And I hope our experience here will be different..."

Sen. Richard Blumenthal (D-CT). US Senate Hearing on Oversight of AI (2023)

Although reconciling innovation, awareness, and regulation is challenging for me, I am all for raising awareness about AI's role in our present and future, while also understanding that "research" and "development" are different things. The first should be encouraged and promoted, not contained; the second is where the extra effort in the "thinking" and "knowing" is needed.

I hope you found this NLP analysis interesting, and I want to thank Justin Hendrix and Tech Policy Press for allowing me to use their transcript in this article. You can access the complete code in this GitHub repository. (Acknowledgement also to ChatGPT for helping me fine-tune some of my code for a better presentation.)

Did I miss anything? Your suggestions are always welcome and keep the conversation going.

Decoding the US Senate Hearing on Oversight of AI: NLP Analysis in Python was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.