Part 4/8:
Matrix Multiplication: The first step multiplies the input embedding by a large parameter matrix of learned weights. Each row of this matrix can be pictured as a direction encoding some feature, and each output value is the dot product of the embedding with one of those rows. For instance, if one row points in the direction for the feature "first name, Michael," then a large positive output for that row indicates the embedding strongly encodes that feature. A small sketch of this step follows.
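The snippet below is a minimal sketch of this idea, assuming a toy embedding size and a hypothetical weight matrix W_up whose rows act as feature directions; the "Michael" labeling of row 0 is purely illustrative and not taken from any real model.

```python
import numpy as np

# Toy sizes; real models use dimensions in the thousands.
embedding = np.array([0.9, -0.2, 0.1, 0.4])  # hypothetical token embedding

# Each row of W_up is a learned "feature direction".
# Row 0 is imagined to align with the feature "first name, Michael".
W_up = np.array([
    [ 1.0, -0.1, 0.0, 0.3],  # feature 0: e.g. "first name Michael"
    [ 0.2,  0.8, -0.5, 0.0],  # feature 1
    [-0.3,  0.1, 0.9, 0.2],  # feature 2
])

# The matrix multiplication is a stack of dot products: each output value
# measures how strongly the embedding points along that row's direction.
pre_activation = W_up @ embedding
print(pre_activation)  # entry 0 is large and positive -> embedding resembles the "Michael" direction
```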
Non-Linear Activation: Because a chain of purely linear operations can only ever express a linear function, a non-linear activation such as the popular rectified linear unit (ReLU) is applied next. ReLU passes positive values through unchanged and clips negative values to zero, so only the features that the matrix multiplication clearly detected carry forward into the final result.
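Continuing the sketch above, the following shows ReLU applied to the hypothetical pre-activation values; the numbers are carried over from the previous example for illustration only.

```python
import numpy as np

def relu(x):
    # ReLU: keep positive values as-is, clip everything below zero to zero.
    return np.maximum(0.0, x)

pre_activation = np.array([1.04, -0.03, -0.12])  # output of the matrix multiply sketch above
activated = relu(pre_activation)
print(activated)  # [1.04 0.   0.  ] -- only the clearly matching feature stays "on"
```

The effect is that each hidden value behaves like a soft yes/no detector: a feature either contributes (positive value) or is silenced entirely, which is what lets the network represent non-linear relationships.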