Link Archive
It’s 2021 and so I don’t need to tell you that having your API pass a username and password through HTTP basic authentication is a bad idea. Your tokens should look large and random, whatever they are.
The Diátaxis framework aims to solve the problem of structure in technical documentation. It adopts a systematic approach to understanding the needs of documentation users in their cycle of interaction with a product.
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
Group thousands of similar spreadsheet text cells in seconds
String matching in Python with TF-IDF and Cosine Similarity
Bag of Words: In its simplest form, BOW is a list of the distinct words in a document together with a count for each word. It is a simple model for representing text as a numerical structure. Consider the term “document” to mean any text you can access, regardless of format, from the text in a word-processor file to a standalone string variable. I leave it up to you to extract the text from whatever format you are working with.
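A minimal sketch of that idea, using only the standard library (the function name `bag_of_words` is mine, not from the linked article):

```python
from collections import Counter

def bag_of_words(document: str) -> Counter:
    # Lowercase and split on whitespace; a real tokenizer
    # would also strip punctuation and handle edge cases.
    return Counter(document.lower().split())

bow = bag_of_words("the cat sat on the mat")
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})
```

Each distinct word becomes a key and its count becomes the value, which is exactly the numerical structure BOW describes.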
SciPy offers a variety of sparse matrix formats that store only the non-zero elements, minimizing the memory required for data storage. Machine learning workflows often require the whole data frame to be in memory; compressing it into a sparse representation lets data that would otherwise exceed RAM fit comfortably. Performing operations on only the non-zero values of a sparse matrix can also greatly increase the execution speed of an algorithm.
Comparing very large feature vectors and picking the best matches, in practice, often comes down to performing a sparse matrix multiplication followed by selecting the top-n results. In this blog, we implement a customized Cython function for this purpose. Compared with doing the same using SciPy and NumPy functions, our approach improves speed by about 40% and reduces memory consumption. The GitHub code of our approach is available here.
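The Cython implementation itself is in the linked repository; the operation it accelerates can be sketched as a plain SciPy/NumPy baseline — multiply, then keep the n largest values per row (the function name `sparse_topn` is mine, and unlike the Cython version this materializes the full product first):

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_topn(A, B, n):
    """Multiply sparse A and B, then keep the top-n values per row."""
    C = A.dot(B).tocsr()
    rows, cols, vals = [], [], []
    for i in range(C.shape[0]):
        start, end = C.indptr[i], C.indptr[i + 1]
        row_vals = C.data[start:end]
        row_cols = C.indices[start:end]
        if len(row_vals) > n:
            # argpartition finds the n largest without a full sort
            keep = np.argpartition(-row_vals, n - 1)[:n]
            row_vals, row_cols = row_vals[keep], row_cols[keep]
        vals.extend(row_vals)
        cols.extend(row_cols)
        rows.extend([i] * len(row_vals))
    return csr_matrix((vals, (rows, cols)), shape=C.shape)
```

The Cython approach's gain comes from fusing the two steps, discarding non-top-n values during the multiplication instead of storing the full intermediate result as this baseline does.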