Some time ago I wrote a post about web crawling using Google's API (see here). However, it lacks support for recognizing HTML tags, which makes it tedious to find the key components of a web page.
In this post, I will show you how to recognize a web page's key HTML tags, such as title, div, and so on, using a Python library named BeautifulSoup. Basic HTML and Python knowledge is assumed. For the experiments I will be using the native Python installation on OS X 10.11.5 "El Capitan". Continue reading
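As a taste of what the full post covers, here is a minimal sketch of tag recognition with BeautifulSoup, assuming the beautifulsoup4 package is installed (pip install beautifulsoup4). The HTML snippet is made up for illustration.

```python
# Parse a small HTML document and pull out a few key tags.
from bs4 import BeautifulSoup

html = """
<html>
  <head><title>My page</title></head>
  <body>
    <div id="content"><p>Hello, world!</p></div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)          # text inside the <title> tag: My page
print(soup.find("div")["id"])     # attribute of the first <div>: content
print(soup.find("p").get_text())  # text inside the first <p>: Hello, world!
```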
In the last post I tried to explain what a finite state machine is (see here). In this post, we'll talk about regular languages and the equivalence between deterministic finite state machines, often called deterministic finite automata (DFA), and nondeterministic finite state machines, often called nondeterministic finite automata (NDFA).
One might think that NDFAs can solve problems that DFAs cannot, but NDFAs are just as powerful as DFAs. However, a DFA may require many more states and transitions than an NDFA needs to solve the same problem. Continue reading
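The equivalence can be made concrete in a few lines of Python: a deterministic machine can simulate an NDFA by tracking the *set* of states the NDFA could be in after each symbol. The example NDFA below (its states and transitions are made up for illustration) accepts strings over {a, b} that end in "ab".

```python
# Simulating an NDFA deterministically by tracking sets of states.
# Missing (state, symbol) pairs mean "no transition" (the branch dies).
NDFA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
START, ACCEPT = "q0", {"q2"}

def accepts(string):
    current = {START}  # all states the NDFA could currently be in
    for symbol in string:
        current = set().union(*(NDFA.get((s, symbol), set()) for s in current))
    return bool(current & ACCEPT)

print(accepts("aab"))  # True: ends in "ab"
print(accepts("aba"))  # False
```

The sets of NDFA states become the states of an equivalent DFA (the subset construction), which is why a DFA may need exponentially more states than the NDFA it mimics.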
What is a Finite State Machine?
The simplest type of computing machine that is worth considering is called a ‘finite state machine’. Finite state machines may sound like a very dry and boring topic but they reveal a lot about the power of different types of computing machine. Every Turing machine includes a finite state machine so there is a sense in which they come first. They also turn out to be very useful in practice.
A finite state machine (sometimes called a finite state automaton) is a computation model that can be implemented with hardware or software and can be used to simulate sequential logic and some computer programs. Finite state automata generate regular languages. A regular language is a language that can be expressed with a regular expression or with a deterministic or nondeterministic finite automaton (state machine). A language is a set of strings made up of characters from a specified alphabet, or set of symbols. The class of regular languages is a subset of the class of all languages over an alphabet. Regular languages are used in parsing and designing programming languages and are one of the first concepts taught in computability courses.
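To see how small a software implementation can be, here is a sketch of a DFA for one classic regular language: binary strings containing an even number of 1s. The state names and the language are chosen for illustration.

```python
# A tiny DFA: two states tracking the parity of 1s seen so far.
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

def accepts(string, start="even", accepting=("even",)):
    state = start
    for symbol in string:
        # Deterministic: exactly one next state per (state, symbol) pair.
        state = TRANSITIONS[(state, symbol)]
    return state in accepting

print(accepts("1011"))  # False: three 1s
print(accepts("1001"))  # True: two 1s
```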
Over the last few days, I've been dealing with the challenge of developing some database routines that need to process several million rows to analyze a whole dataset. As a basic tool, a stored procedure may save you a lot of time and effort.
What is a stored procedure?
A stored procedure is a procedure (like a subprogram in a regular computing language) that is stored (in the database). Correctly speaking, MySQL supports “routines” and there are two kinds of routines: stored procedures which you call, or functions whose return values you use in other SQL statements the same way that you use pre-installed MySQL functions like pi() or now(). I’ll use the word “stored procedures” more frequently than “routines” because it’s what we’ve used in the past, and what people expect us to use.
How is a stored procedure composed?
A stored procedure has a name, a parameter list, and an SQL statement, which can contain many more SQL statements. There is new syntax for local variables, error handling, loop control, and IF conditions. Here is an example of a statement that creates a stored procedure. Continue reading
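Before the full walkthrough, here is a minimal sketch of what such a statement looks like in MySQL. The procedure and table names (count_rows, my_table) are made up for illustration; DELIMITER lets the body contain semicolons without ending the CREATE PROCEDURE statement itself.

```sql
-- A minimal MySQL stored procedure with one OUT parameter.
DELIMITER //
CREATE PROCEDURE count_rows (OUT total INT)
BEGIN
  SELECT COUNT(*) INTO total FROM my_table;
END //
DELIMITER ;

-- Call it and read the OUT parameter back:
CALL count_rows(@n);
SELECT @n;
```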
If you want to implement your own Google-like web crawler, it's fairly easy using their API. To begin, download the API package here. Once downloaded, unpack and install it using:
tar -xzvf google-1.9.1.tar.gz
# move to the unpacked dir
cd google-1.9.1
# install Google's API
sudo python setup.py install
Time Series and PAA
Time series processing has been an emerging area in recent years. Although storage capacity is not such a problem today, we may deal with large datasets in several areas such as medicine, business, and education, among others.
Piecewise Aggregate Approximation (PAA) is a widely used technique for reducing the dimensionality and computational complexity of time series.
Given a time series C of length n, it can be represented in a w-dimensional space by a vector \bar{C} = (\bar{c}_1, \ldots, \bar{c}_w), where the i-th element is given by:

\bar{c}_i = \frac{w}{n} \sum_{j = \frac{n}{w}(i-1) + 1}^{\frac{n}{w} i} c_j
Simply stated, to reduce the time series from n dimensions to w dimensions, the data is divided into w equal sized “frames”. The mean value of the data falling within a frame is calculated and a vector of these values becomes the data-reduced representation. Continue reading
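The frame-averaging step described above can be sketched in a few lines of pure Python. This sketch assumes, as the original formulation does, that w divides n evenly; unequal frames need extra care.

```python
# PAA: reduce a time series of length n to w mean values over equal frames.
def paa(series, w):
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes w divides n")
    frame = n // w
    # Mean of each of the w equal-sized frames.
    return [sum(series[i * frame:(i + 1) * frame]) / frame for i in range(w)]

c = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(paa(c, 4))  # [1.5, 3.5, 5.5, 7.5]
print(paa(c, 2))  # [2.5, 6.5]
```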
In this tutorial I will show you how to connect to and interact with a PostgreSQL database from a small Python program, using the psycopg2 library. Installation instructions are on the official site.
The next step is to create a test database, using the following commands.
create database test;
create table departments(id int, name varchar(50));
create table users(id int, id_dep int, name varchar(50), income float);
insert into departments values(1,'dep1');
insert into departments values(2,'dep2');
insert into departments values(3,'dep3');
insert into departments values(4,'dep4'); Continue reading
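With the tables populated, a first psycopg2 query can be sketched as below. The connection parameters (user, password, host) are assumptions; adjust them to your local PostgreSQL setup.

```python
# Query the departments table created above through psycopg2.
def fetch_departments(conn):
    """Return all (id, name) rows from the departments table."""
    with conn.cursor() as cur:
        cur.execute("SELECT id, name FROM departments ORDER BY id;")
        return cur.fetchall()

if __name__ == "__main__":
    import psycopg2  # pip install psycopg2-binary

    conn = psycopg2.connect(dbname="test", user="postgres",
                            password="secret", host="localhost")
    try:
        for dep_id, name in fetch_departments(conn):
            print(dep_id, name)
    finally:
        conn.close()
```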
Decision trees (DT) are predictive models used to classify an item, or a set of items, among different classes. The distinctive feature of a DT is that its decision boundaries are axis-parallel: only lines parallel to the x and y axes may be drawn.
In this post I'll show you how to train, validate, and use a DT to correctly classify objects in R, using the "cars" dataset embedded in any R environment.
The cars dataset gives the speed of 50 cars and the distances taken to stop. Initially the set looks as follows.
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
In order to classify objects, we’ll assume that about the first half of dataset are “TRUE” objects, while the second half is labeled as “FALSE” objects. Continue reading
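The axis-parallel splits a DT makes can be illustrated language-agnostically with a one-node "stump" in Python. The threshold 15 below is a made-up value standing in for the cut the R tree would actually learn from the cars data.

```python
# A one-split decision "stump": classify a car as TRUE or FALSE from a
# single speed threshold.
def stump(speed, threshold=15):
    # A DT boundary is parallel to an axis: "speed < threshold" is a
    # vertical line in the (speed, dist) plane.
    return speed < threshold

print(stump(4))   # True  (slow car, first half of the data)
print(stump(24))  # False (fast car, second half)
```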
Classification is a commonplace problem nowadays. I have lately been working on classification problems for work. I used Matlab, Python, and some other lower-level languages for classification, but the resulting approaches were tedious and rather hard.
R provides a dedicated library for classification with neural networks (NN), called 'neuralnet', which can be installed directly with: install.packages('neuralnet').
To give a running example throughout this post, we need a data set. The Iris data set will do; it is perhaps the best-known database in the pattern recognition literature. We can download it from HERE.
The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. Based on the combination of these four features, the species can be distinguished from one another. Continue reading
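This is not the R 'neuralnet' package itself, but the idea behind it can be sketched in Python: a single artificial neuron (perceptron) trained on a made-up, linearly separable toy problem (logical AND). Classifying Iris works the same way, just with four inputs, more neurons, and hidden layers.

```python
# A single neuron with the classic perceptron learning rule.
def predict(w, b, x1, x2):
    """Fire (1) if the weighted sum of inputs crosses the threshold."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

def train_perceptron(samples, epochs=10, lr=0.1):
    """Nudge each weight in proportion to the prediction error."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            err = target - predict(w, b, x1, x2)
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x1, x2) for (x1, x2), _ in AND])  # → [0, 0, 0, 1]
```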
Have you ever seen an Android app with its native font? Ugh! You might agree with me when I say it does not look good. Although there is a tiny set of "custom" or "stylish" fonts we can find in the typeface property of any text-containing view (see image below), sometimes it is not enough.
So, what do we need to apply some fancier fonts in our app? Well, the first step is to create a new app, add your views, and we are ready to go.
Create a new project and add a LinearLayout with vertical orientation. Inside it, put as many views as you want (in order to see more font examples). Next, we need some .ttf files; we can get plenty of them here. Continue reading