

Keep in mind that you don't need to be very selective with the features you add. Knowing the subject matter of what you are trying to model is very important, though. I'd also recommend researching and reading papers about Twitter user trends and the different categories of users if you haven't already.

I would highly recommend reading 'Hands-On Machine Learning with Scikit-Learn and TensorFlow' if you don't have much experience with machine learning and neural networks and want an implementation-based approach. I would start by giving each Twitter user a unique ID; this will become the index of your dataset. Then, for each unique ID (Twitter user), do feature engineering and brainstorm different features that you think would be useful to feed your model. For example, the number of tweets or pictures they post, or the contents of their tweets (you can do some simple sentiment analysis or more complex NLP to extract features from them). The list is infinite!

I guess your question is: you want to use a neural network to predict whether a user is a "gamer" or not, and you want to find some features from his/her followers. Here are some possible features you may want to use. If the number of followers varies between users, why not just use this follower count as a feature? Because some gamers are very active in their game community and like to discuss their common interests, you can also check the absolute number of gamers among their followers. Based on the percentage of gamers among the followers (gamers/followers), you can even give weights to these users. Another thing I found from my project is that you may find some interesting features on the user pages. Still, you should be careful not to include too many random (meaningless) features, otherwise it might cause overfitting!

If you would also like details on which programming language would be best for this, I would recommend R or Matlab (Matlab is my personal choice, even though R comes with more premade functions) for developing the algorithm, and Fortran or C for a final exportable version (if you wish to make one).

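As a rough illustration of the follower-based features listed above, here is a minimal Python sketch. The `UserRecord` type, the `build_features` helper, the field names, and the sample data are all hypothetical, and it assumes you already have a set of accounts labelled as gamers; it is not tied to any real Twitter API.

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    """Hypothetical container for the raw data collected about one user."""
    user_id: int                      # unique ID, used as the dataset index
    tweet_count: int                  # number of tweets the user has posted
    follower_ids: list = field(default_factory=list)


def build_features(users, known_gamers):
    """Return {user_id: feature dict} for a list of UserRecord objects.

    known_gamers is an assumed, manually labelled set of gamer account IDs.
    """
    features = {}
    for user in users:
        follower_count = len(user.follower_ids)
        gamer_followers = sum(1 for f in user.follower_ids if f in known_gamers)
        features[user.user_id] = {
            "tweet_count": user.tweet_count,
            "follower_count": follower_count,
            "gamer_followers": gamer_followers,
            # Guard against division by zero for users without followers.
            "gamer_ratio": gamer_followers / follower_count if follower_count else 0.0,
        }
    return features


# Made-up example data, for illustration only.
users = [UserRecord(1, 120, [10, 11, 12, 13]), UserRecord(2, 5, [])]
known_gamers = {10, 11, 99}
table = build_features(users, known_gamers)
```

Each row of `table` can then be turned into one input vector for whatever model you choose; text or image features would be added as extra keys in the same way.
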
In case you still want to use ANNs, I would recommend learning the parameters (the weights) of the neural network on a separate data set (which you have to label manually), using features such as the number of people following and the gamers/all ratio. You can also try to create a computer vision algorithm for the images the user posts (gaming vs. non-gaming). As an extra, I wholeheartedly encourage you to buy/download John Kruschke's book 'Doing Bayesian Data Analysis: A Tutorial with R and BUGS'; it will give you excellent insight into data analysis for solving problems like this one.
This is a very interesting problem that can be approached in a lot of different ways! Even though you are asking for a neural network, I will apply my own experience with Bayesian networks, as I think they are more than suitable for this task (they might even be the best choice). You could start off with a small seed of highly popular users which you can label as gamers/not gamers. Afterwards, using some threshold (which should vary in proportion to the number of users which have been labelled, P(θ = X | labelledUsers = 100)), you should decide whether a particular user is a gamer or not (I would start off with people that follow a lot of "twitterers" to maximize the chances of following one of your "seed" users). To help you find the previously mentioned probability, you can also use a test set to validate it. BTW, you could also model P(θ = X | labelledUsers = 100) with a beta distribution. Using this technique, the algorithm would choose how much to weigh each feature using a PDF, and you could just feed it as many features as you like without bothering about how relevant they are. I hope this was helpful even though it doesn't use a neural network.
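To make the beta-distribution idea a bit more concrete, here is a minimal Python sketch of one possible reading of it. It assumes that θ is the unknown fraction of gamers among the labelled seed accounts a user follows, that the prior is a uniform Beta(1, 1), and that the 0.6 threshold is arbitrary; the function names are made up for illustration and this is not necessarily the exact model described above.

```python
def beta_posterior(gamer_seeds, labelled_seeds, prior_a=1.0, prior_b=1.0):
    """Posterior Beta(a, b) parameters after observing the followed seeds.

    gamer_seeds: how many of the followed labelled seeds are gamers (assumed input).
    labelled_seeds: how many labelled seeds the user follows in total.
    """
    if not 0 <= gamer_seeds <= labelled_seeds:
        raise ValueError("gamer_seeds must be between 0 and labelled_seeds")
    # Conjugate update: successes add to a, failures add to b.
    return prior_a + gamer_seeds, prior_b + (labelled_seeds - gamer_seeds)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution; shrinks as more seeds are observed."""
    return (a * b) / ((a + b) ** 2 * (a + b + 1))


def is_gamer(gamer_seeds, labelled_seeds, threshold=0.6):
    """Label a user as a gamer if the posterior mean reaches the threshold."""
    a, b = beta_posterior(gamer_seeds, labelled_seeds)
    return beta_mean(a, b) >= threshold
```

For example, a user who follows 10 labelled seeds, 8 of them gamers, gets a Beta(9, 3) posterior with mean 0.75, so `is_gamer(8, 10)` returns `True`, while a user who follows no labelled seeds stays at the prior mean of 0.5 and is not labelled. The variance gives a rough measure of how much to trust each decision, which is one way to let the threshold depend on how many users have been labelled.
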
