Monday, June 3, 2019
System to Filter Unwanted Messages from OSN User Walls
M. Renuga Devi, G. Seetha Lakshmi, M. Sarmila
Abstract
One fundamental issue in today's Online Social Networks (OSNs) is to give users the ability to control the messages posted on their own private space, so that unwanted content is not displayed. Up to now, OSNs provide little support for this requirement. To fill the gap, in this paper we propose a system allowing OSN users to have direct control over the messages posted on their walls. This is achieved through a flexible rule-based system, which allows users to customize the filtering criteria to be applied to their walls, and a Machine Learning-based soft classifier that automatically labels messages in support of content-based filtering.
1. INTRODUCTION
Online Social Networks (OSNs) are today one of the most popular interactive media to communicate, share, and disseminate a considerable amount of human life information. Daily and continuous communications imply the exchange of several types of content, including free text, image, audio, and video data. According to Facebook statistics, the average user creates 90 pieces of content each month, whereas more than 30 billion pieces of content (web links, news, stories, blog posts, notes, photo albums, etc.) are shared each month. In OSNs, there is the possibility of posting or commenting on other posts in particular public/private areas, generally called walls. Facebook allows users to state who is allowed to insert messages in their walls (i.e., friends, friends of friends, or defined groups of friends). The aim of the present work is therefore to propose and experimentally evaluate an automated system, called Filtered Wall (FW), able to filter unwanted messages from OSN user walls. We exploit Machine Learning (ML) text categorization techniques.
The major efforts in building a robust short text classifier (STC) are concentrated in the extraction and selection of a set of characterizing and discriminant features. We base the overall short text categorization strategy on Radial Basis Function Networks (RBFN) for their proven capabilities in acting as soft classifiers and in managing noisy data and intrinsically vague classes. We insert the neural model within a hierarchical two-level classification strategy. In the first level, the RBFN categorizes short messages as Neutral and Non-neutral; in the second stage, Non-neutral messages are classified, producing gradual estimates of appropriateness to each of the considered categories. The system provides a powerful rule layer exploiting a flexible language to specify Filtering Rules (FRs). In addition, the system provides support for user-defined Blacklists (BLs), that is, lists of users that are temporarily prevented from posting any kind of message on a user wall.
2. RELATED WORK
The main contribution of this paper is the design of a system providing customizable content-based message filtering for OSNs, based on ML techniques. As we have pointed out in the introduction, to the best of our knowledge, we are the first to propose such a system for OSNs. However, our work is related both to the state of the art in content-based filtering and to the field of policy-based personalization for OSNs and, more generally, web content.
2.1 Content-Based Filtering
Information filtering systems are designed to classify a stream of dynamically generated information dispatched asynchronously by an information producer and to present to the user the information that is likely to satisfy his/her requirements. In content-based filtering, each user is assumed to operate independently.
As a result, a content-based filtering system selects information items based on the correlation between the content of the items and the user preferences, as opposed to a collaborative filtering system, which chooses items based on the correlation between people with similar preferences. Documents processed in content-based filtering are mostly textual in nature, and this makes content-based filtering close to text classification. The filtering activity can be modeled as a case of single-label, binary classification, partitioning incoming documents into relevant and non-relevant categories. More complex filtering systems include multilabel text categorization, automatically labeling messages into partial thematic categories. Content-based filtering is mainly based on the use of the ML paradigm, according to which a classifier is automatically induced by learning from a set of pre-classified examples. Several experiments prove that Bag-of-Words (BoW) approaches yield good performance and prevail in general over more sophisticated text representations that may have superior semantics but lower statistical quality. The application of content-based filtering to messages posted on OSN user walls poses additional challenges, given the short length of these messages and the wide range of topics that can be discussed.
3. FILTERED WALL ARCHITECTURE
The architecture in support of OSN services is a three-tier structure (Fig. 1). The first layer, called Social Network Manager (SNM), commonly aims to provide the basic OSN functionalities (i.e., profile and relationship management), whereas the second layer provides the support for external Social Network Applications (SNAs). The supported SNAs may in turn require an additional layer for their needed Graphical User Interfaces (GUIs).
The core components of the proposed system are the Content-Based Messages Filtering (CBMF) and the Short Text Classifier (STC) modules. The latter component aims to classify messages according to a set of categories.
In contrast, the first component exploits the message categorization provided by the STC module to enforce the FRs specified by the user.
The path a message follows up to its possible final publication can be summarized as follows:
1. After entering the private wall of one of his/her contacts, the user tries to post a message, which is intercepted by FW.
2. An ML-based text classifier extracts metadata from the content of the message.
3. FW uses the metadata provided by the classifier, together with data extracted from the social graph and users' profiles, to enforce the filtering and BL rules.
4. Depending on the result of the previous step, the message will be published or filtered by FW.
4. SHORT TEXT CLASSIFIER
Established techniques used for text classification work well on data sets with large documents, such as newswire corpora, but suffer when the documents in the corpus are short. In this context, critical aspects are the definition of a set of characterizing and discriminant features allowing the representation of underlying concepts, and the collection of a complete and consistent set of supervised examples.
We approach the task by defining a hierarchical two-level strategy, assuming that it is better to identify and eliminate neutral sentences first, and then classify non-neutral sentences. The first-level task is conceived as a hard classification in which short texts are labeled with crisp Neutral and Non-neutral labels. The second-level soft classifier acts on the crisp set of non-neutral short texts.
4.1 Text Representation
The extraction of an appropriate set of features by which to represent the text of a given document is a crucial task that strongly affects the performance of the overall classification strategy. We consider three types of features: BoW, Document properties (Dp), and Contextual Features (CF).
Text representation using endogenous knowledge has a good general applicability; however, in operational settings, it is legitimate to also use exogenous knowledge, i.e., any source of information outside the message body but directly or indirectly related to the message itself. We introduce CF to model information that characterizes the environment where the user is posting. These features play a key role in deterministically understanding the semantics of the messages. In the BoW representation, terms are identified with words. Dp features are heuristically assessed; their definition stems from intuitive considerations, domain-specific criteria, and, in some cases, trial-and-error procedures.
Correct words: It expresses the amount of terms tk ∈ T ∩ K, where tk is a term of the considered document dj and K is a set of known words for the domain language.
Bad words: They are computed similarly to the correct words feature, where the set K is a collection of dirty words for the domain language.
Capital words: It expresses the amount of words mostly written with capital letters, calculated as the percentage of words within the message having more than half of their characters in capital case.
Punctuation characters: It is calculated as the percentage of punctuation characters over the total number of characters in the message. For example, the value of the feature for the document "Hello!!! How're u doing?" is 5/24.
Exclamation marks: It is calculated as the percentage of exclamation marks over the total number of punctuation characters in the message. Referring to the aforementioned document, the value is 3/5.
Question marks: It is calculated as the percentage of question marks over the total number of punctuation characters in the message. Referring to the aforementioned document, the value is 1/5.
4.2 Machine Learning-Based Classification
We address short text categorization as a hierarchical two-level classification process.
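The Dp features defined in Section 4.1 are simple enough to sketch in code. The following is a minimal illustration, not the authors' implementation: the function name dp_features is hypothetical, punctuation is assumed to mean the ASCII punctuation set, and words are assumed to be whitespace-delimited tokens. The correct words and bad words features are left out because their normalization is not spelled out here.

```python
import string
from fractions import Fraction


def dp_features(message: str) -> dict:
    """Compute four Dp features of a message as exact fractions.

    Assumptions (not from the paper): punctuation is string.punctuation,
    and a word is any whitespace-delimited token.
    """
    words = message.split()
    punct = [c for c in message if c in string.punctuation]

    def ratio(num: int, den: int) -> Fraction:
        # Guard against empty messages or messages without punctuation.
        return Fraction(num, den) if den else Fraction(0)

    # A word counts as "capital" if more than half of its characters are uppercase.
    capital = sum(1 for w in words if 2 * sum(c.isupper() for c in w) > len(w))

    return {
        "capital_words": ratio(capital, len(words)),
        "punctuation": ratio(len(punct), len(message)),
        "exclamation_marks": ratio(punct.count("!"), len(punct)),
        "question_marks": ratio(punct.count("?"), len(punct)),
    }


features = dp_features("Hello!!! How're u doing?")
print(features["punctuation"])        # 5/24
print(features["exclamation_marks"])  # 3/5
print(features["question_marks"])     # 1/5
```

The example message has 24 characters, of which five are punctuation (three exclamation marks, one apostrophe, one question mark), which reproduces the values quoted in the feature definitions.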
The first-level classifier performs a binary hard categorization that labels messages as Neutral and Non-neutral. The first-level filtering task facilitates the subsequent second-level task, in which a finer-grained classification is performed. The second-level classifier performs a soft partition of Non-neutral messages, assigning a given message a gradual membership to each of the non-neutral classes. Among the variety of multiclass ML models well suited for text classification, we choose the RBFN model for its competitive behavior, observed experimentally, with respect to other state-of-the-art classifiers. RBFNs have a single hidden layer of processing units with a local, restricted activation domain: a Gaussian function is commonly used, but any other locally tunable function can be used. The main advantages of RBFNs are that the classification function is nonlinear, the model may produce confidence values, and it may be robust to outliers; drawbacks are the potential sensitivity to input parameters and the potential sensitivity to overtraining. The first-level classifier is then structured as a regular RBFN. In the second level of the classification stage, we introduce a modification of the standard use of RBFN.
The collection of pre-classified messages presents some critical aspects that greatly affect the performance of the overall classification strategy. To work well, an ML-based classifier needs to be trained with a set of sufficiently complete and consistent pre-classified data. The difficulty of satisfying this constraint is essentially related to the subjective character of the interpretation process by which an expert decides whether to classify a document under a given category. A quantitative evaluation of the agreement among experts is then developed to make transparent the level of inconsistency under which the classification process has taken place.
5. FILTERING RULES AND BLACKLIST MANAGEMENT
In this section, we introduce the rule layer adopted for filtering unwanted messages.
We start by describing FRs, and then we illustrate the use of BLs. In what follows, we model a social network as a directed graph, where each node corresponds to a network user and edges denote relationships between two different users. In particular, each edge is labeled by the type of the established relationship (e.g., friend of, colleague of, parent of) and, possibly, the corresponding trust level, which represents how trustworthy a given user considers, with respect to that specific kind of relationship, the user with whom he/she is establishing the relationship.
5.1 Filtering Rules
In defining the language for FR specification, we consider three main issues that, in our opinion, should affect a message filtering decision. First of all, in OSNs, as in everyday life, the same message may have different meanings and relevance based on who writes it. As a consequence, FRs should allow users to state constraints on message creators. Given the social network scenario, creators may also be identified by exploiting information on their social graph.
Definition 1 (Creator specification). A creator specification creatorSpec implicitly denotes a set of OSN users. It can have one of the following forms, possibly combined.
Definition 2 (Filtering rule). A filtering rule FR is a tuple (author, creatorSpec, contentSpec, action), where:
author is the user who specifies the rule;
creatorSpec is a creator specification, specified according to Definition 1;
contentSpec is a Boolean expression defined on content constraints of the form (C, ml), where C is a class of the first or second level and ml is the minimum membership level threshold required for class C to make the constraint satisfied;
action ∈ {block, notify} denotes the action to be performed by the system on the messages matching contentSpec and created by users identified by creatorSpec.
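Definition 2 can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the rule language of the paper: creatorSpec is modeled as an arbitrary predicate on a user identifier, contentSpec is simplified to a conjunction of (C, ml) constraints rather than a general Boolean expression, and the class name "Vulgar" and the user names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class FilteringRule:
    author: str                             # wall owner who specified the rule
    creator_spec: Callable[[str], bool]     # assumption: predicate over a creator id
    content_spec: List[Tuple[str, float]]   # assumption: AND of (C, ml) constraints
    action: str                             # "block" or "notify"


def matches(rule: FilteringRule, creator: str, memberships: Dict[str, float]) -> bool:
    """True if the creator is covered by the rule and every (C, ml) constraint holds."""
    if not rule.creator_spec(creator):
        return False
    return all(memberships.get(c, 0.0) >= ml for c, ml in rule.content_spec)


def is_published(creator: str, memberships: Dict[str, float],
                 rules: List[FilteringRule]) -> bool:
    """A message is published only if no matching rule has action 'block'."""
    return not any(r.action == "block" and matches(r, creator, memberships)
                   for r in rules)


# Hypothetical rule: block messages from anyone except "bob" whose membership
# in the (hypothetical) class "Vulgar" is at least 0.8.
rules = [FilteringRule("alice", lambda user: user != "bob", [("Vulgar", 0.8)], "block")]

print(is_published("carol", {"Vulgar": 0.9}, rules))  # False
print(is_published("carol", {"Vulgar": 0.5}, rules))  # True
print(is_published("bob", {"Vulgar": 0.9}, rules))    # True
```

The memberships dictionary stands in for the soft classifier output, that is, the membership level of the message in each class. Handling of the notify action is deliberately left out of this sketch.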
In general, more than one filtering rule can apply to the same user. A message is therefore published only if it is not blocked by any of the filtering rules that apply to the message creator. Note, moreover, that it may happen that a user profile does not contain a value for the attribute(s) referred to by an FR (e.g., the profile does not specify a value for the attribute Hometown, whereas the FR blocks all the messages authored by users coming from a specific city).
5.2 Online Setup Assistant for FRs Thresholds
As mentioned in the previous section, we address the problem of setting thresholds for filtering rules by conceiving and implementing, within FW, an Online Setup Assistant (OSA) procedure.
5.3 Blacklists
A further component of our system is a BL mechanism to avoid messages from undesired creators, independently of their contents. BLs are directly managed by the system, which should be able to determine which users are to be inserted in the BL and to decide when a user's retention in the BL is finished. To enhance flexibility, such information is given to the system through a set of rules, hereafter called BL rules. Such rules are not defined by the SNM; therefore, they are not meant as general high-level directives to be applied to the whole community.
Similar to FRs, our BL rules enable the wall owner to identify users to be blocked according to their profiles as well as their relationships in the OSN. Therefore, by means of a BL rule, wall owners are, for example, able to ban from their walls users they do not directly know (i.e., with whom they have only indirect relationships), or users that are friends of a given person of whom they may have a bad opinion.
6. EVALUATION
In this section, we illustrate the performance evaluation study we have carried out on the classification and filtering modules.
We start by describing the data set.
6.1 Problem and Data Set Description
The analysis of related work has highlighted the lack of a publicly available benchmark for comparing different approaches to content-based classification of OSN short texts.
6.2 Short Text Classifier Evaluation
6.2.1 Evaluation Metrics
Two different types of measures are used to evaluate the effectiveness of first-level and second-level classifications. In the first level, the short text classification procedure is evaluated on the basis of the contingency table approach. In particular, the well-known Overall Accuracy (OA) index, capturing the simple percent agreement between truth and classification results, is complemented with Cohen's Kappa (K) coefficient, thought to be a more robust measure that takes into account the agreement occurring by chance. At the second level, we adopt measures widely accepted in the Information Retrieval and Document Analysis field, that is, Precision (P), which reflects the number of false positives, Recall (R), which reflects the number of false negatives, and the overall metric F-Measure (Fβ), defined as the harmonic mean between the above two indexes.
6.2.2 Numerical Results
By trial and error, we found a quite good parameter configuration for the RBFN learning model. The best value for the M parameter, which determines the number of Basis Functions, is heuristically set to N/2, where N is the number of input patterns from the data set.
6.2.3 Comparison Analysis
The lack of benchmarks for OSN short text classification makes the development of a reliable comparative analysis problematic. However, an indirect comparison of our method can be made with work that shows similarities or complementary aspects with our solution.
6.3 Overall Performance and Discussion
In order to provide an overall assessment of how effectively the system applies an FR, we refer to the table of classification results.
This table allows us to estimate the Precision and Recall of our FRs. Let us suppose that the system applies a given rule on a certain message. Precision can then be interpreted as the probability that the decision taken on that message (that is, blocking it or not) is actually the correct one. In contrast, Recall has to be interpreted as the probability that, given a rule that must be applied over a certain message, the rule is really enforced.
Results achieved by the content-based specification component on the first-level classification can be considered good enough and reasonably aligned with those obtained by well-known information filtering techniques.
7. DICOMFW
DicomFW is a prototype Facebook application that emulates a personal wall where the user can apply a simple combination of the proposed FRs. Throughout the development of the prototype, we have focused our attention only on the FRs, leaving BL implementation as a future improvement. However, the implemented functionality is critical, since it permits the STC and CBMF components to interact.
To summarize, our application permits users to:
1. View the list of users' FWs.
2. View messages and post a new one on an FW.
3. Define FRs using the OSA tool.
When a user tries to post a message on a wall, he/she receives an alerting message if it is blocked by FW.
Fig. 3. DicomFW: a message filtered by the wall owner's FRs.
8. CONCLUSIONS
In this paper, we have presented a system to filter undesired messages from OSN walls. The system exploits an ML soft classifier to enforce customizable content-dependent FRs.
We plan to study strategies and techniques limiting the inferences that a user can make about the enforced filtering rules with the aim of bypassing the filtering system, such as, for instance, randomly notifying a message that should instead be blocked, or detecting modifications to profile attributes that have been made for the sole purpose of defeating the filtering system.
REFERENCES
[1] G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 6, pp.
734-749, June 2005.
[2] M. Chau and H. Chen, "A Machine Learning Approach to Web Page Filtering Using Content and Structure Analysis," Decision Support Systems, vol. 44, no. 2, pp. 482-494, 2008.