CypherBuilder and CypherUtils Reference Guide

This guide is for version 5.0-rc.7+


Two helper classes, primarily meant for the GraphAccess library, but also useful to anyone programmatically assembling Cypher queries.

SOURCE CODE

Unit tests (with pytest). They offer many examples of usage.



CONTENTS:

Class CypherBuilder

Class CypherUtils


Class CypherBuilder

    Used to automatically assemble various parts of a Cypher query
    to be used to locate a node, or group of nodes, based on various match criteria.
    (No support for scenarios involving links.)
    
    Objects of this class (sometimes referred to as "match structures")
    let a user specify a node in a wide variety of ways, and
    save those specifications for later use in building Cypher queries.

    NO extra database operations are involved.

    IMPORTANT:  By our convention -
                    if internal_id is provided, all other conditions are DISREGARDED;
                    if it's missing, an implicit AND operation applies to all the specified conditions
                    (Regardless, all the passed data is stored in this object)


    Upon instantiation, two broad actions take place:

    First, validation and storage of all the passed specifications (the "RAW match structure"),
    that are used to identify a node or group of nodes.

    Then the generation and storage of values for the following 6 properties:

        1) "node":  A string, defining a node in a Cypher query, incl. parentheses but *excluding* the "MATCH" keyword
        2) "where": A string, defining the "WHERE" part of the subquery (*excluding* the "WHERE"), if applicable;
                    otherwise, a blank
        3) "clause_binding":    A dict meant to provide the data for a clause
        4) "data_binding":      A (possibly empty) data-binding dictionary
        5) "dummy_node_name":   A string used for the node name inside the Cypher query (by default, "n");
                                potentially relevant to the "node" and "where" values
        6) "cypher":            The complete Cypher query, exclusive of RETURN statement and later parts;
                                the WHERE part will be missing if there are no clauses

        EXAMPLES:
            *   node: "(n)"
                    where: ""
                    clause_binding: {}
                    data_binding: {}
                    dummy_node_name: "n"
                    cypher: "MATCH (n)"
            *   node: "(p :`person` )"
                    where: ""
                    clause_binding: {}
                    data_binding: {}
                    dummy_node_name: "p"
            *   node: "(n  )"
                    where: "id(n) = 123"
                    clause_binding: {}
                    data_binding: {}
                    dummy_node_name: "n"
            *   node: "(n :`car`:`surplus inventory` )"
                    where: ""
                    clause_binding: {}
                    data_binding: {}
                    dummy_node_name: "n"
            *   node: "(n :`person` {`gender`: $n_par_1, `age`: $n_par_2})"
                    where: ""
                    clause_binding: {}
                    data_binding: {"n_par_1": "F", "n_par_2": 22}
                    dummy_node_name: "n"
            *   node: "(n :`person` {`gender`: $n_par_1, `age`: $n_par_2})"
                    where: "n.income > 90000 OR n.state = 'CA'"
                    clause_binding: {}
                    data_binding: {"n_par_1": "F", "n_par_2": 22}
                    dummy_node_name: "n"
            *   node: "(n :`person` {`gender`: $n_par_1, `age`: $n_par_2})"
                    where: "n.income > $min_income"
                    clause_binding: {"min_income": 90000}
                    data_binding: {"n_par_1": "F", "n_par_2": 22, "min_income": 90000}
                    dummy_node_name: "n"
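The stored parts above are meant to be combined into a full query by the caller. The following is a minimal sketch of that assembly, using only plain strings copied from the last example; `assemble_query` is a hypothetical helper written for illustration and is not part of the library.

```python
def assemble_query(node: str, where: str, dummy_node_name: str) -> str:
    """Hypothetical helper: combine a node pattern and an optional WHERE body."""
    query = f"MATCH {node}"
    if where.strip():
        # The "where" value excludes the keyword, so the keyword is added here
        query += f" WHERE {where}"
    return f"{query} RETURN {dummy_node_name}"


# Values copied from the last example above
node = "(n :`person` {`gender`: $n_par_1, `age`: $n_par_2})"
where = "n.income > $min_income"
data_binding = {"n_par_1": "F", "n_par_2": 22, "min_income": 90000}

query = assemble_query(node, where, "n")
print(query)
```

The `data_binding` dictionary would then be passed, together with the query string, to whatever database driver runs the query.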
    
name:       __init__
arguments:  internal_id=None, labels=None, key_name=None, key_value=None, properties=None, clause=None, clause_binding=None, dummy_name="n"
returns:    (none)
        ALL THE ARGUMENTS ARE OPTIONAL (no arguments at all means "match everything in the database")

        :param internal_id: An integer or string with the node's internal database ID.
                                If specified, it will lead to all the remaining arguments being DISREGARDED (though saved)

        :param labels:      A string (or list/tuple of strings) specifying one or more node labels.
                                (Note: blank spaces ARE allowed in the strings)
                                EXAMPLES:  "cars"
                                            ("cars", "powered vehicles")
                            Note that if multiple labels are given, then only nodes possessing ALL of them will be matched;
                            at present, there's no way to request an "OR" operation on labels

        :param key_name:    A string with the name of a node attribute; if provided, key_value must be present, too
        :param key_value:   The required value for the above key; if provided, key_name must be present, too
                                Note: no requirement for the key to be primary

        :param properties:  A (possibly-empty) dictionary of property key/value pairs, indicating a condition to match.
                                EXAMPLE: {"gender": "F", "age": 22}

        :param clause:      Either None, OR a (possibly empty) string containing a Cypher subquery,
                            OR a pair/list (string, dict) containing a Cypher subquery and the data-binding dictionary for it.
                            The Cypher subquery should refer to the node using the assigned dummy_node_name (by default, "n")
                                IMPORTANT:  in the dictionary, don't use keys of the form "n_par_i",
                                            where n is the dummy node name and i is an integer,
                                            or an Exception will be raised - those names are for internal use only
                                EXAMPLES:   "n.age < 25 AND n.income > 100000"
                                            ("n.weight < $max_weight", {"max_weight": 100})

        :param clause_binding:  A dict meant to provide the data for a clause
                                EXAMPLE:  {"max_weight": 100}

        :param dummy_name: A string with a name by which to refer to the nodes (by default, "n") in the clause;
                                only used if a `clause` argument is passed (in the absence of a clause, it's stored as None)
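The restriction on binding keys of the form "n_par_i" can be illustrated with a small check. This is a sketch under assumptions: the function name `check_clause_binding` and the use of `ValueError` are hypothetical choices, since the guide only states that an Exception is raised.

```python
import re


def check_clause_binding(clause_binding: dict, dummy_name: str = "n") -> None:
    """Hypothetical check: reject keys that look like internal parameter names."""
    # Matches e.g. "n_par_1" when dummy_name is "n"
    reserved = re.compile(rf"{re.escape(dummy_name)}_par_\d+")
    for key in clause_binding:
        if reserved.fullmatch(key):
            # ValueError is an assumption; the guide only says "Exception"
            raise ValueError(f"Key `{key}` is reserved for internal use")


check_clause_binding({"max_weight": 100})    # Accepted: no reserved names

try:
    check_clause_binding({"n_par_1": 5})
except ValueError as ex:
    print(ex)
```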
        
name:       build_cypher_elements
arguments:  dummy_name=None
returns:    None
        This method manages the parts of the object buildup that depend on the dummy node name.
        Primary use case:
            if called at the end of a new object's instantiation, it finalizes its construction

        Alternate use case:
            if called on an existing object, it will change its structure
                to make use of the given node dummy name, if possible, or raise an Exception if not.
                (Caution: the object will get permanently changed)

        :param dummy_name:  String with the desired dummy name to use to refer to the node in Cypher queries
        :return:            None
        
name:       extract_node
arguments:  (none)
returns:    str
        Return the node information to be used in composing Cypher queries

        :return:        A string with the node information, as needed by Cypher queries.  EXAMPLES:
                            "(n  )"
                            "(p :`person` )"
                            "(n :`car`:`surplus inventory` )"
                            "(n :`person` {`gender`: $n_par_1, `age`: $n_par_2})"
        
name:       extract_dummy_name
arguments:  (none)
returns:    str
        Return the dummy node name to be used in composing Cypher queries

        :return:    A string with the dummy node name to use in the Cypher query (often "n" , or "to" , or "from")
        
name:       unpack_match
arguments:  (none)
returns:    tuple
        Return a tuple containing:
        (node, where, data_binding, dummy_node_name) ,
        for use in composing Cypher queries

        :return:    A tuple containing (node, where, data_binding, dummy_node_name)
                        1) "node":  a string, defining a node in a Cypher query,
                                    incl. parentheses but *excluding* the "MATCH" keyword
                        2) "where": a string, defining the "WHERE" part of the subquery (*excluding* the "WHERE"),
                                    if applicable;  otherwise, a blank
                        3) "data_binding":      a (possibly empty) data-binding dictionary
                        4) "dummy_node_name":   a string used for the node name inside the Cypher query (by default, "n");
                                                potentially relevant to the "node" and "where" values
        
name:       extract_where_clause
arguments:  (none)
returns:    str
        Clean up the WHERE clause, and prefix the "WHERE" keyword as needed

        :return:    A string with the cleaned-up WHERE clause, prefixed with the "WHERE" keyword as needed
        
name:       assert_valid_structure
arguments:  (none)
returns:    None
        Verify that the object is a valid one (i.e., correctly initialized); if not, raise an Exception
        TODO: NOT IN CURRENT USE.  Perhaps to phase out, or keep it but tighten its tests

        :return:        None
        




Class CypherUtils

    Helper STATIC class.
    Meant as a PRIVATE class; not intended for the end user.
    
name:       process_match_structure
arguments:  handle :Union[int, str, CypherBuilder], dummy_node_name=None, caller_method=None
returns:    CypherBuilder
        Accept either a valid internal database node ID, or a "CypherBuilder" object,
        and turn it into a "CypherBuilder" object that makes use of the requested dummy name

        Note: no database operation is performed

        :param handle:          EITHER a valid internal database ID (int or string),
                                    OR a "CypherBuilder" object (containing data to identify a node or set of nodes)

        :param dummy_node_name: [OPTIONAL] A string that will be used inside a Cypher query, to refer to nodes
        :param caller_method:   [OPTIONAL] String with name of caller method, only used for error messages

        :return:                A "CypherBuilder" object, used to identify a node,
                                    or group of nodes
        
name:       assemble_cypher_blocks
arguments:  handle :Union[int, str, CypherBuilder], dummy_node_name=None, caller_method=None
returns:    tuple
        Put together the various blocks of what can be later assembled into a Cypher query

        :param handle:          EITHER a valid internal database ID (int or string),
                                    OR a "CypherBuilder" object (containing data to identify a node or set of nodes)

        :param dummy_node_name: [OPTIONAL] A string that will be used inside a Cypher query, to refer to nodes
        :param caller_method:   [OPTIONAL] String with name of caller method, only used for error messages

        :return:                A tuple containing (node, where, data_binding, dummy_node_name)
                                    1) "node":  a string, defining a node in a Cypher query,
                                                incl. parentheses but *excluding* the "MATCH" keyword
                                    2) "where": a string, defining the "WHERE" part of the subquery
                                                (*excluding* the "WHERE"),
                                                if applicable;  otherwise, a blank
                                    3) "data_binding":      a (possibly empty) data-binding dictionary
                                    4) "dummy_node_name":   a string used for the node name inside the Cypher query (by default, "n");
                                                            potentially relevant to the "node" and "where" values
        
name:       check_match_compatibility
arguments:  match1 :CypherBuilder, match2 :CypherBuilder
returns:    None
        If the two given "CypherBuilder" objects
        are incompatible - in terms of collision in their dummy node names -
        raise an Exception.

        :param match1:  A "CypherBuilder" object to be used to identify a node, or group of nodes
        :param match2:  A "CypherBuilder" object to be used to identify a node, or group of nodes
        :return:        None
        
name:       assert_valid_internal_id
arguments:  internal_id :int|str
returns:    None
        Raise an Exception if the argument is not a valid internal graph database ID

        :param internal_id: Alleged internal graph database ID
        :return:            None
        
name:       valid_internal_id
arguments:  internal_id :int|str
returns:    bool
        Return True if `internal_id` is a potentially valid ID for a graph database.
        Note that whether it's actually valid will depend on the specific graph database, which isn't known here.

        EXAMPLES:
            - Neo4j version 4 uses non-negative integers
            - Neo4j version 5 still uses non-negative integers, but also offers an alternative internal ID that is a string
            - Most other graph databases (such as Neptune) use strings

        :param internal_id: An alleged internal database ID
        :return:            True if internal_id is a valid internal database ID, or False otherwise
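A plausibility check along the lines described above might look like the following sketch. The exact rules are assumptions made for illustration (non-negative integers or non-empty strings are accepted, booleans are rejected); the library's actual criteria may differ, and `looks_like_internal_id` is a hypothetical name.

```python
def looks_like_internal_id(internal_id) -> bool:
    """Hypothetical check: True if the value could plausibly be an internal ID."""
    if isinstance(internal_id, bool):
        # bool is a subclass of int in Python; assumed here not to be a valid ID
        return False
    if isinstance(internal_id, int):
        return internal_id >= 0          # e.g. integer IDs as used by Neo4j
    if isinstance(internal_id, str):
        return internal_id != ""         # e.g. string IDs used by other databases
    return False


print(looks_like_internal_id(123))
print(looks_like_internal_id(-1))
```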
        
name:       prepare_labels
arguments:  labels :Union[str, List[str], Tuple[str]]
returns:    str
        Turn the given string, or list/tuple of strings - representing one or more database node labels - into a string
        suitable for inclusion into a Cypher query.
        Blanks ARE allowed in the names.
        EXAMPLES:
            "" or None          both give rise to    ""
            "client"            gives rise to   ":`client`"
            "my label"          gives rise to   ":`my label`"
            ["car", "vehicle"]  gives rise to   ":`car`:`vehicle`"

        :param labels:  A string, or list/tuple of strings, representing one or multiple Neo4j labels;
                            it's acceptable to be None
        :return:        A string suitable for inclusion in the node part of a Cypher query
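The mapping shown in the examples above can be reproduced with a few lines of standalone Python. This is an illustrative re-implementation, not the library's source; `prepare_labels_sketch` is a hypothetical name, and no escaping of backticks inside label names is attempted.

```python
from typing import List, Tuple, Union


def prepare_labels_sketch(labels: Union[None, str, List[str], Tuple[str, ...]]) -> str:
    """Illustrative sketch: turn one or more labels into a Cypher label string."""
    if not labels:
        return ""                       # Covers None, "", [] and ()
    if isinstance(labels, str):
        labels = [labels]               # Treat a single label as a 1-element list
    # Backticks allow blanks inside label names
    return "".join(f":`{label}`" for label in labels)


print(prepare_labels_sketch(["car", "vehicle"]))    # :`car`:`vehicle`
```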
        
name:       prepare_where
arguments:  where_list: Union[str, list]
returns:    str
        Given a Cypher WHERE clause, or list/tuple of them, combine them all into one -
        and also prefix the WHERE keyword to the result (if appropriate).
        The *combined* clauses of the WHERE statement are parentheses-enclosed, to protect against code injection

        EXAMPLES:   "" or "      " or [] or ("  ", "") all result in  ""
                    "n.name = 'Julian'" returns "WHERE (n.name = 'Julian')"
                        Likewise for ["n.name = 'Julian'"]
                    ("p.key1 = 123", "   ",  "p.key2 = 456") returns "WHERE (p.key1 = 123 AND p.key2 = 456)"

        :param where_list:  A string with a subclause, or list or tuple of subclauses,
                            suitable for insertion in a WHERE statement

        :return:            A string with the combined WHERE statement,
                            suitable for inclusion into a Cypher query (empty if there were no subclauses)
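The documented examples are consistent with the following standalone sketch, which drops blank subclauses, joins the rest with AND, and wraps the result in parentheses. It is an illustrative re-implementation under that reading, not the library's source; `prepare_where_sketch` is a hypothetical name.

```python
def prepare_where_sketch(where_list) -> str:
    """Illustrative sketch: combine WHERE subclauses into one parenthesized clause."""
    if isinstance(where_list, str):
        where_list = [where_list]
    # Discard subclauses that are empty or contain only blanks
    clauses = [w.strip() for w in where_list if w.strip()]
    if not clauses:
        return ""
    return "WHERE (" + " AND ".join(clauses) + ")"


print(prepare_where_sketch(("p.key1 = 123", "   ", "p.key2 = 456")))
# WHERE (p.key1 = 123 AND p.key2 = 456)
```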
        
name:       prepare_data_binding
arguments:  data_binding_1 :dict, data_binding_2 :dict
returns:    dict
        Return the combined version of two data binding dictionaries
        (without altering the original dictionaries)

        :return:    A (possibly empty) dict with the combined data binding dictionaries,
                        suitable for inclusion into a Cypher query
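A non-mutating merge of two binding dictionaries can be sketched as below. Two points are assumptions, since the guide does not state them: that a value of None is treated as an empty dictionary, and that on a key collision the second dictionary wins. `combine_bindings` is a hypothetical name.

```python
def combine_bindings(data_binding_1, data_binding_2) -> dict:
    """Illustrative sketch: merge two data-binding dicts into a new dict."""
    combined = dict(data_binding_1 or {})    # Copy, so the original is untouched
    combined.update(data_binding_2 or {})    # Assumption: later values win on conflict
    return combined


first = {"n_par_1": "F", "n_par_2": 22}
second = {"min_income": 90000}
print(combine_bindings(first, second))
```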
        
name:       dict_to_cypher
arguments:  data_dict: {}, prefix="par_"
returns:    (str, {})
        Turn a Python dictionary (meant for specifying node or relationship attributes)
        into a string suitable for Cypher queries,
        plus its corresponding data-binding dictionary.

        The goal is to make use of Cypher's data-binding capabilities (generally better than
        embedding values into Cypher-query strings: safer against data injection and against broken Cypher!)

        EXAMPLE :
                {'cost': 65.99, 'item description': 'the "red" button'}

                will lead to the pair:
                    (
                        '{`cost`: $par_1, `item description`: $par_2}',
                        {'par_1': 65.99, 'par_2': 'the "red" button'}
                    )

        Note that backticks are used in the Cypher string to allow blanks in the key names.
        Consecutively-named dummy variables ($par_1, $par_2, etc) are used,
        instead of names based on the keys of the data dictionary (such as $cost),
        because the keys might contain blanks.

        SAMPLE USAGE:
            (cypher_properties, data_binding) = dict_to_cypher(data_dict)

        :param data_dict:   A Python dictionary
        :param prefix:      Optional prefix string for the data-binding dummy names (parameter tokens); handy to prevent conflict;
                                by default, "par_"

        :return:            A pair consisting of a string suitable for Cypher queries,
                                and a corresponding data-binding dictionary.
                            If the passed dictionary is empty or None,
                                the pair returned is ("", {})
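The transformation described above can be reproduced in a few lines. This is an illustrative standalone re-implementation that matches the documented example, not the library's source; `dict_to_cypher_sketch` is a hypothetical name, and backticks inside key names are not escaped.

```python
def dict_to_cypher_sketch(data_dict, prefix="par_"):
    """Illustrative sketch: build a Cypher property map and its data binding."""
    if not data_dict:
        return ("", {})                     # Covers both None and {}
    pairs = []
    data_binding = {}
    for i, (key, value) in enumerate(data_dict.items(), start=1):
        token = f"{prefix}{i}"              # e.g. "par_1", "par_2", ...
        pairs.append(f"`{key}`: ${token}")  # Backticks allow blanks in key names
        data_binding[token] = value
    return ("{" + ", ".join(pairs) + "}", data_binding)


cypher_properties, data_binding = dict_to_cypher_sketch(
    {"cost": 65.99, "item description": 'the "red" button'}
)
print(cypher_properties)    # {`cost`: $par_1, `item description`: $par_2}
print(data_binding)
```

Passing a different prefix, such as "n_par_", yields tokens like `$n_par_1`, which matches the parameter names seen in the CypherBuilder examples.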
        
name:       avoid_links_in_path
arguments:  avoid_links=None, avoid_label=None, path_dummy_name="p", prefix_and=False
returns:    str
        Create a clause for a Cypher query to traverse a graph
        while avoiding links with any of the specified names,
        as well as avoiding nodes that carry the specified label.

        EXAMPLE of usage:
                    MATCH p=(:start_Label)-[*]->(:end_label)
                    WHERE {here insert the clause returned by this function}
                    RETURN ...

        :param avoid_links:     [OPTIONAL] Name, or list/tuple of names, of links to avoid in the graph traversal
        :param avoid_label:     [OPTIONAL] Name of a node label to be avoided on any of the nodes in the graph traversal
        :param path_dummy_name: [OPTIONAL] Whatever dummy name is being used in the overall Cypher query,
                                    to refer to the paths; by default, "p"
        :param prefix_and:      [OPTIONAL] If True, prefix "AND " in cases where the returned value isn't a blank string;
                                    by default, False
        :return:                A Cypher clause fragment
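The guide does not show the exact text of the returned clause, so the following is only one plausible way to build such a fragment, using Cypher's `none()` predicate over `relationships(p)` and `nodes(p)`. The function name `avoid_clause_sketch` is hypothetical, the generated text is an assumption, and link and label names are assumed to contain no single quotes.

```python
def avoid_clause_sketch(avoid_links=None, avoid_label=None,
                        path_dummy_name="p", prefix_and=False) -> str:
    """Illustrative sketch: clause excluding given link names and a node label."""
    parts = []
    if avoid_links:
        if isinstance(avoid_links, str):
            avoid_links = [avoid_links]
        names = ", ".join(f"'{name}'" for name in avoid_links)
        # No relationship along the path may have one of the listed types
        parts.append(
            f"NONE(r IN relationships({path_dummy_name}) WHERE type(r) IN [{names}])"
        )
    if avoid_label:
        # No node along the path may carry the given label
        parts.append(
            f"NONE(x IN nodes({path_dummy_name}) WHERE '{avoid_label}' IN labels(x))"
        )
    clause = " AND ".join(parts)
    if clause and prefix_and:
        clause = "AND " + clause
    return clause


print(avoid_clause_sketch(avoid_links=["OWNS", "RENTS"], avoid_label="archived"))
```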